The Scaling Law Paradox in AI: Does Stronger Reinforcement Learning (RL) Push Us Further from AGI?

Deep News · 2025-12-24

In the race toward Artificial General Intelligence (AGI), the current emphasis on Reinforcement Learning (RL) may be leading us astray: ironically, the stronger RL becomes, the farther we may be from true AGI.

On December 24, Dwarkesh Patel, the prominent tech blogger and host of the Dwarkesh Podcast, released a thought-provoking video challenging the industry's prevailing optimism around scaling laws and RL. Patel makes a counterintuitive argument: heavy reliance on RL may be less a shortcut to AGI than a sign of how far away it remains.

Patel's core argument centers on a contradiction in current AI development. Leading labs are investing heavily in RL to "pre-bake" specific skills, such as Excel manipulation or web browsing, into large models by training them on verifiable outcomes. However, Patel argues that this approach inherently conflicts with the essence of AGI. "If we were truly close to creating a human-like learner," he asserts, "this entire method of training on verifiable outcomes would be doomed to fail."
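To make "training on verifiable outcomes" concrete, here is a deliberately minimal Python sketch. Everything in it is a toy stand-in: the "policy" is a softmax over four candidate answers rather than a language model, and the verifier is a string match against a known arithmetic answer. This is not Patel's formulation or any lab's pipeline; it only shows the shape of the loop, in which the reward signal comes from an automatic checker.

```python
"""Toy sketch of RL on verifiable outcomes (all names and tasks hypothetical)."""
import math
import random

TASK = {"prompt": "What is 17 * 3?", "expected": "51"}
CANDIDATES = ["41", "51", "54", "61"]   # toy action space standing in for model outputs
logits = [0.0] * len(CANDIDATES)        # the "policy" parameters

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def verify(task, answer: str) -> float:
    # Outcome-based, binary reward: the checker looks only at the result,
    # never at the reasoning that produced it.
    return 1.0 if answer == task["expected"] else 0.0

def train_step(lr=0.5):
    probs = softmax(logits)
    i = random.choices(range(len(CANDIDATES)), weights=probs)[0]  # sample an answer
    r = verify(TASK, CANDIDATES[i])                               # verifiable reward
    # REINFORCE-style update: raise the log-probability of the sampled
    # answer in proportion to the reward it earned.
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += lr * r * grad

if __name__ == "__main__":
    random.seed(0)
    for _ in range(200):
        train_step()
    print({c: round(p, 3) for c, p in zip(CANDIDATES, softmax(logits))})
```

The limitation Patel highlights falls out directly from this setup: each skill must be anticipated and given a verifier ahead of time, whereas a human-like learner would pick skills up on the job.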

According to Patel, this "pre-baked" skill paradigm exposes a fundamental flaw in current models. Human value in the workplace stems from our ability to learn and adapt without needing specialized training loops for every minor task. A truly intelligent agent should learn autonomously through experience and feedback, not rehearsed scripts. If AI cannot achieve this, its generality remains limited, and AGI remains out of reach.

Patel contends that the real driver of advanced AI is not endless RL but "Continual Learning"—the ability to learn from experience, much like humans. He predicts that solving continual learning won't be a singular breakthrough but a gradual evolution, akin to improvements in "in-context learning" capabilities. This process may take "5 to 10 years to mature," ruling out the possibility of any single model gaining a runaway advantage by cracking the problem first.
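To illustrate the distinction Patel draws, here is a toy sketch, not a description of any real system: the `Model` class, its dictionary of "weights," and the lookup logic are all hypothetical. The point is where new knowledge lives: in the prompt, where it vanishes after the call (in-context learning), versus in persistent state that survives future calls (continual learning).

```python
"""Toy contrast between in-context learning and continual learning."""

class Model:
    def __init__(self):
        # Stand-in for pretrained knowledge baked into the weights.
        self.weights = {"capital of France": "Paris"}

    def answer_in_context(self, question: str, context: str = "") -> str:
        # In-context learning: extra facts arrive in the prompt and vanish
        # once the call returns; self.weights is never modified.
        marker = question + " -> "
        if marker in context:
            return context.split(marker, 1)[1].split(";", 1)[0]
        return self.weights.get(question, "unknown")

    def learn_from_feedback(self, question: str, correct: str) -> None:
        # Continual learning: feedback updates persistent state, so the
        # skill survives across future, context-free calls.
        self.weights[question] = correct

m = Model()
q = "capital of Australia"
print(m.answer_in_context(q))                               # 'unknown'
print(m.answer_in_context(q, context=f"{q} -> Canberra;"))  # works, but transient
print(m.answer_in_context(q))                               # 'unknown' again
m.learn_from_feedback(q, "Canberra")
print(m.answer_in_context(q))                               # 'Canberra', persistent
```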

Key Takeaways:

1. **The Pre-Baked Skill Paradox**: Current models' reliance on pre-programmed skills (e.g., Excel or browser use) shows they lack human-like general learning ability, suggesting AGI is not imminent.
2. **Robotics as an Algorithmic Problem**: If human-like learning existed, robotics would already be solved, without millions of repetitive training sessions.
3. **The "Diffusion Takes Time" Fallacy**: Patel dismisses the claim that slow adoption reflects ordinary "technology diffusion" as a cope; truly human-like AI would be integrated rapidly because of its lower risk and training costs.
4. **The Income-Ability Gap**: Global knowledge workers generate trillions of dollars in value, while AI model revenues remain orders of magnitude lower, indicating models have not reached human-replacement thresholds.
5. **Continual Learning as the Bottleneck**: AGI's true hurdle is continual learning, not RL compute power. Achieving AGI may require another 10–20 years.

Patel's insights challenge the AI community to rethink its trajectory, emphasizing that AGI's arrival hinges not on scaling RL but on unlocking autonomous, experience-driven learning.

