The artificial-intelligence boom unleashed by the launch of ChatGPT has been governed by a single rule - bigger AI models are better. That consensus has pushed Microsoft, Google, Amazon.com, Meta Platforms and others into a spending war to source chips from Nvidia and others.
The competition could be about to change as the industry faces obstacles in its quest to build ever-larger AI models.
Nvidia has been the chief beneficiary of the spending race, since its graphics-processing units - or GPUs - are especially good at carrying out multiple calculations at the same time, significantly reducing the time required to train a model.
The most widely used metric for gauging the capabilities of AI is the number of parameters - a measure of the size and complexity of the model. The general rule is that the more parameters an AI model has, the more GPUs are required to train it efficiently.
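To make that rule concrete, here is a rough back-of-the-envelope sketch. The byte counts, overhead multiplier, and GPU memory size below are illustrative assumptions, not figures from the companies involved; the point is simply that a trillion-parameter model doesn't fit on one chip, so training has to be spread across many of them.

```python
# Back-of-the-envelope arithmetic (assumptions, not vendor figures):
# ~2 bytes per parameter for 16-bit weights, plus an assumed 5x overhead
# for the gradients and optimizer state kept in memory during training.
params = 1_000_000_000_000        # a 1-trillion-parameter model
bytes_per_param = 2               # 16-bit weights
training_multiplier = 5           # assumed overhead for gradients/optimizer state
gpu_memory_gb = 80                # e.g., one high-end data-center GPU

weights_tb = params * bytes_per_param / 1e12
training_tb = weights_tb * training_multiplier
gpus_needed = training_tb * 1000 / gpu_memory_gb

print(f"Weights alone: ~{weights_tb:.0f} TB")
print(f"Training memory footprint: ~{training_tb:.0f} TB")
print(f"GPUs needed just to hold it: ~{gpus_needed:.0f}")
```

Under these assumptions, holding the model during training would take on the order of a hundred GPUs before a single training step is run, which is why bigger models have translated so directly into bigger chip orders.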
But that scaling rule is now, for the first time, facing serious questions. "There is minimal improvement or return beyond one trillion parameters," according to Waseem Alshikh, co-founder and chief technology officer at Writer, a start-up that develops its own AI models.
Microsoft CEO Satya Nadella sounded defensive about the topic as he kicked off the company's Ignite conference a few weeks ago. "It's actually good to have some skepticism, some debate, because that, I think, will motivate, quite frankly, more innovation."
If progress in AI stalls, the industry's current leaders, from Microsoft, Google, and Amazon to OpenAI and Nvidia, could face a fresh set of worries about their heavy spending. It's no surprise, then, that prominent AI figures are pushing back on the scaling doubts.
"[T]here is no wall," OpenAI CEO Sam Altman recently posted on X. Dario Amodei, CEO of Anthropic, an Amazon and Google-backed AI start-up, said in a podcast that he believes "there's no ceiling below the level of humans."
So what's going on? One explanation is that current training techniques are running into practical limits. The most likely culprit is a shortage of high-quality data to train models on, according to Thomas Wolf, co-founder and chief science officer at Hugging Face, a marketplace for AI models.
"We've already exhausted the internet as a source of training data a few months ago," Wolf told Barron's. "There's only so much high-quality text, code, and images out there."
For Wolf, that points to a future of smaller models, which could be trained on a company's or a person's own data and run on individual devices. While AI is currently dominated by large models hosted in the cloud by a small set of major companies, the sector could eventually splinter into lots of specialized models and applications.
That could require new techniques. Meta Chief AI Scientist Yann LeCun has publicly dismissed the idea that simply using more chips to power larger language models will lead to truly intelligent AI, known as artificial general intelligence, or AGI. LeCun has argued that developers will need to focus on building models with memory, planning, and reasoning capabilities.
"What we've learned in this era of generative AI is that not only is scale important to model innovation, but so are advancements in areas like grounding and reasoning," Microsoft's Eric Boyd, corporate vice president of the Azure AI Platform, told Barron's.
At some point, AI's emphasis will shift from training to inference, the process of generating answers or results from the models. Many in the industry now believe that dedicating more computing power to inference can deliver gains similar to those once achieved by scaling up training.
"We are seeing the emergence of a new scaling law...with inference-time compute," Nadella said at Microsoft's Ignite conference.
The inference focus has big implications for Nvidia. While the company's GPUs are uniquely well suited to training, inference might be more readily handled by AI processors from Nvidia peers like Advanced Micro Devices and Intel, by custom chips from Amazon, or by a range of chip start-ups.
Nvidia is hardly unaware of the threat. It emphasized in its recent earnings report that inference makes up around 40% of its data-center revenue and is growing fast. It says that its NVL72 server system delivers a fourfold improvement in AI model training but up to a 30 times improvement in inference compared with previous systems. The new NVL72 stitches together 36 GB200 Superchips, with each GB200 connecting two Blackwell GPUs to an Nvidia Grace CPU.
In the short term, the emergence of a new type of scaling tied to inference is probably good news for Nvidia. Over the longer term, though, a shift from training to inference opens opportunities for rivals to chip away at the company's dominant position.
So far, the generative AI race has been defined by a rush to stock up on chips for training ever-larger models. As the race shifts to actually using those models, investors should be ready for a new set of winners - even if it isn't yet entirely clear who they will be.