The growth in token expenditure is showing signs of fatigue, indicating a rapid shift in the market's core focus for AI from "technical feasibility" to "cost affordability."
On June 9th, macro strategist Andreas Steno Larsen stated on social media that the Silicon Data LLM Token Expenditure Index is the most critical chart for the entire market to monitor currently.
The index has more than doubled since last December, rising sharply through May 2026 before pulling back recently. Larsen warned that if token pricing keeps weakening, this cycle's trades, from memory to broader hardware and data centers, could be nearing their end.
Concurrently, major tech companies are urgently trying to curb runaway internal AI compute consumption.
According to earlier reports, Amazon and Microsoft are cutting internal AI tools or halting projects that track usage in order to combat "Tokenmaxxing," where employees waste compute resources to boost their internal rankings. On the service provider side, GitHub Copilot switched its billing model from per-request to per-token on June 1st, pushing some users' monthly bills up more than tenfold and raising widespread questions about the sustainability of AI subsidy models.
These signals are reshaping investor risk assessments for AI infrastructure trades. Marginal changes in token expenditure, transmitted through the chain of GPU compute, DRAM memory, and data center demand, directly impact capital expenditure expectations for companies like NVIDIA, memory chip manufacturers, and cloud service providers.
Indicator Peaking: Hardware Trade Logic Under Scrutiny
The Silicon Data LLM Token Expenditure Index is an expenditure-weighted metric measuring the price paid per million LLM tokens across the market, serving as a proxy for marginal willingness to pay for AI. Since major providers like OpenAI, Anthropic, and Alphabet often charge customers based on token consumption, token expenditure directly ties AI usage to demand for GPUs, DRAM, and data centers.
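To make "expenditure-weighted" concrete, here is a minimal sketch of how such a metric could be computed. The article does not describe Silicon Data's actual methodology, so the weighting scheme, the `ModelUsage` type, and all figures below are illustrative assumptions, not the index's real formula or data.

```python
from dataclasses import dataclass


@dataclass
class ModelUsage:
    name: str         # hypothetical model tier label
    spend_usd: float  # total dollars spent on this model in the period
    tokens: float     # total tokens consumed in the period


def price_per_million(usage: ModelUsage) -> float:
    """Average price paid per million tokens for one model."""
    return usage.spend_usd / (usage.tokens / 1_000_000)


def expenditure_weighted_price(usages: list[ModelUsage]) -> float:
    """Average price per million tokens, weighting each model by its share of spend.

    Assumption: weights are each model's fraction of total dollars spent.
    """
    total_spend = sum(u.spend_usd for u in usages)
    if total_spend <= 0:
        raise ValueError("total spend must be positive")
    return sum(u.spend_usd * price_per_million(u) for u in usages) / total_spend


# Hypothetical periods: spend shifts from a frontier model to a budget model.
before = [
    ModelUsage("frontier", 900.0, 90_000_000),
    ModelUsage("budget", 100.0, 200_000_000),
]
after = [
    ModelUsage("frontier", 500.0, 50_000_000),
    ModelUsage("budget", 500.0, 1_000_000_000),
]

index_before = expenditure_weighted_price(before)
index_after = expenditure_weighted_price(after)
```

In this made-up example neither model changes its price, yet the weighted figure falls from about 9.05 to 5.25 dollars per million tokens because spend migrates toward the cheaper model. A pullback in such a metric can therefore reflect a change in mix as well as price cuts.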
The recent stagnation of this index has raised concerns in capital markets about the hardware cycle. Commentary from Silicon Data suggests the recent pullback may signal a slowdown in the migration rate to high-end, closed-source models. If token expenditure remains weak, the marginal revenue funding incremental GPU, DRAM, and data center purchases will diminish, altering the risk profile for companies whose capital expenditure plans are built around token-driven growth.
A single dip does not establish a trend. Still, as a leading indicator for the hardware cycle, the data suggests that enterprises' systemic reliance on costly frontier models may be starting to decline.
Billing Crisis: Tech Giants Halt "Inefficient Consumption"
The enterprise AI boom is encountering its first genuine billing crisis.
According to an AI consultant cited by Axios, one of their enterprise clients recently spent $5 million on Anthropic's Claude in a single month, simply because no usage cap was set for employees.
Internally, the practice of using AI consumption as a performance metric has backfired. Reports indicate that Amazon's developer platform Kiro had an internal leaderboard called "Kirorank." Similar attempts to inflate token consumption for ranking advantages have also been observed internally at Meta Platforms, Inc.
Amazon Senior Vice President Dave Treadwell acknowledged that employees were running meaningless AI tasks to climb the leaderboard, driving up the company's operational costs. He explicitly instructed staff not to use AI for the sake of using it, and the beta dashboard was subsequently taken down. Amazon has now shifted to a "normalized deployment" metric instead of token consumption to track the actual value of AI-generated code.
Pricing Rebound: The Era of Subsidies Nears Its End
On the supply side, the long-standing AI industry business model of trading subsidies for growth is nearing its limit.
On June 1st, GitHub Copilot officially transitioned to billing based on token usage. One Reddit user said their monthly cost was projected to surge from under $45 to over $847.
GitHub's Chief Product Officer, Mario Rodriguez, previously stated that with the rise of agentic AI, the old pricing model is unsustainable. Gartner analyst Arun Chandrasekaran noted in an interview with Business Insider that as advanced reasoning models increase compute consumption, more companies will shift to usage-based billing.
Investor Tommy Shaughnessy warned of the systemic risks in this subsidy model. He pointed out that major AI companies currently run deeply negative margins. Once enterprises face the real price of pay-per-use, he said, actual consumption could far exceed expectations; as an example, he cited Uber exhausting its full-year 2026 AI budget in four months. If investors lose confidence in expected returns, the capital flows supporting GPU purchases and model training could reverse.
Cost Restructuring: Low-Cost Models May Dominate the Market
Faced with high inference costs, the market is seeking lower-cost alternatives.
Rich Privorotsky, head of Goldman Sachs' Delta-One desk, believes that easing infrastructure bottlenecks are triggering a price war, pointing to DeepSeek cutting prices by 75% and Xiaomi's MiMo slashing prices by nearly 99%.
Coinbase CEO Brian Armstrong previously predicted that 80% of AI workloads will migrate to models costing 99% less within 12 to 18 months, with only the 20% of tasks that require extreme intelligence remaining on frontier models. He noted that energy and compute will become the real bottlenecks.
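The arithmetic behind that prediction can be sketched as follows. This is a back-of-the-envelope illustration, not Armstrong's own calculation: it assumes the 80% share is measured in current spend and that total workload volume stays constant. The function name `blended_cost_ratio` is hypothetical.

```python
def blended_cost_ratio(migrated_share: float, price_cut: float) -> float:
    """Spend after migration as a fraction of spend before migration.

    migrated_share: fraction of current spend that moves to cheaper models.
    price_cut: fractional price reduction on the migrated workloads.
    Assumption: workload volume does not change after the migration.
    """
    for value in (migrated_share, price_cut):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be fractions between 0 and 1")
    stays_on_frontier = 1.0 - migrated_share
    moved_to_cheap = migrated_share * (1.0 - price_cut)
    return stays_on_frontier + moved_to_cheap


# 80% of workloads migrate to models that cost 99% less.
ratio = blended_cost_ratio(0.80, 0.99)
spend_drop = 1.0 - ratio
```

Under these assumptions the remaining spend is roughly 0.208 of the original, a drop of about 79%. Token revenue would then hold up only if usage volume grows enough to offset the lower blended price.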
Hugging Face CEO Clement Delangue cited Stanford University data confirming this trend: on-device models have seen their accuracy on real-world queries jump to 71.3%, at extremely low cost. Micro1 CEO Ali Ansari views this as a "healthy swing" from overuse to rational use.
Wall Street is deeply divided over the true return on investment for AI. According to Goldman Sachs' Jim Schneider, agentic AI will drive a 24-fold increase in token consumption by 2030, and cloud provider gross margins will turn positive in the near term. JPMorgan economic research also argues that the surge in Python packages published on PyPI is evidence of genuine productivity gains.
However, the bearish camp is equally resolute. Goldman Sachs semiconductor analyst Jim Covello noted in a report that the current boom across the industry chain comes at the expense of upstream consumption, with almost all of the value flowing to semiconductor companies, a situation he deems unsustainable.
Boosted.ai CEO Josh Pantony emphasized that enterprise concerns over data openness undermine the effectiveness of AI agents. With cost, return, and security all in play, how much real value the next AI bill generates will be the market's final verdict on this wave of tech investment.