AI Pricing Shifts and Soaring Compute Costs: Coinbase CEO Predicts Cheap Models to Handle 80% of Workloads Within 18 Months

Deep News, 06-09 11:08

The shift in GitHub Copilot's pricing is sending shockwaves through the AI industry, sparking a deep debate about the sustainability of the industry's business models. As usage-based billing replaces flat-rate subscriptions, user costs are skyrocketing. Leaders from major tech firms like Coinbase and Hugging Face are charting different paths forward, with the rise of inexpensive models potentially reshaping the fundamental economics of AI compute.

On June 1st, Microsoft's GitHub Copilot officially switched its billing model from per-request charges to a token-based consumption system. This change is expected to send monthly bills for some power users soaring from tens of dollars into the hundreds. The move triggered immediate backlash on social media, with one user sharing an internal cost estimate showing their monthly fee would jump from $44.68 to $754.29, while another projected a bill as high as $847.

This pricing controversy brings the AI industry's long-standing practice of subsidizing growth to a head. Coinbase CEO Brian Armstrong responded by predicting that 80% of AI workloads will migrate to models that are 99% cheaper within 12 to 18 months, identifying energy and compute power as the true future bottlenecks.

Hugging Face CEO Clement Delangue, meanwhile, cited Stanford University research as evidence that localized, open-source small models could replace frontier models at scale.

End of the Subsidized Era

GitHub Copilot's pricing change was not a surprise. In April, GitHub's Chief Product Officer Mario Rodriguez stated publicly that with the rise of agentic AI, the current pricing model was "no longer sustainable." Previously, a brief conversational query and a multi-hour autonomous coding task cost users the same, with GitHub absorbing the ever-increasing inference costs in the background.

Under the new system, effective June 1st, usage costs are converted into AI credits based on the model used and tokens consumed, with each credit valued at $0.01. Subscribers receive a base credit allowance, with additional flexible credits based on their subscription tier. Since cutting-edge AI models typically consume more tokens, the actual cost difference between models can be vast.
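To make the credit arithmetic concrete, here is a minimal Python sketch of how token usage might translate into a bill. Only the $0.01 credit value comes from the article. The model names, the per-model rates, the size of the included allowance, and the assumption that usage beyond the allowance is billed at the credit's face value are all hypothetical placeholders, not GitHub's published terms.

```python
# Illustrative only: the per-model rates below are made-up placeholders,
# not GitHub's published prices.
CREDITS_PER_USD = 100  # from the article: one AI credit is valued at $0.01

# Hypothetical rates, in credits per one million tokens.
CREDITS_PER_MILLION_TOKENS = {
    "small-model": 50,
    "frontier-model": 1500,
}


def credits_used(model: str, tokens: int) -> float:
    """Convert a token count on a given model into AI credits."""
    return CREDITS_PER_MILLION_TOKENS[model] * tokens / 1_000_000


def overage_usd(usage: dict[str, int], included_credits: float) -> float:
    """Dollar cost of credits consumed beyond the included allowance.

    Assumes (hypothetically) that overage is billed at the credit's face value.
    """
    total = sum(credits_used(model, tokens) for model, tokens in usage.items())
    return max(0.0, total - included_credits) / CREDITS_PER_USD


usage = {"small-model": 2_000_000, "frontier-model": 10_000_000}
print(overage_usd(usage, included_credits=3900))  # 112.0
```

With these placeholder numbers, nearly all of the cost comes from the frontier model, which is the mechanism the article describes: the same workflow can cost very different amounts depending on which model handles it.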

User reaction was swift and severe. On GitHub's Reddit community, a user claiming to have subscribed to Copilot Pro+ from day one wrote, "$39/month felt expensive but worth it. With this new AI credit system, I calculated my next bill: $847." Several users compared the change to Uber's business strategy—cultivating user dependence with ultra-low prices before significantly raising them once habits are formed.

An analyst noted that Copilot's case is likely just an early example, anticipating that more companies will shift to token- or usage-based billing as advanced reasoning models and agentic workflows drive a massive increase in inference compute consumption.

Systemic Risks of Subsidy Models

This pricing dispute reveals deeper structural tensions within the AI industry. An investor outlined what he sees as the "most obvious path to an AI breakdown."

He pointed out that flat per-seat subscription fees have long been heavily subsidized, far below the actual cost of heavy usage. Once companies switch to API calls for data protection or compliance reasons, they face the real price of pay-per-use, and consumption often far exceeds prior estimates. He cited cases, including one where a company's annual AI budget for 2026 was exhausted in just four months.

He further argued that major AI firms are operating with deeply negative margins—with one report suggesting a figure near -122%—meaning they rely entirely on external capital to buy GPUs, train models, and continue subsidizing usage. He warned that if investor confidence in returns wanes, the entire flow of capital could reverse.

However, he also noted a boundary condition: if AI genuinely enables new drug discovery or novel commercial forms, user willingness to pay for expensive AI services would rise significantly, potentially alleviating the pressure.

The Rise of Inexpensive Models

In the face of relentlessly rising compute costs, Coinbase CEO Brian Armstrong offered his analytical framework. He believes the demand for intelligence is nearly infinite, but the market will rapidly bifurcate: 80% of workloads will shift to models that are 99% cheaper within 12 to 18 months. The remaining 20% of tasks, those requiring maximum intelligence such as scientific breakthroughs or high-level agent orchestration, will still run on the latest frontier models.
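As a rough back-of-the-envelope check on what that split would mean for total spend, the short Python sketch below computes the blended cost relative to running everything on a frontier model. It assumes the 80/20 split is measured by token volume and that frontier prices stay unchanged, neither of which the article specifies; the function name is illustrative.

```python
def blended_cost_ratio(cheap_share: float, cheap_price_ratio: float) -> float:
    """Cost after the shift, relative to running everything on the frontier model.

    cheap_share: fraction of workload moved to the cheap model (assumed by volume).
    cheap_price_ratio: cheap model's price as a fraction of the frontier price.
    """
    return cheap_share * cheap_price_ratio + (1.0 - cheap_share)


# 80% of workloads on a model that is 99% cheaper (1% of the frontier price).
ratio = blended_cost_ratio(cheap_share=0.80, cheap_price_ratio=0.01)
print(f"relative cost: {ratio:.3f}")                     # relative cost: 0.208
print(f"volume growth at flat spend: {1 / ratio:.1f}x")  # volume growth at flat spend: 4.8x
```

Under these assumptions the bill falls by roughly 79%, and almost all of what remains comes from the 20% of work that stays on frontier models.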

Armstrong likened this trend to the consumer electronics market, where buyers of top-tier MacBooks or gaming PCs are always a minority, and noted that AI prices are falling even faster than Moore's Law. He concluded that the true future constraints will be energy and compute power, not model capability itself.

Armstrong also shared Coinbase's internal practice: the company is actively implementing prompt routing strategies to direct requests to cheaper models, achieving roughly flat total costs in some scenarios even as token usage grows exponentially.
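The article does not describe how Coinbase's routing works. As a purely illustrative sketch of the general idea, the Python below sends a request to a cheap model by default and escalates to a frontier model only when a simple heuristic flags the prompt as hard. The model names, prices, keyword list, and length threshold are all hypothetical; production routers often use a trained classifier rather than keywords.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float  # hypothetical price


# Placeholder models: the cheap one costs 1% of the frontier one.
CHEAP = Model("cheap-model", 0.10)
FRONTIER = Model("frontier-model", 10.0)

# Hypothetical signals that a prompt needs the stronger model.
HARD_HINTS = ("prove", "architecture", "multi-step", "orchestrate")


def route(prompt: str, max_cheap_words: int = 200) -> Model:
    """Pick the cheap model unless the prompt looks hard or is very long."""
    text = prompt.lower()
    looks_hard = any(hint in text for hint in HARD_HINTS)
    too_long = len(text.split()) > max_cheap_words
    return FRONTIER if (looks_hard or too_long) else CHEAP


def estimated_cost_usd(model: Model, tokens: int) -> float:
    """Estimated dollar cost of processing `tokens` on `model`."""
    return model.usd_per_million_tokens * tokens / 1_000_000


for prompt in ("Rename this variable", "Prove that this scheduler is deadlock-free"):
    print(prompt, "->", route(prompt).name)
```

The design choice is to default to the cheap path and escalate only on evidence, so that growth in routine traffic adds little to the total bill.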

Evidence for a Multi-Model Future

Hugging Face CEO Clement Delangue cited Stanford research to quantify the replacement potential of cheap models: the accuracy of local models on real-world conversational and reasoning queries has surged from 23.2% in 2023 to 71.3%, at a fraction of the cost and resource consumption of frontier APIs.

Based on this, Delangue posited a "multi-model future": for most workloads, localized, open-source, small, and cheap models will be the default choice; frontier APIs will only be called upon when absolutely necessary.

This analysis aligns with other observations. It was noted that one model performs similarly to a top-tier competitor on a programming benchmark but costs about one-thirtieth the price, with the cheapest open-source models priced at roughly one percent of it. The ongoing open-sourcing of frontier-level models by labs is seen as fundamentally eroding the pricing power and profit margins of closed-source AI giants by giving inference service providers free access to core model technology.

