Cloud Giants Hike Prices as Token Demand Soars, Driving Up Computing Costs

Deep News | 04-16 14:26

China's cloud computing sector is entering an era of "computing power inflation." On April 15, Alibaba Cloud announced that starting July 15, the elastic 95th percentile billing fee for its DDoS High Defense service in mainland China will increase by 50%, rising from 100 RMB per Mbps per month to 150 RMB. This marks the third price adjustment by Alibaba Cloud within a single month. This global wave of price hikes was initially triggered overseas—Amazon AWS led by increasing AI computing power prices early this year, followed by Google Cloud, with increases reaching up to 100%. Tencent Cloud and Baidu Intelligent Cloud have also followed suit. Tencent Cloud announced on April 9 that AI computing power, container services, and EMR prices would increase by 5% across the board; Baidu Intelligent Cloud raised prices for AI computing power and storage by 5% to 30%.
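The "elastic 95th percentile" scheme mentioned above is a common bandwidth billing method: usage is sampled over the month, the top 5% of samples are discarded to forgive brief spikes, and the highest remaining sample is billed at a per-Mbps rate. The following sketch illustrates the mechanism with the old and new rates from the announcement; the sample interval and sample data are illustrative assumptions, not Alibaba Cloud's exact method.

```python
def p95_monthly_fee(samples_mbps, rate_per_mbps):
    """Bill the 95th-percentile bandwidth sample at a per-Mbps rate.

    The top 5% of samples are discarded, so short traffic spikes
    do not set the bill.
    """
    ordered = sorted(samples_mbps)
    # Index of the highest sample that survives after dropping the top 5%.
    idx = int(len(ordered) * 0.95) - 1
    return ordered[max(idx, 0)] * rate_per_mbps

# 100 hypothetical 5-minute samples: steady 40 Mbps with a few 400 Mbps spikes.
samples = [40] * 95 + [400] * 5
print(p95_monthly_fee(samples, 100))  # old rate: 40 * 100 = 4000 RMB
print(p95_monthly_fee(samples, 150))  # new rate: 40 * 150 = 6000 RMB
```

Because the spikes fall in the discarded top 5%, the bill is set by the steady 40 Mbps baseline, and the 50% rate hike translates directly into a 50% higher monthly fee.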

The most fundamental driver behind these price increases is the explosive growth in Token consumption. Liu Liehong, Director of the National Data Administration, disclosed at a State Council Information Office press conference on March 24: "By March this year, China's average daily Token calls had exceeded 140 trillion, a more than 1,000-fold increase over the roughly 100 billion daily calls at the beginning of 2024, and up more than 40% from the 100 trillion daily calls at the end of 2024, all within just three months." The National Data Administration has officially adopted "词元" (ciyuan) as the Chinese term for "Token." The Token is rapidly transforming from a unit of measurement for AI technology into the "currency" of the intelligent era, and the surge in its consumption directly drives up computing power demand, in turn pushing up cloud service prices.

The Token frenzy has also spread to capital markets. Xunce, dubbed the "first Token stock," which listed on the Hong Kong Stock Exchange late last year, saw its market capitalization surpass 100 billion HKD within just 100 days, with its share price soaring 547% year-to-date. Hong Kong-listed cloud giants have also experienced a significant stock price rebound recently. During trading on April 16, Baidu Group surged over 7%, Alibaba rose nearly 5%, and Tencent Holdings gained over 2%.

The dramatic surge in Token consumption is partly driven by factors like OpenClaw. According to data from the OpenRouter platform for the week of March 16 to 22, 2026, nearly a quarter of the platform's Token consumption was contributed by OpenClaw. Separate data from a Guojin Securities computer industry weekly report, covering the week of March 9 to 15, 2026, indicated that OpenClaw contributed 20% of Token consumption on the OpenRouter platform. OpenClaw's weekly Token consumption alone was equivalent to 60% of the platform's average weekly Token consumption throughout the fourth quarter of 2025.

The rapid climb in Token consumption highlights a deeper contradiction: increased consumption does not equate to a proportional improvement in intelligence. The operational logic of AI agents fundamentally differs from that of traditional chatbots. Traditional chatbots follow a single-turn interaction model where users ask and the model answers, with Token consumption increasing linearly with the number of dialogue turns. In contrast, AI agents possess closed-loop capabilities of perception, decision-making, and execution. They autonomously break down complex tasks, call external tools, and undergo multiple rounds of iterative verification until the task is complete. This difference in operational logic leads to an exponential magnification of Token consumption.
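The difference described above can be made concrete with a toy model: a chatbot processes a roughly fixed number of tokens per turn, while an agent re-reads its ever-growing context on every iteration of its perceive-decide-act loop, so its cumulative consumption grows much faster than linearly. All numbers here are illustrative assumptions, not measurements.

```python
def chatbot_tokens(turns, tokens_per_turn=500):
    """Single-turn Q&A: consumption grows linearly with dialogue turns."""
    return turns * tokens_per_turn

def agent_tokens(iterations, base_context=500, tool_output=800):
    """Agent loop: the whole accumulated context is re-processed each
    iteration, and each tool call appends more text to that context."""
    total = 0
    context = base_context
    for _ in range(iterations):
        total += context          # re-read the entire context this round
        context += tool_output    # tool output is appended for next round
    return total

print(chatbot_tokens(10))  # 5000 tokens for 10 chatbot turns
print(agent_tokens(10))    # 41000 tokens for 10 agent iterations
```

At 10 rounds the toy agent already consumes roughly 8x what the chatbot does, and the gap widens with every additional iteration, which is the magnification effect the paragraph describes.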

This issue has already created real commercial conflicts. In early April 2026, Anthropic revoked permission for its subscription users to access the Claude API through third-party tools like OpenClaw. Anthropic officially explained that some heavy users, paying only a $20 monthly subscription fee, were consuming computing resources worth $5,000, creating significant cost pressure for the company. Running an OpenClaw agent for a single day incurs computing costs between $1,000 and $5,000. Affected users were required to switch to a usage-based API payment model. The core conflict between business models and the reality of agent computing consumption has erupted. Token consumption in agent scenarios is unpredictable, with no historical data for reference. Any fixed monthly fee essentially guesses at an unmodelable variable. The root of the problem lies not in the pricing strategy itself, but in the fact that the underlying logic of Token consumption is being fundamentally rewritten by agent technology.
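The mismatch described above can be shown with simple arithmetic: under a flat subscription, revenue is fixed while cost scales with consumption, so any sufficiently heavy user pushes the provider into a loss; usage-based billing ties revenue to consumption instead. The $20 fee and $5,000 cost figures come from the article; the per-token rate and usage volume below are illustrative assumptions.

```python
def provider_margin(revenue_usd, compute_cost_usd):
    """Provider profit (or loss, if negative) on one user."""
    return revenue_usd - compute_cost_usd

# Flat subscription: fixed revenue, unbounded cost.
monthly_fee = 20          # subscription price from the article (USD)
heavy_user_cost = 5000    # compute consumed by a heavy user (USD)
print(provider_margin(monthly_fee, heavy_user_cost))  # -4980: a loss

# Usage-based billing: revenue scales with tokens consumed.
price_per_million_tokens = 3.0   # assumed rate (USD)
tokens_used = 2_000_000_000      # assumed monthly usage: 2 billion tokens
usage_revenue = tokens_used / 1_000_000 * price_per_million_tokens
print(provider_margin(usage_revenue, heavy_user_cost))  # 1000.0: a margin
```

This is why Anthropic's fix was not a higher flat fee but a switch to metered billing: no fixed price can cover a cost variable with no upper bound.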

In late March, Tan Dai, President of Volcano Engine, noted in an interview that a significant portion of the Tokens currently being consumed are essentially wasted on exploration. Citing user feedback that agent products burn through Tokens rapidly, he emphasized that the core issue is not the cost per Token but the large number of ineffective attempts agents make while completing tasks: over half of all Tokens are consumed exploring for the final solution. If the cost per Token is low but the model's capability is insufficient, requiring 10 or even 20 times more Tokens without successfully completing the task, the result is even greater waste. This indicates that the proliferation of agents is consuming computing resources at a pace far exceeding expectations, and that existing billing systems and efficiency management mechanisms still have room for improvement.

Recently, Luo Fuli, head of Xiaomi Group's MiMo, stated on social media that from a macro perspective, the growth rate of global computing resources can no longer keep up with the surge in Token demand driven by Agents. The real solution is not to provide cheaper Tokens, but to enable the co-evolution of "more efficient Agent frameworks" and "more powerful and efficient models."

Computing power remains persistently tight. Currently, inference is replacing training as the primary battlefield for computing power consumption. A Deloitte report, "2026 TMT Predictions," released in January 2026, stated that AI inference would account for two-thirds of computing power in 2026, primarily occurring in nearly $500 billion worth of new data centers and enterprise servers. As inference demand approaches 70% of the total, the competitive landscape is being rewritten. Cost per Token, deployment density, and energy efficiency are replacing pure peak computing power as key customer selection criteria.

Simultaneously, structural gaps in computing power supply are widening. According to data from semiconductor research firm SemiAnalysis, the price of a one-year H100 GPU leasing contract surged from a low of $1.70 per GPU per hour in October 2025 to $2.35 per GPU per hour in March 2026, an increase of nearly 40%. The index is based on monthly direct surveys of more than 100 cloud providers as well as computing power buyers and sellers. Despite the price increase, the relevant GPU leasing capacity sold out completely. Finding new GPU computing resources in early 2026 was likened to "booking a seat on the last flight out"—not only expensive but with almost no availability.

The tight supply isn't limited to GPUs. Reports in late March 2026 indicated that Intel and AMD successively notified customers of processor price increases. Over the past few months, CPUs, as the core for scheduling and inference in AI servers, have been heavily procured by cloud providers. The 2026 server CPU production capacity from these two giants is now basically sold out.

Concurrently, leading internet companies are significantly increasing capital expenditure. Financial reports show Tencent's capital expenditure rapidly increased to 76.8 billion RMB in 2024, a year-on-year rise of 221%, and further grew to 79.2 billion RMB in 2025. Alibaba's capital expenditure grew from 24.4 billion RMB in 2023 to 103.9 billion RMB in 2025, breaking the 100 billion RMB mark. According to public reports, ByteDance's capital expenditure plan for 2026 is approximately 160 billion RMB, with about half directed towards AI chips and data centers.

Another facet of the computing power crunch is the profound transformation underway in China's AI chip market landscape. According to IDC data, total shipments of AI accelerator cards in the Chinese market reached approximately 4 million units in 2025. NVIDIA shipped about 2.2 million units, capturing a 55% market share; AMD shipped around 160,000 units, holding a 4% share. Chinese domestic manufacturers together shipped approximately 1.65 million units, accounting for about 41% of the market. Among them, Huawei leads the pack: IDC data indicates Huawei shipped about 812,000 AI chips in 2025, representing roughly 20% of the total market and close to half of all shipments from domestic suppliers. Alibaba's T-Head ranked second with about 265,000 units, a 7% market share. Baidu's Kunlun Chip and Cambricon tied for third place, each with approximately 116,000 units.

CITIC Securities pointed out that the explosion of applications like Agents and multimodality is driving a spike in Token calls, leading to a domestic computing power shortage. The active adaptation of domestic large models on the inference side presents an acceleration opportunity for domestic computing power manufacturers. CITIC forecasts that domestic computing power chip shipments will at least double in 2026, bringing strong growth momentum to chip design companies, advanced process nodes, advanced packaging, advanced memory, and the supporting industry chain. The rise of local manufacturers is gradually changing the supply dynamics of China's AI chip market, offering new possibilities for alleviating the computing power gap.

The final piece of the Token economy puzzle is the business model. As Token consumption scales from trillions to hundreds of trillions, pricing and billing directly determine whether the industry can close a viable commercial loop. Looking back at the industry's development, from the second half of 2024 to early 2025 China's large model market was mired in a price war: ByteDance's Doubao offered prices as low as 0.0008 RMB per thousand Tokens, and Zhipu had sharply cut prices for GLM-4-Plus. In 2026, however, this landscape is undergoing a fundamental shift. Zhipu became the first leading domestic model maker to raise prices substantially on the launch of a new model: when GLM-5 was released in February, the price of the CodingPlan package increased by at least 30%, and with the release of GLM-5-Turbo in March, prices rose another 20%, for a cumulative increase of 83% over GLM-4.7.
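A note on the arithmetic: consecutive price increases compound multiplicatively rather than adding. A 30% hike followed by a 20% hike yields a 56% cumulative increase, so the stated 83% cumulative figure implies the first hike was larger than 30%, which is consistent with the article's "at least 30%." A quick check:

```python
def cumulative_increase(*hikes):
    """Combine successive percentage increases (as fractions) into one."""
    factor = 1.0
    for h in hikes:
        factor *= 1 + h
    return factor - 1

# 30% then 20% compounds to 56%, not 50%.
print(round(cumulative_increase(0.30, 0.20), 2))  # 0.56

# First hike implied by an 83% cumulative increase and a later 20% hike.
print(round(1.83 / 1.20 - 1, 3))  # 0.525, i.e. about 52.5%
```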

The price hikes have not suppressed demand. According to Zhipu's disclosures, while API call pricing increased by 83% in the first quarter of 2026, call volume actually grew by 400%. The Annual Recurring Revenue (ARR) for Zhipu's MaaS API platform is approximately 1.7 billion RMB, having increased 60-fold over the past 12 months. As of March 2026, registered users on the platform exceeded 4 million, covering 218 countries and regions globally. Zhipu CEO Zhang Peng stated that when a model is sufficiently powerful, the API itself is the best business model. Pricing power is determined by technological strength and the leading position afforded by long-term trends.

Currently, Token-based billing is becoming the industry standard. In March, Liu Liehong stated at the China Development Forum annual meeting that a new business logic based on Token (词元) billing is rapidly evolving. A new value system centered around the calling, distribution, and settlement of Tokens is accelerating its formation. However, the real challenge facing Token pricing is not the charging standard itself, but the highly unpredictable nature of Token consumption in agent scenarios. Unlike traditional production factors like electricity or steel, Tokens possess unique programmability. In a signed article in March 2026, NVIDIA founder and CEO Jensen Huang defined the Token as the fundamental unit of modern AI, noting its dual attributes: as language, it is the atom of computation; as currency, it is the medium for value circulation.

Tan Dai suggested that the industry's stage can be judged by reasoning back from the end-state: estimate the total future revenue potential of all Tokens and compare it with the global industry's current actual revenue to gauge where we stand. With the revenues of companies like OpenAI and Anthropic known, global industry revenue is roughly three times those companies' combined revenue. Overall, AI computing power and the Token economy remain at a very early stage, with significant room for improvement in pricing mechanisms, efficiency management, and supply capacity.

