Computing Power Shortage Emerges as Token Usage Soars, Domestic Chips Accelerate Breakthrough in Inference Segment

Stock News 04-16

CITIC SEC has released a research report stating that the explosion of applications such as AI Agents and multimodal models has led to a computing power shortage in China. Domestic large models are actively adapting to domestic chips on the inference side, creating opportunities for domestic computing power manufacturers to ramp up volume faster. The firm forecasts that shipments of domestic computing power chips will at least double in 2026, providing strong growth momentum for chip design companies, advanced process manufacturing, advanced packaging, advanced memory, and related supply chains. It recommends focusing on four areas: 1) AI chip design firms, 2) wafer foundries and advanced packaging, 3) memory, and 4) other related sectors. CITIC SEC's primary views are as follows.

Token usage is surging as AI Agents and multimodal applications gain widespread adoption, fueling a burst in demand. Since the beginning of 2026, global demand for computing power has risen along a steep growth curve. In April 2026, weekly cumulative token consumption on OpenRouter, the world's largest API aggregation platform, increased roughly 7- to 8-fold from a year earlier. Domestic large models were the primary driver and currently hold about a 40% market share on OpenRouter. The popularization of AI Agent applications and of multimodal AI are the two core catalysts accelerating computing power demand at the margin. First, under the AI Agent trend, agents run as always-on, persistent workloads, which rapidly increases computing power needs. A single task for an AI Agent such as OpenClaw consumes 10 to 100 times more tokens than a ChatBot. Domestic manufacturers are actively deploying their own "domestic agents," further accelerating adoption and raising the demand for supporting compute. Second, the spread of multimodal applications raises consumption per interaction. Since the start of 2026, multimodal applications such as text-to-image and text-to-video have remained highly popular. Inputting or generating images and recognizing or generating videos typically increases token consumption per interaction by orders of magnitude compared with pure text dialogue. The rapid rise of domestic multimodal models, such as ByteDance's Seedance, is accelerating the domestic boom in multimodal AI applications.

The computing power shortage is evident in large model price hikes, usage limits, and sold-out compute leasing capacity. The explosion in token usage has driven a massive surge in computing power demand, while supply-side constraints limit how much capacity can be added in the short term, resulting in a severe shortage both domestically and internationally. Specifically: 1) Model price increases and peak-hour restrictions: Tencent Cloud, for instance, raised prices for its core Hunyuan series models by over 430% in March, and in April it increased list prices for AI computing power, container services, and Elastic MapReduce by about 5%. Between February and March 2026, domestic large models such as Kimi frequently displayed "insufficient computing power during peak hours" messages. Among overseas models, Claude announced adjustments to user session duration limits during peak hours to relieve computing pressure. 2) Rising AI chip leasing prices in the B2B compute leasing market: According to SemiAnalysis data, the one-year leasing contract price for an H100 GPU increased from a low of about $1.70/hour/GPU in October 2025 to $2.35/hour/GPU in March 2026, a rise of nearly 40%. 3) Premium pricing and shortages for flagship consumer GPUs and systems, including NVIDIA's flagship gaming GPU, the RTX 5090, and Apple's Mac mini M4 systems. 4) Queues for generation results on mainstream AI applications (such as ByteDance's Seedance), and shortages or sold-out status for "Coding plan" packages aimed at developers who use AI programming tools. Furthermore, since February, leading domestic cloud and model vendors have explicitly and publicly acknowledged tight computing resources, providing direct evidence of the domestic computing power shortage.
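The percentage figures cited for the H100 leasing price and the Hunyuan price increase can be checked with simple arithmetic. The short Python sketch below is illustrative only: the helper name `pct_change` and the variable names are assumptions introduced here, and the inputs are the figures quoted in the report, not independently verified data.

```python
def pct_change(old: float, new: float) -> float:
    """Return the percentage change from old to new."""
    return (new - old) / old * 100.0


# Figures as quoted in the report (USD per GPU-hour, one-year H100 lease).
h100_low = 1.70  # low of about $1.70 in October 2025
h100_now = 2.35  # $2.35 in March 2026

h100_rise = pct_change(h100_low, h100_now)
print(f"H100 one-year lease price change: +{h100_rise:.1f}%")

# A price increase of 430% means the new price is 1 + 4.3 = 5.3 times
# the old price; "over 430%" therefore implies a multiple above 5.3x.
hunyuan_multiple = 1 + 430 / 100
print(f"A 430% increase corresponds to {hunyuan_multiple:.1f}x the old price")
```

On these inputs the H100 increase works out to about 38%, consistent with the "nearly 40%" description, and an increase of over 430% corresponds to a new price more than 5.3 times the old one.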

Domestic computing power, particularly domestic inference cards, is seizing the opportunity to ramp up volume faster. The computing power shortage triggered by the AI application boom will accelerate shipments of domestic cards, with the most direct impact on inference chips. AI chip demand can be divided into training and inference, and the current explosion in Agent and multimodal applications is significantly driving demand for inference-side computing power. From the standpoint of adopting domestic chips, inference places lower overall performance demands than training. By collaborating closely with internet companies to customize and optimize for specific needs, domestic chip manufacturers can provide inference chips better suited to those companies' requirements. As a result, domestic substitution is progressing faster in inference than in training. Domestic large models are currently adapting actively to domestic computing cards, especially on the inference side. For example, MiniMax, Zhipu, and DeepSeek have all announced adaptation and cooperation with domestic chip makers such as Huawei's Ascend, Moore Threads, MetaX, Hygon, and Cambricon. While the performance of individual domestic chips is catching up, it still lags behind NVIDIA by 1-2 generations. The computing power shortage is expected to create a rapid growth opportunity for domestic inference chips and other AI chips. The localization rate of China's AI chip market is estimated at around 30-40% at present and is projected to reach 60-70% by 2030.

Key risk factors include: AI demand falling short of expectations; slower-than-expected development of domestic large models; slower-than-expected progress in domestic inference chips; underperformance of the domestic supply chain; risks associated with technological changes and product iterations; and macroeconomic volatility and geopolitical risks.

Disclaimer: Investing carries risk. This is not financial advice. The above content should not be regarded as an offer, recommendation, or solicitation on acquiring or disposing of any financial products, any associated discussions, comments, or posts by author or other users should not be considered as such either. It is solely for general information purpose only, which does not consider your own investment objectives, financial situations or needs. TTM assumes no responsibility or warranty for the accuracy and completeness of the information, investors should do their own research and may seek professional advice before investing.
