Understanding the New Model of Token Economics

Deep News05-18 16:51

The commercialization of AI applications is expanding from selling software and memberships to selling Token call capabilities. Here, Token refers to the smallest unit of information processed by large models and serves as the basis for model API billing, settlement, and consumption. As call volumes increase, Tokens themselves are beginning to be procured, routed, split, and resold like "inventory."

In a recent media industry report, Huayuan Securities analyst Chen Liangdong summarized the core changes as follows: "Token operations are forming a new intermediary market, exploring Token distribution models that connect upstream large model providers with downstream developers, enterprises, and individuals. Essentially, this represents the liquidity infrastructure for the wholesale-to-retail network of global Tokens."

The emergence of this business is not complicated: on one hand, China's Token call volume is rapidly increasing, rising from 100 billion daily calls at the beginning of 2024 to 100 trillion by the end of 2025, and surpassing 140 trillion in March 2026; on the other hand, domestic large models have improved in capability, entering the global first tier in some rankings and call volumes. As demand grows and models proliferate, the real bottlenecks in transactions have become payment, network, interfaces, compliance, channels, and scenario implementation.

However, Token distribution cannot be simply understood as "reselling API quotas." The thinnest layer of profit comes from resale spreads, while the thicker portion comes from inference acceleration, unified interfaces, enterprise-side Prompt engineering, Agent orchestration, model selection, and business system integration. Because the entry barrier is not high, the risks in this market are equally direct: intensified competition, prepayment and bad debts, and policy changes by upstream model providers can all compress the profits of the intermediary layer.

Tokens now have "wholesalers" and "retailers."

The basic chain of Token distribution includes three types of roles.

Upstream are the model providers, including ByteDance's Seedance series, Alibaba's Qwen series, KNOWLEDGE ATLAS's GLM series, Moon Dark Side's Kimi series, DeepSeek series, etc. They are the source suppliers of Tokens.

The middle layer consists of agency platforms responsible for procuring upstream model resources and distributing them to end users. Their work is not just reselling quotas but also converting the interface protocols of different models into a unified API format, allowing downstream users to call multiple models with a single API Key.

Downstream are the actual consumers of Tokens, including individual users, developers, enterprise clients, and potentially sub-distributors.

The value of this intermediary layer is concentrated in several areas: domestic direct connections lower network barriers; one set of code adapts to multiple models; support for personal and corporate payments; bulk procurement may yield lower costs; a single platform aggregates models like GPT, Claude, DeepSeek, and Kimi, reducing the cost for developers to repeatedly integrate.

Thus, Token distribution appears asset-light, requiring neither self-training of large models nor large-scale server clusters. The core assets become the API relay scheduling system, upstream model resources, channel clients, and service capabilities.

The explosive growth in call volume is the most direct fuel for this business.

For Token operation models to succeed, there must first be sufficient consumption volume.

China's daily Token call volume increased from 100 billion to over 140 trillion within two years, a growth of over a thousandfold. The expansion in call volume comes from the implementation of various vertical Agents and enterprises embedding generative AI into more business processes.

IDC data provides an even more aggressive projection: the number of active intelligent agents in Chinese enterprises is expected to exceed 350 million by 2031, with a compound annual growth rate of over 135%. As the task density and complexity of intelligent agents increase, the annual growth rate of Token consumption by intelligent agents is expected to exceed 30 times.

This change is already visible in execution-type intelligent agents. The weekly Token consumption of OpenClaw on the OpenRouter platform increased from 0.81T between February 2 and March 16, 2026, to 4.97T, with its share rising from 8.31% to 24.36%.

Once Tokens become a large-scale consumable, their procurement, pricing, routing, and settlement naturally become layered. Model providers may not directly serve every customer, and end customers may not be willing to integrate with each model individually, creating space for the intermediary layer.

The cost-effectiveness of domestic models opens the door for Token distribution to go global.

The improvement in the capabilities of domestic large models is a key variable enabling Token distribution to expand from domestic to cross-border markets.

SuperCLUE data shows that domestic models like ByteDance's Doubao and the DeepSeek series have achieved comprehensive scores exceeding 70 points, narrowing the gap with overseas leading models like GPT-5.4 and Gemini. Models like Alibaba's Qwen, Kimi, and KNOWLEDGE ATLAS GLM have also formed relatively clear tiers.

According to OpenRouter data, as of the week ending May 10, 2026, Tencent's Hy3 preview (free) ranked first in call volume. Among the top five, ten, and twenty models, domestic large models accounted for 2, 6, and 9, respectively.

A more symbolic change occurred in the first quarter of 2026. From February 9 to 15, the call volume of Chinese models on OpenRouter reached 4.12 trillion Tokens, surpassing the 2.94 trillion Tokens of U.S. models for the first time. From February 16 to 22, the weekly call volume of Chinese models further increased to 5.16 trillion Tokens. Among the top five models by platform call volume, four were from Chinese manufacturers: MiniMax M2.5, Kimi K2.5, KNOWLEDGE ATLAS GLM-5, and DeepSeek V3.2, collectively contributing 85.7% of the total call volume of the top five.

The price advantage is also significant. The input price for MiniMax M2.5 and GLM 5 is $0.3 per million Tokens, while Claude Opus 4.6 is $5. For output, MiniMax M2.5 is $1.1, GLM 5 is $2.55, and Claude Opus 4.6 is $25. In high-Token consumption scenarios like AI Agents and code development, the cost-effectiveness gap between domestic and overseas models will continue to widen.

Global AI resource imbalances make routing platforms "transit stations."

Token distribution not only addresses price issues but also resolves resource mismatches.

Overseas leading large models face restrictions such as regional access limitations, compliance rules, and payment barriers, preventing them from directly reaching certain users, including developers in mainland China. Similarly, high-quality domestic large models expanding overseas encounter challenges in localization adaptation, channel development, and user acquisition.

This imbalance has spurred demand for cross-border circulation, aggregated routing, and layered distribution.

OpenRouter is already a typical example. The platform's Token processing volume increased from 5 trillion to 7 trillion per week in 2025 to over 20 trillion per week in April 2026. Its annualized revenue in 2026 exceeded $50 million, approximately five times the over $10 million annualized revenue disclosed in October 2025.

Similar platforms exist domestically. SiliconFlow is a one-stop large model cloud service platform that provides efficient inference acceleration based on its self-developed inference engine while offering enterprise-level large model services. As of December 2025, the platform had over 9 million registered users, more than 10,000 enterprise users, and over 150 models available.

Even U.S. political capital has entered this field. On May 5, 2026, WLFI, a cryptocurrency company closely linked to Trump and his family, partnered with WorldClaw to launch WorldRouter, integrating over 300 models including Claude, GPT, and Gemini, settled in USD1, with pricing approximately 30% lower than official public rates.

Real profits may not lie in "resale spreads."

Token distribution has three profit models.

The first is resale spreads. Platforms purchase API quotas in bulk from upstream model providers and resell them to downstream customers at a markup. OpenRouter, which adds a premium of about 5.5% to supplier costs, exemplifies this model.

The second is technological premium. Platforms reduce the cost per Token through self-developed inference acceleration engines, generating profits from computational efficiency differences even when selling prices are close to or lower than official rates. SiliconFlow's SiliconLLM and OneDiff technologies increase language model inference speed by 10 times and image generation efficiency by 3 times, reducing large model API call costs to as low as one-tenth of the industry average.

The third is enterprise value-added services. The cost of deploying AI for enterprises is not limited to Token pricing but also includes Prompt engineering, multi-model selection, business system integration, workflow orchestration, operational scheduling, and employee AI capability building. As basic Token prices decline, these hidden costs become more likely to become paid services.

SiliconFlow's enterprise-level MaaS platform is an example of this direction: providing enterprise users with three layers of capabilities—model training and optimization, deployment inference, and application development support—covering data processing, model fine-tuning, Prompt engineering, and RAG, ultimately delivering standardized APIs to industries like energy, finance, and government.

Marketing, short dramas, gaming, and e-commerce are scenarios more likely to consume Tokens.

For Token distribution to be profitable, it must ultimately land in real scenarios.

Generative AI applications are entering industries like healthcare, transportation, and industrial manufacturing and are beginning to participate in core processes such as enterprise decision support and strategic management. However, many enterprises have weak foundations for intelligent transformation, insufficient data asset accumulation, and limited computing power investment, making it difficult to directly deploy AI capabilities.

In contrast, marketing and advertising companies already have clients and scenarios in areas like short dramas, webtoons, gaming, and e-commerce, where Token consumption demand is more direct and sustained. For these companies, the opportunity is not just reselling model capabilities but embedding Tokens into client processes such as content generation, ad placement, material production, and video creation.

Investment opportunities follow two main themes:

One category includes companies with strong model capabilities, such as Alibaba, Tencent Holdings, Kuaishou, Kunlun Tech, KNOWLEDGE ATLAS, and MiniMax.

The other category includes companies with strong Token consumption scenarios and high-quality client sources, particularly those with overseas client resources and marketing scenarios willing to actively invest in AI marketing and AI video creation, such as Yidian Tianxia and BlueFocus.

Risks are also significant: low barriers, prepayment requirements, and upstream control.

While the Token distribution business model is light, its moat is not inherently deep.

Peer competition is the first layer of risk. Distribution business technology barriers are low, and once leading agents enter the market with advantages in capital, clients, and channels, they can quickly replicate the model, compressing profit margins.

Prepayment and bad debts are the second layer of risk. Distributors often settle with downstream customers monthly or quarterly but must prepay when purchasing API quotas from upstream providers. Larger Token consumption scales increase prepayment pressure, and if clients delay payments, bad debt risks amplify accordingly.

Upstream model provider policy changes are the third layer of risk. Large model providers control API pricing and access rules and may adjust prices or tighten third-party access policies. For the intermediary layer, this is the most difficult factor to control.

Disclaimer: Investing carries risk. This is not financial advice. The above content should not be regarded as an offer, recommendation, or solicitation on acquiring or disposing of any financial products, any associated discussions, comments, or posts by author or other users should not be considered as such either. It is solely for general information purpose only, which does not consider your own investment objectives, financial situations or needs. TTM assumes no responsibility or warranty for the accuracy and completeness of the information, investors should do their own research and may seek professional advice before investing.

Comments

We need your insight to fill this gap
Leave a comment