The center of value within the AI industry is undergoing a structural shift.
Over the past two years, NVIDIA, memory manufacturers, and energy suppliers have captured most of the returns on AI investment. As the commercialization of agentic AI accelerates, however, profit potential at the model layer is expanding at an unprecedented pace. NVIDIA and Taiwan Semiconductor Manufacturing Company (TSMC), which control the supply of computing power, have not yet fully reflected this trend in their own pricing.
Anthropic serves as the most direct evidence of this shift. According to the latest research, Anthropic's annualized revenue run rate (ARR) has surged from $9 billion at the beginning of the year to over $44 billion. Concurrently, the gross margin for its inference infrastructure jumped from 38% to over 70%. Meanwhile, the cost to produce tokens has been significantly compressed due to hardware advancements and software optimizations. The widening gap between value and cost is propelling model providers into a new phase of rapidly rising profit margins.
On the supply side, NVIDIA and TSMC control the scarcest resources but have not yet raised prices enough to match the current demand boom. The research suggests this pricing lag constitutes a significant market dislocation. Next-generation systems like Vera Rubin (VR NVL72) have substantial room for price increases. Whoever seizes the initiative in this value reallocation will profoundly influence the investment rationale across the AI supply chain.
The Three-Year Migration of AI Value Pools
Between 2023 and 2025, the bulk of excess returns from AI investments was concentrated in the infrastructure layer.
In May 2023, NVIDIA released its first blockbuster earnings report, and its stock surged 25% in after-hours trading, formally igniting the AI investment wave. In 2024, Vistra and GE Vernova soared 265% and 146% respectively, ranking among the top performers in the S&P 500 as the energy bottleneck became a market focus. In 2025, the memory sector took the lead, with companies such as SanDisk, Western Digital, Seagate, and Micron all recording full-year gains exceeding 200%, driven by supply-demand imbalances in storage.
Concurrently, model providers and inference service providers faced sustained pressure on their gross margins. At that time, the actual utility of AI was criticized as merely a "better Google search" with a chat interface, a stark contrast to the trillions of dollars in expected capital expenditure.
This landscape underwent a fundamental transformation by the end of 2025.
Agentic AI: The Turning Point Reshaping Token Economics
The research identifies December 2025 as the true inflection point for AI commercialization—when agentic AI began operating stably and was deployed at scale within enterprise workflows. The core significance of this change lies in its fundamental alteration of the economic value of tokens.
Using its own operations as an example, the research firm states that its annualized token expenditure is equivalent to roughly 30% of its total employee compensation. Each employee consumes over 5 billion tokens per month, more than 5 times Meta's internal per-employee level. The research team cites several real-world cases: tasks such as financial modeling, chart creation, and earnings analysis, which previously required hours of work from junior analysts, can now be completed by agents at a very low token cost, whereas the equivalent human labor previously cost hundreds to thousands of dollars.
Simultaneously, the production cost of tokens is declining sharply. The research estimates that for agentic task scenarios, the effective blended price for running Opus 4.7 is approximately $0.99 per million tokens, far below the official list price of $5/$25 per million input/output tokens. This is because agent workloads have an exceptionally high input-to-output token ratio (around 300:1) and a cache hit rate exceeding 90%, so the large majority of tokens are billed at the lowest price tier.
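The blended-price arithmetic can be sketched as follows. This is a minimal illustration, not the research firm's actual model: it assumes cached input tokens are billed at 10% of the list input price ($0.50 per million) and ignores any cache-write premium, neither of which is stated in the text. Under those assumptions a 90% hit rate gives roughly $1.03 per million tokens and a 91% hit rate gives roughly $0.99, which is consistent with the "exceeding 90%" figure quoted above.

```python
def blended_price_per_million(
    input_price: float,
    output_price: float,
    cache_read_price: float,
    input_output_ratio: float,
    cache_hit_rate: float,
) -> float:
    """Effective price per million tokens for a mixed input/output workload.

    All prices are in dollars per million tokens. `input_output_ratio` is the
    number of input tokens per output token. `cache_hit_rate` is the share of
    input tokens served from cache.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    if input_output_ratio <= 0:
        raise ValueError("input_output_ratio must be positive")

    # Average price of an input token, mixing cached and uncached reads.
    effective_input_price = (
        cache_hit_rate * cache_read_price
        + (1.0 - cache_hit_rate) * input_price
    )
    # Weight input and output prices by their share of total tokens.
    total_cost = input_output_ratio * effective_input_price + output_price
    return total_cost / (input_output_ratio + 1.0)


LIST_INPUT = 5.00           # from the article: list price per million input tokens
LIST_OUTPUT = 25.00         # from the article: list price per million output tokens
ASSUMED_CACHE_READ = 0.50   # ASSUMPTION: cached reads billed at 10% of list input

for hit_rate in (0.90, 0.91, 0.95):
    price = blended_price_per_million(
        LIST_INPUT, LIST_OUTPUT, ASSUMED_CACHE_READ, 300, hit_rate
    )
    print(f"cache hit rate {hit_rate:.0%}: ${price:.2f} per million tokens")
```

The result is dominated by the input side: at a 300:1 ratio, output tokens contribute only about $0.08 per million blended tokens, so the cache hit rate is the main lever.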
Hardware acceleration is also significant. Compared to the H100 from a year ago, the Blackwell series can generate approximately 30 times more tokens per second on cutting-edge workloads. Further comparisons show that an optimally configured GB300 NVL72 delivers about 17 times the throughput of an optimized H100 setup at FP8 precision. This gap widens to 32 times when switching to FP4, while the total cost of ownership (TCO) is only about 70% higher.
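To see what these multiples imply for cost per token, the throughput gain can be divided by the TCO increase. The sketch below assumes the "about 70% higher" TCO figure is measured on the same basis as the throughput multiples and applies equally to the FP8 and FP4 configurations; the text does not spell this out.

```python
def throughput_per_tco(throughput_multiple: float, tco_multiple: float) -> float:
    """Tokens generated per dollar of TCO, relative to the baseline system."""
    if tco_multiple <= 0:
        raise ValueError("tco_multiple must be positive")
    return throughput_multiple / tco_multiple


# From the article: GB300 NVL72 vs. an optimized H100 setup.
TCO_MULTIPLE = 1.70  # "about 70% higher" TCO
THROUGHPUT_MULTIPLES = {"FP8": 17.0, "FP4": 32.0}

for precision, multiple in THROUGHPUT_MULTIPLES.items():
    gain = throughput_per_tco(multiple, TCO_MULTIPLE)
    # Cost per token is the inverse of tokens per TCO dollar.
    print(
        f"{precision}: {gain:.1f}x tokens per TCO dollar, "
        f"cost per token at {1.0 / gain:.1%} of the H100 baseline"
    )
```

On these figures, tokens per TCO dollar improve by about 10 times at FP8 and nearly 19 times at FP4.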
This widening gap between rising value per token and falling cost per token, a two-way scissors effect, is the core driver behind Anthropic's gross margin leap from 38% to over 70%.
Model Layer Pricing Power: Why It Won't Be Eroded by Competition
In the face of rapidly expanding profit margins for model providers, the most common market skepticism is that competition will eventually drive prices down. The research disagrees with this view and provides two supporting arguments.
First, pricing power for leading closed-source models remains robust. Although open-source models continue to set new benchmark scores, their performance in real-world knowledge work scenarios remains noticeably weaker than that of leading closed-source models. For instance, the pricing pressure from models like Kimi K2.6 (priced at $0.95/$4) on Anthropic Opus pricing is quite limited.
Second, compute constraints mean no single leading lab can meet the entire market's demand. Anthropic has already begun actively managing demand by locking Claude Code behind a subscription tier above $100 per month and restricting third-party access. Token demand is expected to persistently exceed supply for the foreseeable future. This structural scarcity gives leading model providers the confidence to price based on value, not just cost.
Anthropic has demonstrated this logic through its product line strategy: Opus fast is priced at 6 times the rate of standard Opus. The upcoming Mythos is priced at $25/$125, which is 5 times the rate of standard Opus. Top enterprise clients are still willing to pay for these high-priced SKUs. The research notes that if Anthropic priced Mythos fast at $150/$750, it would itself be a paying customer.
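The multiples in this pricing ladder can be checked with a few lines of arithmetic. The sketch assumes standard Opus is priced at the $5/$25 list price cited earlier in the article; the derived Opus fast price is an inference from the stated 6x multiple rather than a quoted figure.

```python
# Prices are (input, output) in dollars per million tokens.
STANDARD_OPUS = (5.0, 25.0)        # list price cited earlier in the article
MYTHOS = (25.0, 125.0)             # from the article
MYTHOS_FAST_HYPOTHETICAL = (150.0, 750.0)  # the research's hypothetical price


def price_multiple(price: tuple[float, float], base: tuple[float, float]) -> float:
    """Return the common multiple of `price` over `base`, or raise if the
    input and output multiples disagree."""
    input_multiple = price[0] / base[0]
    output_multiple = price[1] / base[1]
    if abs(input_multiple - output_multiple) > 1e-9:
        raise ValueError("input and output multiples differ")
    return input_multiple


# Derived, not quoted: Opus fast at 6x standard Opus.
opus_fast = tuple(6.0 * p for p in STANDARD_OPUS)

print("Opus fast (derived):", opus_fast)
print("Mythos vs. standard Opus:", price_multiple(MYTHOS, STANDARD_OPUS))
print("Mythos fast vs. Mythos:", price_multiple(MYTHOS_FAST_HYPOTHETICAL, MYTHOS))
print("Mythos fast vs. standard Opus:",
      price_multiple(MYTHOS_FAST_HYPOTHETICAL, STANDARD_OPUS))
```

The hypothetical Mythos fast price applies the same 6x "fast" premium to Mythos that Opus fast applies to standard Opus, for a combined 30x over standard Opus.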
NVIDIA and TSMC: The Pricing Lag of Scarce Resources
However, the two companies controlling the most critical scarce resources—NVIDIA and TSMC—have not fully kept pace with this wave of value reassessment.
TSMC's N3 advanced node capacity has become the tightest bottleneck for overall AI compute expansion. NVIDIA, Broadcom, Annapurna, MediaTek, and AMD are all competing for limited N3 wafer allocations, with N3 capacity utilization expected to exceed 100% in the second half of 2026. DRAM fab utilization is already above 90%, indicating overall tight memory supply, yet pricing remains relatively conservative.
The research argues that TSMC is in a position to raise prices significantly, and customers would not only accept it—some would welcome it. NVIDIA is a prime example: if TSMC raises prices, it means competitors get fewer wafer allocations. NVIDIA paying a higher wafer price could actually help solidify its market position. NVIDIA CEO Jensen Huang publicly stated in 2024 that TSMC should raise wafer prices, and the underlying logic is precisely this.
NVIDIA's own pricing strategy shows a similarly conservative tendency. The research points out that NVIDIA's pricing framework is still anchored to the outdated assumption that "willingness to pay per unit of compute declines over time." This assumption is no longer valid. With the explosion of agentic workloads, compute demand is no longer growing linearly but is compounding.
Rubin System: Quantifying NVIDIA's Pricing Potential
Using the upcoming Vera Rubin (VR NVL72) system, expected in the second half of 2026, as a reference, the research constructs a comprehensive pricing analysis framework to anchor the floor and ceiling for rental pricing from both the cost and value perspectives.
Cost Side (Floor): Based on a deployment threshold requiring an Internal Rate of Return (IRR) of no less than 15.6% for a Neocloud provider, the minimum rental rate for VR NVL72 needs to be approximately $4.92 per GPU per hour to maintain deployment interest.
Value Side (Ceiling): Anchored to the current 5-year contract rental rate for GB300 of about $0.70 per PFLOP, the corresponding rental ceiling for VR NVL72 is approximately $12.25 per GPU per hour.
Currently, the pricing for the VR NVL72 system reduces the cost per PFLOP to about $0.28, a 60% reduction compared to GB300 NVL72 that far exceeds historical trend improvements. This implies NVIDIA has approximately 40% room to increase server prices. Even after such an increase, sufficient profit margin would remain for Neocloud providers, and the overall cost improvement would still be less than the historical trend.
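The floor and ceiling figures can be tied together with some simple arithmetic. The sketch below backs out the compute per GPU implied by the ceiling (an inference from the quoted numbers, not a specification stated in the text) and uses it to express the floor rate per PFLOP. It does not attempt to reproduce the 40% server price headroom, which depends on cost inputs the text does not give.

```python
# Figures quoted in the article.
GB300_RATE_PER_PFLOP = 0.70        # dollars per PFLOP, 5-year contract
VR_CEILING_PER_GPU_HOUR = 12.25    # value-side ceiling
VR_FLOOR_PER_GPU_HOUR = 4.92       # cost-side floor at a 15.6% IRR

# INFERRED: PFLOPs per VR NVL72 GPU implied by anchoring the ceiling
# to the GB300 rate per PFLOP.
implied_pflops_per_gpu = VR_CEILING_PER_GPU_HOUR / GB300_RATE_PER_PFLOP


def rate_per_pflop(rate_per_gpu_hour: float, pflops_per_gpu: float) -> float:
    """Convert a per-GPU hourly rental rate into a rate per PFLOP."""
    return rate_per_gpu_hour / pflops_per_gpu


floor_per_pflop = rate_per_pflop(VR_FLOOR_PER_GPU_HOUR, implied_pflops_per_gpu)
reduction_vs_gb300 = 1.0 - floor_per_pflop / GB300_RATE_PER_PFLOP
ceiling_to_floor = VR_CEILING_PER_GPU_HOUR / VR_FLOOR_PER_GPU_HOUR

print(f"implied PFLOPs per GPU: {implied_pflops_per_gpu:.1f}")
print(f"floor rate per PFLOP:   ${floor_per_pflop:.2f}")
print(f"reduction vs. GB300:    {reduction_vs_gb300:.0%}")
print(f"ceiling / floor:        {ceiling_to_floor:.2f}x")
```

Under this reading, the $0.28 per PFLOP and 60% reduction correspond to renting VR NVL72 at the cost-side floor, and the ceiling sits at roughly 2.5 times that floor.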
SOCAMM memory pricing is another key variable. VR NVL72 uses socketable LPDDR5X memory modules (SOCAMM), which can be priced independently from the compute units. The research estimates the contract price NVIDIA pays for SOCAMM in Q1 2026 is around $8 per GB, a significant jump from the previous quarter. By the end of 2026, SOCAMM prices could exceed $13 per GB. In this context, it would be reasonable for NVIDIA to earn a 60% gross margin on SOCAMM: on one hand, memory supply is constrained and NVIDIA holds the largest share of it; on the other, VR NVL72's TCO leadership leaves customers with few viable alternatives.
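As a rough illustration of what a 60% gross margin would mean for customers, the selling price can be derived from the contract cost. This sketch assumes gross margin is defined as (price - cost) / price and that the contract price is NVIDIA's only cost for the module; the resulting prices are illustrative, not figures from the research.

```python
def price_for_gross_margin(unit_cost: float, gross_margin: float) -> float:
    """Selling price needed to earn `gross_margin` on an item costing `unit_cost`.

    Gross margin is taken as (price - cost) / price, so price = cost / (1 - margin).
    """
    if not 0.0 <= gross_margin < 1.0:
        raise ValueError("gross_margin must be in [0, 1)")
    return unit_cost / (1.0 - gross_margin)


TARGET_MARGIN = 0.60  # from the article

# Contract prices per GB cited in the article.
for label, cost_per_gb in (("Q1 2026", 8.0), ("end of 2026", 13.0)):
    price = price_for_gross_margin(cost_per_gb, TARGET_MARGIN)
    print(f"{label}: cost ${cost_per_gb:.2f}/GB -> price ${price:.2f}/GB")
```

At a 60% margin the selling price is 2.5 times cost, so an $8 per GB contract price implies about $20 per GB to the customer, and $13 per GB implies about $32.50.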
Value Destination: Who is Winning, Who is Waiting
The research framework reveals the core tension in current AI value distribution: improvements in token economics are rapidly boosting profits for model providers, inference service providers, and Neoclouds. However, there is a clear misalignment between the pricing behavior of NVIDIA and TSMC, which control the scarcest resources on the compute supply side, and the scarcity of their supply.
The persistence of this misalignment is essentially a deliberate choice. NVIDIA is acting like an "AI central bank," passing value downstream through software efficiency gains to sustain the ecosystem's long-term expansion while avoiding antitrust pressure. TSMC continues its historical pricing philosophy of "stabilizing the ecosystem and not capturing all upside gains."
However, as the return on investment (ROI) for inference becomes increasingly clear and value-based pricing logic gains wider market acceptance, the pressure on these two companies to shift towards a value-based pricing framework will continue to rise. Once this shift occurs, the landscape of value distribution across the AI supply chain will be reshaped once again—at which point, pricing power on the compute supply side will, to a greater extent, revert to the hardware layer.