As AI capital expenditure remains high and commercial pressures intensify, the market's focus is undergoing a subtle yet profound shift: can large language models continue to be run "regardless of cost"?
According to a recent AI chip research report from Goldman Sachs, the analysis moves beyond the familiar comparisons of computing power, process technology, and parameter scale to a more commercially relevant metric: unit cost in the inference phase. By constructing an "inference cost curve," Goldman Sachs attempts to answer a question crucial to the AI industry: once models enter a phase of high-frequency usage, what does it truly cost each chip solution to process a million tokens, once depreciation, energy consumption, and system utilization are taken into account?
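The report does not disclose its exact formula, but the structure of such a unit-cost calculation is straightforward. The sketch below is a minimal illustration, assuming straight-line depreciation and only two cost components (depreciation and energy); every numeric input is a hypothetical placeholder, not a figure from the report.

```python
# Minimal sketch of a per-million-token inference cost model.
# All numeric inputs below are hypothetical placeholders, not figures
# from the Goldman Sachs report.

def cost_per_million_tokens(
    system_price_usd: float,         # upfront cost of the rack-scale system
    depreciation_years: float,       # straight-line depreciation horizon
    power_kw: float,                 # average system power draw
    electricity_usd_per_kwh: float,  # blended electricity price
    utilization: float,              # fraction of time spent serving useful tokens
    tokens_per_second: float,        # sustained inference throughput
) -> float:
    hours_per_year = 365 * 24
    seconds_per_year = hours_per_year * 3600

    # Annualized cost components: depreciation plus energy
    depreciation_per_year = system_price_usd / depreciation_years
    energy_per_year = power_kw * hours_per_year * electricity_usd_per_kwh

    # Tokens actually produced per year at the given utilization
    tokens_per_year = tokens_per_second * seconds_per_year * utilization

    return (depreciation_per_year + energy_per_year) / tokens_per_year * 1e6


# Example with made-up numbers: a $3M system, 4-year depreciation,
# 120 kW draw, $0.08/kWh, 60% utilization, 400k tokens/s sustained throughput.
print(f"${cost_per_million_tokens(3_000_000, 4, 120, 0.08, 0.6, 400_000):.3f} per million tokens")
```

The point of the exercise is less the absolute number than the sensitivity: halving utilization or doubling power draw moves the unit cost as surely as a slower chip does, which is why the report frames the comparison at the system level rather than the card level.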
The research conclusions point to an accelerating, yet not fully digested, change: the Alphabet/Broadcom TPU is rapidly closing the gap with NVIDIA's GPUs on inference cost. The upgrade from TPU v6 to TPU v7 has cut the inference cost per token by approximately 70%, bringing its absolute cost essentially on par with NVIDIA's GB200 NVL72, and even giving it a slight edge in some of the calculated scenarios.
This does not mean NVIDIA's position is being shaken, but it clearly indicates that the core evaluation framework for AI chip competition is shifting from "who computes faster" to "who computes more cheaply and sustainably." As training becomes an upfront investment and inference becomes a long-term source of cash flow, the slope of the cost curve is replacing peak computing power as the key variable determining the industry landscape.
The evaluation criteria for AI chip competition are shifting from computing power leadership to cost efficiency.
In the early stages of AI development, training compute determined almost everything: whoever could train larger models faster held the technological advantage. But as large models enter the deployment and commercialization phase, inference workloads are beginning to far exceed training workloads, rapidly magnifying cost concerns.
Goldman Sachs points out that at this stage, a chip's price-performance ratio is no longer determined by single-card performance alone but by system-level efficiency: compute density, interconnect efficiency, memory bandwidth, and energy consumption. The inference cost curve built on this logic shows that the Alphabet/Broadcom TPU's progress in raw computational performance and system efficiency is now sufficient to compete with NVIDIA directly on cost.
In contrast, AMD and Amazon's Trainium still show relatively limited generational cost reductions. Based on current calculations, the unit inference cost of both remains significantly higher than that of the NVIDIA and Alphabet solutions, limiting their impact on the mainstream market for now.
The significant cost reduction achieved by TPU v7 does not stem from a single technological breakthrough but from the combined effect of system-level optimizations. Goldman Sachs believes that as compute chips themselves approach physical limits, future reductions in inference cost will rely increasingly on advances in "computing-adjacent technologies."
These technologies include higher-bandwidth, lower-latency network interconnects; continued integration of high-bandwidth memory (HBM) and storage solutions; advanced packaging technologies (such as TSMC's CoWoS); and improvements in density and energy efficiency at the rack level. The TPU's coordinated optimization across these areas gives it a clear economic advantage in inference scenarios.
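This is also why individually modest gains can compound into a per-token cost drop on the order of the 70% cited above: throughput, utilization, and energy efficiency enter the unit-cost formula multiplicatively. The sketch below illustrates that compounding with entirely hypothetical ratios; none of the numbers come from the Goldman Sachs report.

```python
# Hypothetical illustration of how multiplicative system-level gains compound
# into a large per-token cost reduction. All ratios below are placeholder
# assumptions, not figures from the Goldman Sachs report.

def relative_unit_cost(
    throughput_gain: float,     # new-gen tokens/s vs. old gen, e.g. 4.0 = 4x
    price_ratio: float,         # new system price vs. old
    power_ratio: float,         # new system power draw vs. old
    utilization_gain: float,    # improvement in achieved utilization
    capex_share: float = 0.85,  # share of old-gen unit cost from depreciation
) -> float:
    energy_share = 1.0 - capex_share
    # Costs scale with price and power; output scales with throughput and utilization.
    numerator = price_ratio * capex_share + power_ratio * energy_share
    denominator = throughput_gain * utilization_gain
    return numerator / denominator


ratio = relative_unit_cost(
    throughput_gain=4.0,
    price_ratio=1.3,
    power_ratio=1.5,
    utilization_gain=1.1,
)
print(f"new cost per token ≈ {ratio:.0%} of the old, a ~{1 - ratio:.0%} reduction")
```

Under these made-up assumptions, a system that costs 30% more and draws 50% more power, but delivers 4x the throughput at slightly higher utilization, ends up around 70% cheaper per token.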
This trend is highly consistent with Alphabet's own compute deployment strategy. The share of Alphabet's internal workloads running on TPUs continues to rise, and TPUs are used extensively for training and inference of the Gemini model. External customers with mature software capabilities are also accelerating their adoption of TPU solutions, the most notable example being Anthropic's order with Broadcom of approximately $21 billion, with related products expected to begin delivery around mid-2026.
However, Goldman Sachs also emphasizes that NVIDIA still holds the "time-to-market" advantage. Even as the TPU v7 catches up with the GB200 NVL72, NVIDIA is already advancing to the GB300 NVL72 and plans to deliver the VR200 NVL144 in the second half of 2026. Its continuous product-iteration cadence remains a key factor in maintaining customer stickiness.
From an investment perspective, Goldman Sachs has not downgraded its assessment of NVIDIA due to the TPU's rapid catch-up. The firm maintains its Buy ratings on both NVIDIA and Broadcom, believing they are most directly tied to the most sustainable parts of AI capital expenditure and will benefit long-term from upgrades in networking, packaging, and system-level technologies.
Within the ASIC camp, the case for Broadcom is particularly clear. Goldman Sachs has raised its fiscal 2026 earnings-per-share estimate for Broadcom to $10.87, approximately 6% above the market consensus, and believes the market still underestimates Broadcom's long-term profitability in AI networking and custom computing.
AMD and Amazon's Trainium are still in a catch-up phase, though Goldman Sachs notes that AMD's rack-level solutions could enjoy a late-mover advantage. The firm estimates that by late 2026, the Helios rack solution based on the MI455X could achieve roughly a 70% cost reduction in certain training and inference scenarios, warranting continued monitoring.
More importantly, this research report does not present a "winner-takes-all" conclusion but rather a gradually clarifying picture of industry division of labor: GPUs will continue to dominate the training and general-purpose computing markets, while custom ASICs will increasingly penetrate large-scale, predictable inference workloads. In this process, NVIDIA's CUDA ecosystem and system-level R&D investments still constitute a solid moat, but its valuation logic will also continue to be tested against the reality of "declining inference costs."
As AI truly enters a phase where "every token must justify its return," the competition over computing power ultimately comes back to economics. The TPU's roughly 70% cost plunge is not merely a technological catch-up; it is a crucial stress test of the viability of the AI business model. And that, perhaps, is the signal behind the GPU-versus-ASIC rivalry that the market should take most seriously.