Is the Era of Giant Chips Making a Comeback?

Stock News | 01-25 14:34

The year 2026 began with two major announcements in the AI chip sector: Elon Musk confirmed on social media that Tesla (TSLA.US) has restarted its Dojo 3 supercomputer project, stating that Tesla will become the world's largest AI chip manufacturer; meanwhile, Cerebras Systems, another key player in the AI chip industry, finalized a multi-year procurement agreement with OpenAI worth over $10 billion, committing to deliver 750 megawatts of computing power in phases by 2028. One announcement marks the "resurrection" of a self-developed training chip, the other a commercial breakthrough for wafer-scale systems. Behind these two very different pieces of news, the "giant chip" technology path, once considered an outlier, is back in the spotlight.

In the evolutionary history of AI chips, "giant chip" has never been a precise technical term but rather a generalization for two distinctly different design philosophies. One is represented by Cerebras's wafer-scale monolithic integration, and the other by systems like Tesla's Dojo, which occupy a middle ground between a single chip and a GPU cluster—the "wafer-scale system." The former pursues ultimate simplicity, constructing a single processor from an entire 300mm wafer, while the latter takes an intermediate route, integrating multiple pre-tested chips into a single-chip-like system through advanced packaging. The root of this divergence lies in different approaches to solving two major pain points: the "memory wall" and the "interconnect bottleneck."

Under traditional GPU architectures, the separation of processors and memory forces data to constantly travel between HBM and compute cores. According to technical literature, while Nvidia increased computing power by approximately 6 times from the A100 to the H100, memory bandwidth grew by only 1.7 times; this imbalance has shifted the dominant factor in training time from compute capability to memory bandwidth. Multi-GPU systems further amplify this overhead—even with NVLink 6.0 pushing single-GPU bandwidth to 3.6 TB/s, the latency of inter-chip communication remains hundreds of times higher than that of on-chip interconnects.
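
To make the memory-wall point concrete, here is a minimal roofline-style sketch in Python. The peak-compute and bandwidth figures are approximate, publicly quoted H100-class numbers chosen for illustration rather than figures from this article, and the calculation is a simplification, not a benchmark.

```python
# Minimal roofline-style sketch of the memory-wall argument: an operation is
# compute-bound or bandwidth-bound depending on its arithmetic intensity
# (FLOPs per byte moved). Peak figures are approximate H100-class numbers
# used purely for illustration.

PEAK_FLOPS = 1.0e15   # ~1 PFLOP/s dense 16-bit compute
PEAK_BW = 3.35e12     # ~3.35 TB/s HBM bandwidth

def min_runtime(flops, bytes_moved):
    """Lower-bound runtime and the resource that sets it."""
    t_compute = flops / PEAK_FLOPS
    t_memory = bytes_moved / PEAK_BW
    return ("compute" if t_compute >= t_memory else "memory"), max(t_compute, t_memory)

# Large square matrix multiply: O(N^3) FLOPs over O(N^2) bytes -> compute-bound.
N = 8192
gemm = (2 * N**3, 3 * N * N * 2)          # FLOPs, bytes (2-byte elements)

# Matrix-vector multiply (one decode step per weight matrix):
# ~2 FLOPs per 2-byte weight read -> bandwidth-bound.
gemv = (2 * N * N, N * N * 2)

for name, (flops, nbytes) in {"GEMM": gemm, "GEMV": gemv}.items():
    limiter, t = min_runtime(flops, nbytes)
    print(f"{name}: {limiter}-bound, >= {t * 1e6:.1f} us")
# Faster ALUs help the GEMM; only more bandwidth helps the GEMV. That is why
# compute growing 6x while bandwidth grows 1.7x shifts the bottleneck.
```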

The Cerebras WSE-3, released in 2024, offers its own answer with 4 trillion transistors, 900,000 AI cores, and 44 GB of on-chip SRAM: it packs computing and storage onto the same piece of silicon, allowing data to be processed without leaving the chip. Its on-chip interconnect bandwidth reaches 214 Pbps, which is 3,715 times that of an Nvidia H100 system, and its memory bandwidth is a staggering 21 PB/s, 880 times that of the H100. This extreme integration density brings extreme performance gains, achieving a generation speed of 1,800 tokens/s on the Llama 3.1 8B model, compared to the H100's mere 242 tokens/s.
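
As a rough sanity check on those token rates, single-stream generation speed is bounded above by memory bandwidth divided by the bytes of weights read per token. The sketch below assumes 16-bit weights and approximate bandwidth figures; treat it as back-of-envelope arithmetic, not a measurement.

```python
# Back-of-envelope ceiling on single-stream decode speed (illustrative only).
# Each generated token must stream roughly the full weight set from memory,
# so tokens/s is bounded above by (memory bandwidth) / (bytes of weights).

params = 8e9                 # Llama 3.1 8B
bytes_per_param = 2          # assuming 16-bit weights
weight_bytes = params * bytes_per_param

hbm_bw = 3.35e12             # ~H100 HBM3 bandwidth in B/s (approximate)
sram_bw = 21e15              # WSE-3 on-chip SRAM bandwidth cited above, B/s

print(f"HBM-bound ceiling:  ~{hbm_bw / weight_bytes:,.0f} tokens/s")
print(f"SRAM-bound ceiling: ~{sram_bw / weight_bytes:,.0f} tokens/s")
# The HBM ceiling (~209 tokens/s at 16-bit; higher with quantized weights)
# sits in the same ballpark as the 242 tokens/s cited for the H100, while
# the on-chip-SRAM ceiling is so high that bandwidth stops being the limit
# and compute, routing, and latency take over.
```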

However, such extremes also bring extreme engineering challenges. Yield is the first: the larger a die's area, the faster its yield falls (roughly exponentially with area), and for a full wafer the odds of a flawless die are effectively zero. Cerebras's solution is to shrink each AI core to 0.05 square millimeters, just 1% the size of an H100's SM core, and to use redundant cores and intelligent routing to map around defective areas. This ant-colony-like fault tolerance lets the chip maintain overall performance despite imperfections, but at the cost of specialized firmware mapping and a complex cooling system; the WSE-3's 23-kilowatt power draw requires custom liquid cooling loops and hybrid coolants.
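
The area-versus-yield trade-off, and why tiny redundant cores defuse it, can be illustrated with the standard Poisson yield model. The defect density below is an assumed illustrative value, not a Cerebras or TSMC figure.

```python
import math

def poisson_yield(area_mm2, d0_per_mm2):
    """Classic Poisson model: probability a die of this area has zero defects."""
    return math.exp(-area_mm2 * d0_per_mm2)

D0 = 0.001   # assumed defect density (defects per mm^2), illustrative only

# A large GPU-sized die versus a wafer-scale die treated as one monolith.
print(f"~800 mm^2 die, zero-defect yield:      {poisson_yield(800, D0):.1%}")
print(f"~46,000 mm^2 wafer, zero-defect yield: {poisson_yield(46_000, D0):.2e}")

# Wafer-scale designs survive because they never need a defect-free wafer:
# with tiny cores and spare capacity, only the expected fraction of lost
# cores matters, and routing maps around them.
core_area_mm2 = 0.05                          # per-core figure cited above
p_core_hit = 1 - poisson_yield(core_area_mm2, D0)
print(f"expected defective cores: {p_core_hit:.4%} of ~900,000 "
      f"(roughly {p_core_hit * 900_000:.0f} cores, absorbed by redundancy)")
```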

In contrast, Tesla's Dojo follows an intermediate wafer-scale system path. The D1 die itself is only 645 square millimeters, but 25 of them are arranged in a 5x5 array on a carrier wafer, with TSMC's InFO packaging providing high-density interconnects that let the 25 chips work in concert like a single processor. This design avoids the yield risk of a monolithic wafer, since each D1 die can be pre-tested, while also easing the interconnect bottleneck of multi-chip systems: inter-chip latency is just 100 nanoseconds, far below the millisecond-level latency of traditional GPU clusters.
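
A rough way to see why 25 packaged dies can behave like one processor is to compare hop latencies. The nearest-neighbor mesh model below is an added assumption for illustration; only the 100-nanosecond inter-chip figure and the millisecond-scale cluster figure come from the text above.

```python
# Rough latency comparison: a 5x5 tile of D1 dies vs. a networked GPU cluster.
# The nearest-neighbor 2D-mesh hop model is an assumption; the 100 ns
# inter-chip figure and the ~1 ms cluster figure are taken from the text.

GRID = 5                        # 5x5 array of D1 dies on the carrier wafer
HOP_NS = 100                    # inter-chip latency cited for the Dojo tile

worst_case_hops = 2 * (GRID - 1)        # corner-to-corner path on a 2D mesh
tile_ns = worst_case_hops * HOP_NS

cluster_ns = 1_000_000                  # ~1 ms across a conventional cluster fabric

print(f"worst-case on-tile hop latency: {tile_ns} ns")
print(f"cluster-level round trip:       {cluster_ns:,} ns "
      f"(~{cluster_ns / tile_ns:,.0f}x higher)")
```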

In August 2025, Bloomberg reported that Tesla had disbanded its Dojo supercomputing team, a move initially read as the end of the self-developed training chip path. Yet roughly half a year later, Dojo was restarted, and the underlying logic had fundamentally shifted. Musk revealed on social media that the AI5 chip design is progressing well and that Tesla will resume work on Dojo 3, which will use AI6 or AI7 chips; its target is no longer training autonomous-driving models on Earth but "space artificial intelligence computing." This pivot is intriguing.

Originally, Dojo was positioned as a general-purpose training platform benchmarked against 100,000 H100 GPUs, and Morgan Stanley once estimated it could add as much as $500 billion in value to Tesla. The reality, however, was that core team members left one after another, the project was halted at the end of 2024, and Tesla instead procured computing power equivalent to 67,000 H100s to build its Cortex cluster. The reason is not hard to understand: however strong the D1's paper specifications, single-chip performance is not the decisive factor for training chips. Nvidia's moat rests on more than a decade of CUDA ecosystem accumulation, locked-in CoWoS advanced packaging capacity, and deep integration with the HBM supply chain. Even if Tesla's self-developed Dojo 2 had taped out successfully, it would have needed years to catch up on software adaptation, cluster scheduling, and reliability engineering, during which Nvidia would have iterated through two or three product generations.

Tesla's current choice is to outsource training and focus on self-developed inference chips, which is essentially a recalculation of opportunity cost. Musk has said it makes no sense for Tesla to split resources across two completely different AI chip designs, and that the AI5, AI6, and subsequent chips will be excellent for inference and at least quite good for training. The AI5 chip uses TSMC's 3nm process and is expected to enter mass production by the end of 2026, with single-chip performance approaching Nvidia's Hopper level and a dual-chip configuration nearing the Blackwell architecture.

More crucially, there is a shift in strategic focus. Dojo 3 is no longer a general-purpose training platform benchmarked against GPU clusters; it is aimed at space-based computing, with Musk planning to finance the vision through a future SpaceX IPO and to use Starship to deploy computing satellites that operate under continuous sunlight. The appeal of this positioning is that space computing is still an emerging field: there are no entrenched ecosystem barriers to overcome, no head-on collision with the mature GPU stack, and room to carve out a completely new application scenario. In November 2025, Nvidia-backed Starcloud launched an H100 into space for the first time, and three days later Google announced plans to deploy TPUs in orbit by early 2027; the space computing race has only just begun.

But even with the restart, challenges remain elsewhere. Reports indicate that Tesla has awarded the Dojo 3 chip manufacturing contract to Samsung and the packaging business to Intel. This supply chain adjustment both reflects the reality that TSMC's fully booked capacity leaves little room to prioritize Dojo 3 and exposes Tesla's weak position in the scramble for foundry capacity.

If Tesla's Dojo represents a repositioning through trial and error, then Cerebras's $10 billion partnership with OpenAI is a precise strategic move on the eve of the inference explosion. OpenAI has committed to purchasing up to 750 megawatts of computing capacity from Cerebras by 2028, in a deal valued at over $10 billion. The key to this order is OpenAI's willingness to pay a premium for so-called "ultra-low latency inference." A Barclays research report predicts that AI inference will eventually account for over 70% of total general-purpose AI computing demand, potentially exceeding training demand and reaching 4.5 times its size. As generative AI applications like ChatGPT shift from "train once, deploy many times" to "continuous inference, real-time interaction," the value of low-latency inference capability rises sharply.

OpenAI's head of infrastructure, Sachin Katti, stated that when AI responds in real time, users do more, stay longer, and run higher-value workloads. Cerebras's unique speed comes from integrating massive amounts of computing, memory, and bandwidth onto a single giant chip, eliminating the bottlenecks that slow down inference in traditional hardware. This architectural advantage translates into staggering performance gaps in practical applications. The Cerebras WSE-3 ran carbon capture simulations 210 times faster than the H100 and achieved a 20x speedup in AI inference. If Cerebras can consistently deliver sub-second responses at scale, it could slash infrastructure costs and open the door to richer, more conversational applications that rely on streaming responses.

However, this commercial breakthrough did not come easily. In the first half of 2024, 87% of Cerebras's revenue came from the UAE's G42, an over-reliance on a single customer that once hindered its IPO plans. In October 2024, Cerebras withdrew its IPO application but continued fundraising; recent reports indicate the company is in talks for a new $1 billion funding round at a valuation of approximately $22 billion. The value of OpenAI's order exceeds Cerebras's current valuation, effectively making OpenAI Cerebras's largest and arguably only major customer, a closeness that is both a commercial breakthrough and a potential risk. Insiders believe that if OpenAI's finances were stronger, it might follow other tech giants and simply acquire Cerebras, along with its engineering talent and operational infrastructure; the current partnership structure is seen as more a result of financial realities than strategic intent. OpenAI CEO Sam Altman personally invested in Cerebras as early as 2017, and in 2018 Elon Musk reportedly tried to acquire Cerebras for Tesla; these historical entanglements add a further layer of subtlety to the current cooperation.

This investment also contributes, to some extent, to supply chain diversification. In 2025, OpenAI signed agreements with Nvidia, AMD, and Broadcom. In September, Nvidia committed $100 billion to support OpenAI, building at least 10 gigawatts of Nvidia systems, equivalent to 4 to 5 million GPUs. An OpenAI executive stated that computing scale is highly correlated with revenue growth, but the availability of computing power has become one of the most important limiting factors for further growth. In this context, Cerebras offers a differentiated option: specialized systems optimized for low-latency inference. Analyst Neil Shah pointed out that this prompts hyper-scalers to diversify their computing systems, using Nvidia GPUs for general AI workloads, internal AI accelerators for highly optimized tasks, and systems like Cerebras for specialized low-latency workloads. The fragmentation of inference scenarios—from conversational generation to code completion to image rendering—means no single chip architecture can dominate all scenarios; the value of specialized accelerators lies precisely here.
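
For a sense of scale, the power figures quoted in this article can be rearranged into rough per-device numbers. The sketch below only combines figures already cited (10 gigawatts, 4 to 5 million GPUs, 750 megawatts, and the WSE-3's 23 kilowatts); it ignores cooling and system overhead on the Cerebras side, so the chip count is an upper bound, not a deployment estimate.

```python
# Rough per-device arithmetic from figures quoted in this article.
# Cooling, networking, and host overhead are ignored on the Cerebras side,
# so the chip count is an upper bound, not a deployment estimate.

nvidia_commit_w = 10e9           # "at least 10 gigawatts of Nvidia systems"
nvidia_gpu_range = (4e6, 5e6)    # "equivalent to 4 to 5 million GPUs"
cerebras_deal_w = 750e6          # "750 megawatts of computing capacity"
wse3_chip_w = 23e3               # 23 kW per WSE-3 chip, cited earlier

lo_kw = nvidia_commit_w / nvidia_gpu_range[1] / 1e3
hi_kw = nvidia_commit_w / nvidia_gpu_range[0] / 1e3
print(f"implied power per GPU (system-level): {lo_kw:.1f}-{hi_kw:.1f} kW")

max_wse3 = cerebras_deal_w / wse3_chip_w
print(f"750 MW / 23 kW per chip  ->  at most ~{max_wse3:,.0f} WSE-3 chips")
```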

Both Cerebras and Tesla cannot avoid an ultimate question: in an increasingly competitive landscape, how much room is there for the giant chip path to survive? It's important to remember that the AI chip market is already crowded; in June last year, AMD launched the MI350X and MI355X GPUs, with training and inference speeds comparable to or better than the B200, and in January of this year, Nvidia unveiled the Rubin platform at CES—these chip giants have reached a staggering pace of iteration. As the GPU market trends towards one superpower and several strong competitors, the window of opportunity for a third technological path narrows dramatically—why would customers risk betting on immature wafer-scale systems when they can hedge against Nvidia with general-purpose GPU vendors like AMD?

Cerebras's counter-strategy is to avoid fighting on Nvidia's terms and compete on a different axis altogether. The CS-3 system is positioned not as a training platform but as a specialized inference machine, pushing latency to the extreme through its memory-compute integrated architecture while simplifying the software stack. The strength of this positioning is that the inference explosion is only beginning, ecosystem lock-in is far weaker than on the training side, and the diversity of inference tasks leaves room for specialized architectures to shine. OpenAI's $10 billion order essentially validates this business logic with real money; when inference constitutes the bulk of operating expenditure, a 15x performance improvement is enough to reshape the supplier landscape.

Tesla, for its part, is betting on advanced packaging. TSMC's wafer-level CoWoS technology, expected in 2027, promises 40 times the computing power of existing systems, silicon area exceeding 40 reticles, and capacity for more than 60 HBM stacks, a process roadmap seemingly tailor-made for wafer-level integration. When packaging allows dozens of pre-tested logic dies and multiple HBM stacks to be integrated on a single substrate, the boundary between monolithic "giant chips" and chiplet interconnects will blur. Tesla's earlier Dojo 2 (D2) design already took this path, using CoWoS packaging to reach wafer-level performance while avoiding the yield risks of a monolithic wafer; future Dojo 3 iterations may continue in this direction.

Giant chips are back in the spotlight, but the definition of "giant" seems to have quietly changed. First, there is "giant" in the sense of sheer physical size: a Cerebras chip occupying an entire wafer remains a technological marvel, but its commercial value is confined to specific scenarios. A Cerebras WSE system costs approximately $2-3 million and has so far been deployed at institutions like Argonne National Laboratory, the Mayo Clinic, and the Condor Galaxy facility built in partnership with G42. It will not replace GPUs as the general-purpose training platform, but it can open new fronts in latency-sensitive areas like inference and scientific computing.

Second, "giant" in the sense of system integration level, whether Tesla's wafer-level packaging or Nvidia's GB200 NVL72 cabinet-scale systems, is becoming mainstream. A SEMI report shows global fab equipment spending reaching $110 billion in 2025 and growing 18% to $130 billion in 2026, with the logic and micro segment as a key driver, fueled by investment in advanced technologies such as 2nm processes and backside power delivery. The evolution of TSMC's CoWoS roadmap, the standardization push for HBM4, and the adoption of the UCIe interconnect protocol are all driving chiplet-based heterogeneous integration toward system-on-chip levels of integration.

Finally, there is the "giant" aspect of the business model—this is the real watershed. The partnership between OpenAI and Cerebras is widely seen as another example of leading tech companies absorbing promising AI chip startups, whether through direct acquisition or through exclusive, large-scale commercial partnerships that effectively fold these startups into a dominant ecosystem. SambaNova, Groq, and Cerebras have each adopted different technical approaches and have been viewed for years as niche challengers capable of competing with the industry leader in specific workloads, but as competition intensifies and customer adoption remains limited, many such startups struggle to move beyond the pilot deployment phase with major customers.

The suspension and restart of Tesla's Dojo was, in essence, an expensive commercial experiment: it confirmed that the full-stack, self-developed training chip route is not something companies outside the cloud giants can replicate, but it also preserved a technical reserve for autonomy on the inference side. The union of Cerebras and OpenAI, meanwhile, trades the extreme performance of wafer-scale architecture for pricing power in vertical scenarios just as inference demand takes off. Against the triple backdrop of Moore's Law slowing, advanced packaging taking over, and AI application scenarios fragmenting, the seemingly niche path of wafer-scale integration is redefining the boundaries of "giant" in unexpected ways. The goal is not to replicate Nvidia's success but to find the value pockets overlooked by general-purpose solutions within the cracks of the AI computing landscape. In this sense, it is not a binary narrative of rise and fall, but a protracted battle over how to survive in the shadow of giants and eventually carve out new territory.

Disclaimer: Investing carries risk. This is not financial advice. The above content should not be regarded as an offer, recommendation, or solicitation on acquiring or disposing of any financial products, any associated discussions, comments, or posts by author or other users should not be considered as such either. It is solely for general information purpose only, which does not consider your own investment objectives, financial situations or needs. TTM assumes no responsibility or warranty for the accuracy and completeness of the information, investors should do their own research and may seek professional advice before investing.
