The demand for memory from artificial intelligence is growing at an explosive pace, even faster than Micron itself anticipated. A recent episode of The Circuit podcast featured an interview with Jeremy Werner, Senior Vice President and General Manager of Micron Technology's Data Center Business Unit. The conversation centered on the structural shifts within the memory and storage industry in the AI era. Werner stated unequivocally that the current boom in the memory sector is fundamentally different from past cyclical fluctuations.
Memory has become a critical strategic asset for overcoming bottlenecks in data center inference and is a core component supporting the training of the world's most advanced models. Werner does not believe this trend will slow down.
**The "Memory Wall" in AI Inference: Insufficient Memory Leads to Recalculation** Werner provided a straightforward explanation for why inference places such unique demands on memory. He explained that training and inference use memory in截然不同的 ways. "Training uses memory to learn and then forget, ultimately outputting a model. But inference uses memory to remember," Werner said. The inference process is divided into two phases: prefill and decode. During the decode phase, the model needs to continuously access previous计算结果—known as the KV Cache—to generate more accurate answers. The problem arises when there isn't enough memory to store these historical states; the model must then recompute from the beginning. Werner elaborated on the implications: each round of recomputation requires computational power equivalent to the sum of all previous rounds, meaning computational demand grows exponentially. In contrast, if the previous state can be stored, each subsequent round only requires a linear increase in computation. In other words, insufficient memory causes a sharp decline in GPU computational efficiency. Conversely, Werner pointed out, "If you can provide memory that is fast enough and large enough, theoretically you can extract a squared multiple of compute performance from the GPU."
Three factors are driving the expansion of KV Cache demand: increasingly long context windows, growing model parameter counts, and a rising number of concurrent AI users. Werner revealed that context length is currently growing at a rate of 30x per year.
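Those three factors multiply together, which is why the footprint balloons so quickly. A rough sizing sketch follows, with all model dimensions being hypothetical (the formula is the standard keys-plus-values accounting, not something quoted in the interview):

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_len: int, users: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV Cache footprint: keys + values stored for every
    layer, head, and token position, per concurrent user (fp16 = 2 bytes)."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V
    return per_token * context_len * users

# Hypothetical mid-size model: 80 layers, 8 KV heads (GQA), head_dim of 128.
size_gib = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                          context_len=128_000, users=64) / 2**30
print(f"~{size_gib:,.0f} GiB of KV Cache")  # = 2,500 GiB: far beyond one GPU's HBM
```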
**Memory Hierarchy: A Complete "Storage Chain" from HBM to SSD** Werner walked through the memory hierarchy inside AI data centers in detail, describing a complete "storage chain" that runs from the High Bandwidth Memory (HBM) closest to the GPU out to massive SSDs at the far end:

* **Layer 1: HBM.** Directly adjacent to the GPU, with typical capacities between 10GB and 100GB. It is the fastest tier but has limited capacity.
* **Layer 2: Main Memory.** Connected to the CPU, with capacity typically 4 to 20 times that of HBM, but slower and farther away. In NVIDIA's Blackwell system, for example, main memory is attached to the Grace CPU.
* **Layer 3: Expansion Memory.** Independent memory modules connected over optical fiber. Not yet deployed at scale, but a direction the industry is watching.
* **Layer 4: Context Memory Storage.** Using SSDs to store the KV Cache. Werner noted that NVIDIA CEO Jensen Huang has publicly discussed this direction this year. Compared to HBM, SSDs have higher latency and lower bandwidth but can offer 1,000 times the capacity.
* **Layer 5: Data Lake.** The underlying mass SSD storage of the data center, measured in exabytes (EB).
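One way to read this hierarchy is as a placement problem: each working set lives in the fastest tier that can hold it. The sketch below encodes the chain with illustrative capacity and relative-speed numbers (only the rough capacity relationships come from the interview; the exact values are assumptions):

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    capacity_gb: float   # illustrative, loosely matching the article's orders of magnitude
    rel_speed: float     # relative bandwidth, purely hypothetical

# Ordered fastest/smallest first.
HIERARCHY = [
    Tier("HBM", 100, 1.0),
    Tier("Main memory", 1_000, 0.2),
    Tier("Expansion memory", 10_000, 0.05),
    Tier("Context SSD", 100_000, 0.01),   # ~1,000x HBM capacity, per the interview
    Tier("Data lake", 1e9, 0.001),        # exabyte scale
]

def place(working_set_gb: float) -> Tier:
    """Return the fastest tier whose capacity can hold the working set."""
    for tier in HIERARCHY:
        if working_set_gb <= tier.capacity_gb:
            return tier
    return HIERARCHY[-1]

print(place(64).name)      # -> HBM
print(place(2_500).name)   # a large KV Cache spills to expansion memory
```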
Werner stated that the entire hierarchy, from top to bottom, is in a state of undersupply: "As soon as we release a product, it gets consumed. As soon as we increase capacity and performance, they find a way to deploy it."
**HBM4 and 245TB SSD: Micron's Key Initiatives** In response to this demand, Micron is advancing on two fronts simultaneously. On HBM4, Werner revealed that Micron has just released its HBM4 product, which offers more than double the bandwidth of the previous-generation HBM3e. He emphasized that the logic behind pushing bandwidth is simple: when the bottleneck is memory bandwidth rather than compute power, data must be delivered to the GPU faster.
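A quick roofline-style estimate shows why bandwidth sets the ceiling: during decode, every generated token has to stream the model's weights (and its KV Cache) out of memory, so tokens per second cannot exceed bandwidth divided by bytes moved per token. The figures below are illustrative assumptions, not Micron specifications:

```python
def decode_ceiling_tok_s(bandwidth_tb_s: float, params_b: float,
                         bytes_per_param: int = 2) -> float:
    """Upper bound on single-stream decode speed when memory-bandwidth-bound:
    each token requires streaming all weights from memory once."""
    bytes_per_token = params_b * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / bytes_per_token

# Hypothetical 70B-parameter model held in fp16:
for bw in (1.0, 2.0):  # doubling bandwidth doubles the ceiling
    print(f"{bw:.0f} TB/s -> {decode_ceiling_tok_s(bw, params_b=70):.0f} tokens/s max")
```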
On the SSD front, Micron introduced a 245TB ultra-high-capacity SSD, which Werner described as "not much larger than a deck of cards." The significance lies not only in the capacity itself. Werner explained that currently deployed data center drives are typically around 30TB; a 245TB SSD drastically reduces the number of devices needed for the same amount of storage, and with it the network connections, power supplies, fans, and other supporting infrastructure. The result is a storage footprint compressed by over 80% alongside significantly lower power consumption. "You only pay for the performance you actually need, and that performance is delivered with greater efficiency in gigabytes per watt," Werner said. This directly addresses the two most critical constraints for data centers today: power budgets and physical space. As Werner put it, "If power is the bottleneck limiting growth, then we must find ways to deliver more efficient performance within a fixed power budget. This is the source of a great deal of our innovation."
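The consolidation arithmetic is easy to check. As a worked example, assume a hypothetical 10 PB deployment (only the 30TB and 245TB drive sizes come from the interview):

```python
TARGET_TB = 10_000                # hypothetical 10 PB deployment
drives_30tb = TARGET_TB / 30      # ~334 drives of today's typical size
drives_245tb = TARGET_TB / 245    # ~41 of the new 245TB drives

reduction = 1 - drives_245tb / drives_30tb
print(f"{drives_30tb:.0f} -> {drives_245tb:.0f} drives ({reduction:.0%} fewer),")
print("and with them fewer NICs, power supplies, fans, and rack slots.")
```

The roughly 88% device reduction in this toy case is consistent with the over-80% footprint compression cited above.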
**Capacity Can't Keep Up: Five Fabs Under Construction Globally** Despite the strong demand, Werner acknowledged that the memory industry's production capacity is already failing to keep pace. "We are not building enough fabs globally," he said flatly. Micron is currently advancing five fab projects worldwide: a 600,000-square-foot cleanroom in Boise, Idaho (roughly 10 football fields); a newly announced fab in upstate New York; an expansion of an existing fab in Virginia; the groundbreaking of a new fab in Singapore; and an expansion of its DRAM production facilities in Japan. On top of these, it recently acquired a fab from PSMC in Taiwan. Werner indicated that the entire industry is currently constrained by cleanroom space, a situation unlikely to change in the short term. "We can no longer keep up with demand, and neither can anyone else: Intel, NVIDIA, TSMC are all saying they are at full capacity. Fabs don't just spring up overnight."
**The Market Hasn't Fully Grasped the Situation** Werner has a different perspective on market concerns. He believes that when the market sees cloud service providers (CSPs) significantly increasing capital expenditure, it starts to worry about sustainability. However, his judgment is that "these companies are undergoing a massive revolution, the potential of which still exceeds most people's imagination." Werner also pointed out that AI application scenarios are far from saturated. The training era has passed, the inference era is just beginning, and Agentic AI and Physical AI have not even truly scaled yet. "I truly believe we are only just scratching the surface of the transformation AI will bring." He also acknowledged a significant perception gap regarding AI inside and outside Silicon Valley: "In Silicon Valley, everyone is very excited, and it's easy to get caught in our own information bubble. But when I talk to friends not in the industry, many of them still haven't realized what will happen over the next 20 years."