What NVIDIA's Inference Context Memory Storage Means for NAND

Deep News · 01-15

Citigroup believes that the context memory storage technology NVIDIA has adopted for AI inference is likely to exacerbate the supply shortage in the NAND flash market. According to Zhui Feng Trading Desk, a recent Citigroup report indicates that NVIDIA's newly introduced Inference Context Memory Storage (ICMS) architecture will significantly boost demand for NAND flash, creating structural opportunities for memory chip manufacturers and potentially driving NAND prices even higher. The bank advises closely monitoring supply and demand dynamics across the storage industry chain, as the relevant manufacturers are expected to keep benefiting from this wave of demand growth.

NVIDIA announced that its Vera Rubin platform will adopt the ICMS architecture powered by BlueField-4 chips, aiming to overcome memory bottlenecks and improve AI inference performance by offloading the KV cache. The architecture requires an additional 1152TB of SSD NAND per server. The report forecasts that this will generate new demand equivalent to 2.8% of global NAND demand in 2026 and 9.3% in 2027. The move is set to further intensify the global NAND supply shortage while creating significant market opportunities for leading NAND suppliers such as Samsung Electronics, SK Hynix, SanDisk, Kioxia, and Micron Technology.

ICMS: A Storage Bottleneck Solution for AI Inference

The report points out that large-scale AI inference faces significant memory bottlenecks. The core memory optimization mechanism for Transformer models, the KV cache, stores computed key-value pairs to avoid redundant calculation and employs tiered storage based on performance and capacity needs: active KV cache resides in GPU HBM (G1), transitional/overflow KV cache is placed in system DRAM (G2), and cooler KV cache is allocated to local SSDs (G3). To optimize this hierarchy, NVIDIA introduced the Inference Context Memory Storage (ICMS) solution.
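The tiered placement described above can be sketched as a simple spill-down policy. Everything here is illustrative: the tier budgets (other than the 16TB of ICMS NAND per GPU cited in the report), the block sizes, and the recency-free placement rule are assumptions for exposition, not NVIDIA's actual KV cache manager.

```python
# Minimal sketch of tiered KV-cache placement: G1 (GPU HBM) holds active
# entries, G2 (system DRAM) takes overflow, G3 (local SSD) holds cooler
# entries, and G3.5 (ICMS NAND) stages cold context. Capacities and the
# spill-down policy are illustrative assumptions, not NVIDIA's design.

from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    capacity_gb: int  # KV-cache budget for this tier, in GB
    used_gb: int = 0

    def has_room(self, size_gb: int) -> bool:
        return self.used_gb + size_gb <= self.capacity_gb

def place_kv_block(tiers: list[Tier], size_gb: int) -> str:
    """Place a KV-cache block in the fastest tier with free capacity,
    spilling down the hierarchy when a tier is full."""
    for tier in tiers:
        if tier.has_room(size_gb):
            tier.used_gb += size_gb
            return tier.name
    raise RuntimeError("all tiers full; evict or recompute the block")

# Hypothetical per-GPU budgets; only the 16TB (16,000 GB) ICMS figure
# comes from the article.
hierarchy = [
    Tier("G1:HBM", 2),
    Tier("G2:DRAM", 20),
    Tier("G3:local-SSD", 100),
    Tier("G3.5:ICMS", 16_000),
]

for _ in range(30):  # thirty 1-GB KV blocks spill down the tiers
    place_kv_block(hierarchy, 1)

print([(t.name, t.used_gb) for t in hierarchy])
# → [('G1:HBM', 2), ('G2:DRAM', 20), ('G3:local-SSD', 8), ('G3.5:ICMS', 0)]
```

The point of the sketch is the spill pattern: once the fast, small tiers fill, capacity pressure lands on the large NAND tier, which is why ICMS translates directly into SSD demand.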
This solution does not replace the existing storage hierarchy but adds a dedicated G3.5 tier for the KV cache between local SSD (G3) and enterprise shared storage (G4). This tier efficiently converts cold KV context data from G4 into warm KV cache in G2, working in concert with HBM to significantly improve data transfer efficiency and overall AI inference performance.

In terms of hardware, the Vera Rubin platform uses 16TB TLC SSDs as the ICMS storage medium, combined with a KV cache manager and a topology-aware scheduling mechanism, targeting three performance breakthroughs: up to a 5x increase in tokens processed per second, up to a 5x improvement in energy efficiency, and lower latency. Each server carries 72 GPUs, with 16TB of dedicated ICMS NAND per GPU, for a total NAND requirement of 1152TB per server.

NVIDIA's introduction of context memory storage technology for AI inference marks a significant evolution in AI computing architecture. Unlike traditional training workloads, inference relies heavily on storing and rapidly retrieving large amounts of contextual data. This shift opens up a new application scenario for NAND flash, which is expected to become a major demand driver after data centers and smartphones.

Clear Incremental NAND Demand, Deepening Supply Shortage

After scenario analysis, Citigroup believes that large-scale adoption of the ICMS architecture will bring significant and highly certain incremental demand to the global NAND market. The report forecasts Vera Rubin server shipments of 30,000 units in 2026, corresponding to ICMS-driven NAND demand of 34.6 million TB (equivalent to 34.6 billion 8Gb equivalents), or 2.8% of global NAND demand that year.
As AI inference demand expands further, Vera Rubin server shipments are expected to reach 100,000 units in 2027, at which point NAND demand from ICMS will surge to 115.2 million TB (equivalent to 115.2 billion 8Gb equivalents), lifting its share of global NAND demand to 9.3%.

The report also notes that the global NAND market is already supply-constrained. The explosive growth of the AI industry in recent years has continuously pushed up data storage demand, leaving the supply-demand balance for NAND, the core storage medium, quite fragile. The new demand from NVIDIA's ICMS architecture is both inelastic and large in scale, and will directly disrupt the existing equilibrium and further exacerbate the global NAND supply shortage.

AI-Driven Acceleration in NAND Market Upgrades

Citigroup views the launch of NVIDIA's ICMS architecture not as an isolated technological innovation but as a natural result of the deep integration of AI technology and the storage industry, a trend that will profoundly shape the future development of the NAND market. The report states that with large-model inference scenarios expanding and computational scale continuously growing, the performance, capacity, and energy efficiency of storage systems have become key factors determining the AI application experience. This will accelerate the iteration of NAND technology toward higher density, faster read/write speeds, and lower power consumption.

Furthermore, the report predicts that innovation in AI-native storage architectures will open new growth avenues for the NAND industry. Beyond the current ICMS architecture, more customized storage solutions tailored to specific AI scenarios are likely to emerge, continuously unlocking NAND demand potential.
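The demand figures above follow from simple arithmetic on the article's own inputs. Note that the "implied global demand" below is back-calculated from the stated percentage shares and is not a number quoted in the report.

```python
# Checking the report's NAND-demand figures. All inputs (72 GPUs/server,
# 16TB ICMS per GPU, shipment counts, percentage shares) come from the
# article; the implied global totals are back-calculated, not quoted.

GPUS_PER_SERVER = 72
ICMS_TB_PER_GPU = 16
per_server_tb = GPUS_PER_SERVER * ICMS_TB_PER_GPU  # 1152 TB per server

for year, servers, share in [(2026, 30_000, 0.028), (2027, 100_000, 0.093)]:
    demand_tb = servers * per_server_tb
    # 1 TB = 1000 GB = 1000 x 8Gb units, hence "8Gb equivalents"
    equivalents_bn = demand_tb * 1000 / 1e9
    implied_global_m_tb = demand_tb / share / 1e6
    print(year, demand_tb, equivalents_bn, round(implied_global_m_tb, 1))
```

The per-server total comes out to 72 × 16TB = 1152TB, the 2026 demand to 34.56 million TB (34.56 billion 8Gb equivalents, rounded to 34.6 in the report) and the 2027 demand to 115.2 million TB. As a sanity check, the back-calculated global totals implied by the two shares are roughly 1.23–1.24 billion TB in both years, so the report's percentages are internally consistent with a broadly flat global demand assumption.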
The report also mentions that the incremental demand from the ICMS architecture will not only benefit NAND manufacturers but will also ripple upstream through the supply chain, promoting synergistic development in areas such as SSD manufacturing and storage controllers, thereby injecting new growth momentum into the entire semiconductor industry chain.

Disclaimer: Investing carries risk. This is not financial advice. The above content should not be regarded as an offer, recommendation, or solicitation to acquire or dispose of any financial products, and any associated discussions, comments, or posts by the author or other users should not be considered as such either. It is provided for general information purposes only and does not take into account your investment objectives, financial situation, or needs. TTM assumes no responsibility or warranty for the accuracy or completeness of the information; investors should do their own research and may seek professional advice before investing.
