At the 2026 International Consumer Electronics Show (CES) on January 5th, NVIDIA CEO Jensen Huang unveiled new hardware called the "Inference Context Memory Platform" (ICMS), designed to address the explosive growth in data storage demands during the AI inference phase. This move signals a shift in AI hardware architecture from simply stacking raw compute toward efficient context storage, with NAND flash and SSDs poised to succeed HBM as the next critical growth engine.
A January 24th article in The Korea Economic Daily detailed that during his presentation, Jensen Huang showcased a mysterious black rack: the ICMS. This is not a routine hardware update but a pivotal innovation aimed at solving the data bottleneck in the AI inference stage. The reporter astutely observed that this could represent the next major breakout point for the storage industry, following HBM (High Bandwidth Memory).
The core logic of this platform lies in solving the "KV Cache" (Key-Value Cache) problem in AI inference. As AI moves from training to large-scale inference deployment, data volumes are exploding, and existing GPU memory and server memory architectures are struggling to keep up. By introducing new Data Processing Units (DPUs) and massive SSDs (Solid State Drives), NVIDIA is constructing a vast cache pool to break through this physical limitation.
This technological shift presents a significant positive development for South Korean storage giants Samsung Electronics and SK Hynix. The report suggests that with the adoption of ICMS, NAND flash is set to enter a "golden age" similar to HBM's. This implies not just a surge in demand for storage capacity but also foreshadows a fundamental change in storage architecture: GPUs may bypass CPUs to communicate directly with storage devices at high speed.
The Korean media article points out that the core motivation behind Jensen Huang's introduction of ICMS technology is the explosive growth of the KV Cache. In the era of AI inference, the KV Cache is crucial for AI to understand conversational context and perform logical reasoning. For instance, when a user asks an AI a complex, subjective question about G-Dragon, the AI needs to draw on its internal model data and the stored conversation context (the KV Cache) to assign weights and perform inference, thereby avoiding redundant computation and reducing hallucinations.
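To make the mechanism concrete, below is a minimal, illustrative sketch of how a decoder reuses cached keys and values so that each new token is attended against stored context instead of reprocessing the entire conversation. This is not NVIDIA's implementation; the single-head setup, shapes, and names are assumptions chosen purely for clarity.

```python
import numpy as np

# Minimal single-head attention with a KV cache (illustrative only;
# shapes and names are assumptions, not any vendor's API).
d_model = 64
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(3))

k_cache = []  # keys for every token seen so far
v_cache = []  # values for every token seen so far

def decode_step(x_new):
    """Process one new token embedding, reusing the cached context."""
    q = x_new @ W_q
    # Only the NEW token's key/value are computed; past ones come from the cache.
    k_cache.append(x_new @ W_k)
    v_cache.append(x_new @ W_v)
    K = np.stack(k_cache)            # (tokens_so_far, d_model)
    V = np.stack(v_cache)
    scores = K @ q / np.sqrt(d_model)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V               # context-aware output for the new token

# Each step reads the whole cache but recomputes nothing for past tokens.
for _ in range(5):
    out = decode_step(rng.standard_normal(d_model))
```

The catch, and the article's point, is that the cache grows linearly with conversation length and with the number of concurrent sessions, which is what eventually overflows GPU memory.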
As AI shifts from training to inference, and application scenarios expand into multimodal domains, the volume of data requiring processing is growing explosively and unpredictably. NVIDIA has found that expensive HBM or conventional DRAM alone can no longer accommodate the massive KV Cache, and the internal storage architecture of existing servers is insufficient for the coming inference era. Consequently, a dedicated storage platform capable of handling vast data volumes while maintaining efficient access has become a critical necessity.
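A rough back-of-envelope calculation shows why the cache outgrows HBM. The model dimensions, context length, and session count below are illustrative assumptions, not figures from the article.

```python
# Illustrative KV-cache sizing (all parameters are assumptions).
layers      = 80          # decoder layers
kv_heads    = 8           # grouped-query KV heads
head_dim    = 128
bytes_fp16  = 2
context_len = 128_000     # tokens retained per session
sessions    = 1_000       # concurrent users on the cluster

per_token = layers * kv_heads * head_dim * 2 * bytes_fp16   # K and V
per_session_gb = per_token * context_len / 1e9
total_tb = per_session_gb * sessions / 1e3
print(f"{per_token} bytes/token, {per_session_gb:.1f} GB/session, {total_tb:.1f} TB total")
# ~327,680 bytes/token -> ~41.9 GB per session -> ~41.9 TB for 1,000 sessions,
# far beyond the HBM attached to any single GPU.
```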
According to the Korean media article, the core of the ICMS platform lies in the combination of DPUs with ultra-high-capacity SSDs. The article paraphrases NVIDIA's explanation, stating that the platform utilizes the new "BlueField-4" DPU, which acts as a "logistics officer" for data transfer, alleviating the CPU's burden. A standard ICMS rack contains 16 SSD bays; each bay is equipped with 4 DPUs and manages 600TB of SSD, resulting in a staggering total capacity of 9,600TB per rack.
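The rack-level figures quoted above compose as follows; this is simply a restatement of the article's numbers.

```python
# Capacity of one ICMS rack, using the figures quoted in the article.
bays_per_rack = 16
dpus_per_bay  = 4
tb_per_bay    = 600

print("DPUs per rack:", bays_per_rack * dpus_per_bay)        # 64
print("SSD per rack :", bays_per_rack * tb_per_bay, "TB")    # 9,600 TB (9.6 PB)
```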
This capacity far exceeds that of traditional GPU racks. In comparison, a full Vera Rubin GPU platform comprising 8 racks has a total SSD capacity of approximately 4,423.68TB. Jensen Huang stated that through the ICMS platform, the virtual memory capacity available to GPUs has been increased from the previous 1TB to 16TB. Simultaneously, leveraging the performance enhancements of BlueField-4, the platform achieves a KV cache transfer speed of 200GB per second, effectively addressing the network transmission bottleneck for high-capacity SSDs.
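For scale, the per-rack comparison and the transfer-time figure below are simple derivations from the quoted values, not additional disclosures.

```python
# Derived comparisons from the figures quoted in the article.
icms_rack_tb        = 9_600
vera_rubin_total_tb = 4_423.68     # full 8-rack Vera Rubin platform
vera_rubin_racks    = 8

per_rack_tb = vera_rubin_total_tb / vera_rubin_racks          # ~553 TB per rack
print(f"ICMS rack holds ~{icms_rack_tb / per_rack_tb:.0f}x more SSD per rack")

# At the claimed 200 GB/s KV-cache transfer rate, filling the 16 TB
# virtual memory window exposed to the GPUs would take on the order of:
print(f"{16_000 / 200:.0f} seconds")                          # ~80 s
```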
The article notes that the ICMS platform primarily utilizes SSDs, which directly benefits NAND flash manufacturers. Over the past few years, despite the AI boom, the spotlight has been mainly on HBM, while NAND flash and SSDs have not received equivalent attention.
NVIDIA positions this platform as a "Tier 3.5" storage layer, situated between a server's internal local SSDs and external storage. Compared to expensive and power-hungry DRAM, SSDs managed by high-performance DPUs offer advantages of large capacity, high speed, and data persistence during power loss, making them an ideal choice for storing KV Cache.
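A hedged sketch of what a "Tier 3.5" placement policy could look like is shown below: hot context stays in HBM, warm context in host DRAM, and cold context spills to the DPU-managed SSD pool. The tiers, capacities, and LRU eviction rule are illustrative assumptions, not NVIDIA's published design.

```python
# Illustrative KV-cache tiering policy (assumed tiers and capacities).
from collections import OrderedDict

TIERS = ["HBM", "DRAM", "ICMS_SSD"]                   # fastest -> largest
CAPACITY_GB = {"HBM": 96, "DRAM": 1_024, "ICMS_SSD": 16_000}

class TieredKVCache:
    def __init__(self):
        # One LRU map per tier: session_id -> cache size in GB.
        self.tiers = {t: OrderedDict() for t in TIERS}

    def _used(self, tier):
        return sum(self.tiers[tier].values())

    def put(self, session_id, size_gb, tier="HBM"):
        """Place a session's KV cache, demoting least-recently-used
        sessions to the next tier down whenever a tier overflows."""
        self.tiers[tier][session_id] = size_gb
        self.tiers[tier].move_to_end(session_id)
        while self._used(tier) > CAPACITY_GB[tier]:
            victim, vsize = self.tiers[tier].popitem(last=False)
            nxt = TIERS.index(tier) + 1
            if nxt < len(TIERS):
                self.put(victim, vsize, TIERS[nxt])   # demote one level
            # else: evicted entirely; that context would be recomputed.

cache = TieredKVCache()
for i in range(40):
    cache.put(f"session-{i}", size_gb=8)              # 320 GB of demand
print({t: len(m) for t, m in cache.tiers.items()})    # most sessions sit below HBM
```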
This architectural change directly benefits Samsung Electronics and SK Hynix. Due to the extremely high storage density requirements of ICMS, market demand for enterprise-grade SSDs and NAND flash is expected to climb significantly. Furthermore, NVIDIA is advancing its "Storage Next" (SCADA) initiative, aimed at enabling GPUs to access NAND flash directly, bypassing the CPU, thereby further eliminating data transfer bottlenecks.
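The performance argument for the GPU-direct path can be illustrated with a simple latency model. The bandwidths and hop structure below are assumptions for illustration only, not measured or published Storage Next / SCADA figures.

```python
# Back-of-envelope model of the two read paths (all bandwidths are
# illustrative assumptions).
def transfer_s(gb, gb_per_s):
    return gb / gb_per_s

def cpu_mediated_read(gb):
    # SSD -> CPU DRAM bounce buffer -> GPU over PCIe: two copies plus
    # CPU involvement on every request.
    return transfer_s(gb, 14) + transfer_s(gb, 64)

def gpu_direct_read(gb):
    # SSD DMAs straight into GPU memory: one copy, no CPU bounce buffer.
    return transfer_s(gb, 14)

for gb in (1, 64, 512):
    print(f"{gb:>4} GB: via CPU {cpu_mediated_read(gb):6.1f} s, "
          f"direct {gpu_direct_read(gb):6.1f} s")
```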
SK Hynix has already responded swiftly to this trend. Reports indicate that SK Hynix Vice President Kim Cheon-seong revealed the company is collaborating with NVIDIA to develop a prototype product named "AI-N P." The plan is to launch a storage product supporting 25 million IOPS (input/output operations per second) over the PCIe Gen 6 interface by the end of this year, with performance expected to reach 100 million IOPS by the end of 2027. As major manufacturers accelerate their deployments, NAND flash and SSDs are anticipated to enter a new cycle of volume and price increases in the AI inference era.
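Assuming 4 KiB operations purely for illustration (the article does not state the I/O granularity), those IOPS targets translate roughly into the following bandwidth figures.

```python
# Rough IOPS-to-bandwidth conversion (4 KiB block size is an assumption).
KIB = 1024
for iops in (25_000_000, 100_000_000):
    gb_per_s = iops * 4 * KIB / 1e9
    print(f"{iops:,} IOPS x 4 KiB ≈ {gb_per_s:.0f} GB/s")
# 25M IOPS  -> ~102 GB/s
# 100M IOPS -> ~410 GB/s
```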