CITIC SEC has released a research report arguing that in the Agent AI era, storage capacity is the core driver of a long-term paradigm shift in the storage industry. On the supply and demand front, AI inference is driving a dramatic increase in token consumption, with KV Cache growing linearly alongside it. The mismatch between exploding demand and restrained capacity expansion by memory makers (the original manufacturers) has made shortages the norm; they are expected to persist until at least 2027, with price increases continuing throughout 2026. On the technology front, against a backdrop of severe shortages and high costs for HBM and DRAM, manufacturers are promoting NAND-based innovations to offload part of the capacity pressure on graphics memory. The firm maintains a positive view on the growth trend driven by storage innovation.
Key points from CITIC SEC are as follows: The 2026 China Flash Market Summit was held, focusing on storage innovation and supply chain upgrade opportunities in the AI era. On March 27, 2026, the global storage industry's annual flagship event, CFMS MemoryS 2026, took place in Shenzhen. A bellwether industry summit, this year's event centered on the theme "Navigating Cycles, Unlocking Value," with a deep focus on technological innovation and collaborative supply chain upgrades. It attracted dozens of leading global companies, including Samsung Electronics, Silicon Motion, Kioxia, Solidigm, Intel, and Tencent Cloud, spanning the entire industrial chain from storage chip manufacturing and controller design to module manufacturing and cloud services. The summit featured high-level forums alongside a technical exhibition. Discussions covered industry-trend outlooks, concentrating on the explosion in storage capacity demand driven by soaring token and KV Cache usage in the Agent AI era. The event featured forward-looking discussions on breakthroughs in PCIe 5.0/6.0 SSDs, ultra-high-capacity QLC technology, and other AI-driven storage innovations, while showcasing over a hundred innovative products.
AI inference is driving a surge in storage demand, with structural mismatches becoming the norm. Supply shortages are projected to last until at least 2027, with price hikes continuing through 2026. On the demand side: according to CFM (China Flash Market) data, 2026 server shipments are expected to increase by 15% year-over-year, with AI servers accounting for over 20% of total server shipments. As large models transition from the training phase to the inference phase, the explosion of Agent applications causes a sharp rise in token consumption. When sequence length increases from 1k to 128k tokens, KV Cache occupancy surges from 0.5GB to 64GB (BF16/FP16, single request). Under long-context, high-concurrency conditions, storage demand scales linearly with both token count and concurrency. CFM predicts HBM capacity will grow by over 90% in 2025 and over 35% in 2026 year-over-year. Concurrently, the offloading of KV Cache, combined with HDD supply shortages, is driving demand spillover, making eSSD the largest downstream segment for NAND in 2026 (share rising to 37%).
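The report's 0.5GB-to-64GB figures follow directly from the standard KV Cache sizing formula: each token stores a key and a value vector per layer, so memory grows linearly with sequence length. The sketch below reproduces those numbers using an illustrative 70B-class model configuration (the layer count, KV-head count, and head dimension are assumptions chosen for the arithmetic, not parameters cited in the report):

```python
def kv_cache_bytes(seq_len, num_layers=64, num_kv_heads=16,
                   head_dim=128, bytes_per_elem=2, batch=1):
    """Estimate KV Cache size for a single request.

    Each token stores one key and one value vector (factor of 2)
    per layer, each of num_kv_heads * head_dim elements, at
    bytes_per_elem bytes (2 for BF16/FP16).
    Model shape here is a hypothetical 70B-class configuration.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch

# Linear growth with context length, matching the report's figures:
print(kv_cache_bytes(1024) / 2**30)        # 1k tokens  -> 0.5 GiB
print(kv_cache_bytes(128 * 1024) / 2**30)  # 128k tokens -> 64.0 GiB
```

Because the per-token cost is fixed for a given model, doubling either context length or concurrent requests doubles KV Cache demand, which is the linear scaling the report describes.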
On the supply side: misaligned expansion cycles mean shortages and price increases will persist long-term. Storage manufacturers are broadly adopting price-stabilization strategies, prioritizing advanced production capacity for high-margin AI storage products. According to CFM, the share of more advanced DRAM capacity (including HBM, DDR5, and LPDDR5X/6) will rise from under 50% in 2024 to over 85% in 2026, continuously squeezing mature-process and consumer-grade capacity. Industry inventory has fallen from 10-12 weeks in 2023 and 8-10 weeks in 2024 to just 4 weeks in 2026, below the historical safety threshold. Storage capacity expansion cycles are long, taking 18-24 months, making a supply inflection point in the second half of 2026 unlikely. Silicon Motion suggests that 2027 may be the "darkest hour" of the storage shortage. Storage prices began a historic rise in the second half of 2025, and CFM forecasts that DRAM and NAND ASPs will continue to climb throughout 2026. In the AI inference era, storage capacity is paramount, signaling a long-term paradigm shift toward secular growth, not merely a cyclical rebound.
The storage industry chain is accelerating its value restructuring. At the recent GTC conference, NVIDIA emphasized "Token Factory Economics," which fundamentally reinforces the strategic position of storage within AI infrastructure and implies that the profit ceiling for the storage industry will be lifted for the long term. According to CFM data, the ASP of eSSD products in Q1 2026 was already more than double that of consumer-grade NAND ASP. For storage manufacturers, the core focus is on media upgrades and system-level architectural redesign, with forum presentations primarily concentrating on the enterprise market. For storage solution providers, the industry focus is shifting from "who is cheaper" to "who can secure supply." Meanwhile, leading companies like Phison Electronics are accelerating their transformation towards "customized high-value modules" empowered by self-developed controllers and expanding into enterprise SSDs to redefine storage value and break away from the traditional model reliant on low-cost inventory.
AI Cloud (Enterprise) Storage Trends: Large-Capacity QLC Boom and Ultra-Fast Interface Evolution Reshape the Compute Engine. AI is rapidly transitioning from "training" to "inference," with the future ratio of inference to training servers projected to reach 10:1 to 50:1. Currently, constrained by storage bandwidth bottlenecks, GPU cluster utilization is only about 46% to 50%, making graphics memory upgrades a core demand. In addition, multiple manufacturers at the summit discussed the functional reallocation enabled by storage-compute collaboration: the role of the eSSD is evolving from a "passive data container" into a core "compute engine" and "extended memory layer." On the training side, using ultra-high-capacity QLC eSSDs to store checkpoints can significantly improve GPU operational efficiency. On the inference side, eSSDs handle tasks such as massive context-state management, vector database queries, and model shard loading by tiering and caching KV Cache. Test data shows that offloading KV Cache to SSD can cut Time-To-First-Token (TTFT) by a factor of 41 by avoiding redundant prefill computation.
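The "extended memory layer" role described above can be illustrated with a minimal two-tier cache sketch: hot KV Cache entries stay in a small fast tier (standing in for HBM/DRAM), and evicted entries spill to a large slow tier (standing in for the eSSD) instead of being discarded, so a later request can reload them rather than recompute the prefill. This is a conceptual sketch, not any vendor's implementation; the class and tier sizes are invented for illustration:

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV Cache: small fast tier with LRU eviction
    that spills to an unbounded slow tier (the SSD stand-in)."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity  # max entries in the fast tier
        self.fast = OrderedDict()           # LRU-ordered fast tier
        self.slow = {}                      # slow tier: spilled entries

    def put(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)          # mark as most recently used
        while len(self.fast) > self.fast_capacity:
            old_key, old_val = self.fast.popitem(last=False)  # evict LRU
            self.slow[old_key] = old_val    # spill to SSD tier, not dropped

    def get(self, key):
        if key in self.fast:                # fast-tier hit
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.slow:                # slow-tier hit: promote back
            return_value = self.slow.pop(key)
            self.put(key, return_value)
            return return_value
        return None                         # true miss: prefill must rerun
```

The design point the report highlights is the last line: only a miss in both tiers forces prefill recomputation, so spilling to SSD trades a slower fetch for avoiding the far more expensive recompute, which is where the reported TTFT gains come from.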
Enterprise storage is exhibiting the following technical trends: To meet the massive caching and spillover demands of AI data and KV Cache, high-density QLC has become a key medium, with hundred-terabyte-class ultra-high-capacity QLC solutions becoming the preferred choice. Kioxia (245.76TB), Dapu Micro (245TB), and SanDisk (SN670 solution, up to 256TB) showcased QLC products breaking the 200TB barrier, greatly improving space efficiency and TCO. Controller chips are moving toward hardware-software co-design to compensate for media limitations. To address the high-frequency random read/write and bandwidth pressures from KV Cache in inference scenarios, controller chips are undergoing proactive upgrades. T-Head's Zhenyue 510, with native ZNS protocol support and system-level collaboration, is aiding the large-scale commercialization of QLC, with cumulative shipments exceeding 500,000 units. Union Memory introduced technologies such as a KV acceleration engine and predictive prefetching, transforming the controller from a "data mover" into an active "intelligent resource scheduler." Interface speeds are iterating rapidly alongside liquid cooling innovations to serve massive 10,000+ GPU clusters. Samsung showcased a 16-lane PCIe 6.0 SSD, the PM1763, roughly doubling I/O performance. FADU's PCIe Gen6 controller "Lhotse" has been taped out, promising sequential read performance of up to 28.5 GB/s.
AI Terminal (Consumer) Storage Trends: On-Device AI Accelerates Deployment, Memory-Compute Fusion Alleviates Memory Bottlenecks. The on-device environment imposes strict constraints on hardware BOM cost, system power consumption, and DRAM usage. Consequently, shifting inference pressure from memory (DRAM) to flash storage (NAND) through "memory-compute fusion," intelligent hardware-software scheduling, and advanced caching technologies has become a crucial supplementary approach to overcoming deployment bottlenecks for large models on devices. AI PCs and local large models: hybrid technologies mitigate the pressure from surging DRAM capacity requirements. Running hundred-billion-parameter-class models locally poses a significant challenge to memory. Longsys introduced a 5nm storage processing unit (SPU) and an iSA Storage Agent; in joint optimization tests, these enabled local deployment of a 397B-parameter model on a PC host and reduced DRAM usage by nearly 40% in a 256K-context scenario. Phison Electronics launched its Phison Hybrid AI SSD and aiDAPTIV+ technology, expected to reduce DRAM usage by over 50%, enabling cost-effective and secure local inference.
Smart Vehicles and Edge Computing: Moving Toward Centralized Pooled Architectures and Unified Platform Foundations. Embodied AI and advanced autonomous driving require global coordination at the underlying architecture level. XPeng Motors explicitly stated that with in-vehicle computing power now reaching up to 2250 TOPS, DRAM bandwidth has become the core bottleneck for inference latency. The era of automotive-grade LPDDR6 is approaching, and automotive NAND storage is transitioning from domain-specific silos to centralized pooling and software-defined architectures. Smartphones and AIoT: Deep Integration of High-Speed Interfaces and Advanced Caching Technologies. To meet the response-speed and battery-life demands of mobile devices and emerging wearables, Silicon Motion is set to launch its new-generation UFS 4.1 controller, the SM2755, and is accelerating its push into AIoT markets such as smartwatches and glasses. SanDisk employs SmartSLC caching technology to achieve high throughput under UFS 4.1 at a power consumption of only about 2W. Longsys is promoting the adoption of HLC advanced caching technology in embedded devices to reduce terminal BOM costs.
Risk factors include potential global macroeconomic downturn; weaker-than-expected downstream demand; slower-than-anticipated innovation; changes in the international industrial environment and escalating trade frictions; delays in computing power upgrades; and cloud service providers' capital expenditure falling short of expectations.