NVIDIA's New Platform Enhances Competitiveness in Agent Applications, AI Inference Drives Storage Cycle Upward

Stock News 11:57

A research report indicates that NVIDIA showcased the Vera Rubin POD at GTC, with a focus on strengthening the competitiveness of its product lines in clustered computing and inference computing for Agent applications. As AI advances, model innovation and capital expenditure lay the foundation, and the AI industry chain develops in concert. AI inference is driving the storage cycle upward, with capacity expansion and technology upgrades proceeding in parallel. The report recommends monitoring the core beneficiaries within the industry chain.

The main points of the report are as follows: NVIDIA has launched the Vera Rubin POD platform. According to NVIDIA's official website, on March 16, 2026, NVIDIA showcased the Vera Rubin POD at GTC, comprising five new rack-scale systems designed specifically for Agentic AI workloads. Because Agentic workloads place higher demands on high-throughput, ultra-low-latency inference, dense CPU sandboxing, and large context-memory storage, NVIDIA has prioritized strengthening the competitiveness of its cluster-computing and inference-computing product lines for Agent applications.

The Vera Rubin POD consists primarily of two types of racks: (1) the MGXNVL rack, also known as the Vera Rubin NVL72, which is interconnected internally via NVLink and handles the core GPU computation; and (2) the MGXETL racks, which include the Groq3 LPX rack, the Vera CPU rack, the BlueField-4 STX storage rack, and the Spectrum-6 SPX networking rack. These racks work together through direct interconnects over Spectrum-X Ethernet or via the Groq3 LPU chips.

Based on schematic diagrams from the official website, a single Vera Rubin 1152 SuperPOD is composed of 16 Vera Rubin NVL72 racks, 2 Vera CPU racks, 10 Groq3 LPX racks, 2 BlueField-4 STX storage racks, and 10 Spectrum-6 SPX networking racks. This composition reflects the heterogeneous, collaborative system architecture built around Agentic AI.
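
As a quick consistency check (the arithmetic below is mine, based on the rack counts above, not a separate disclosure from the report), the "1152" in the SuperPOD's name matches 16 NVL72 racks at 72 Rubin GPUs each, and the listed rack counts sum to 40 racks per SuperPOD:

```python
# Back-of-the-envelope check of the SuperPOD naming and rack count
# (my arithmetic based on the figures above, not a disclosure from the report).
nvl72_racks = 16
gpus_per_nvl72_rack = 72
print(nvl72_racks * gpus_per_nvl72_rack)  # 1152 GPUs -> "Vera Rubin 1152 SuperPOD"

rack_counts = {
    "Vera Rubin NVL72": 16,
    "Vera CPU": 2,
    "Groq3 LPX": 10,
    "BlueField-4 STX storage": 2,
    "Spectrum-6 SPX networking": 10,
}
print(sum(rack_counts.values()))  # 40 racks per SuperPOD in total
```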

The Groq3 LPX rack is used to accelerate decoding. It integrates 256 LPU processors with 128 GB of on-chip SRAM and 640 TB/s of bandwidth. In the combined architecture of the Vera Rubin NVL72 and LPX racks, the GPU is primarily responsible for the Prefill phase and for the Attention computation during the Decode phase. The LPU accelerates the FFN computation in the Decode phase, speeding up the per-layer computation for each output token, and coordinates with the Vera Rubin racks over a customized Spectrum-X interconnect.
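
To make this division of labor concrete, here is a minimal toy sketch of one decode step, assuming a standard transformer layer: attention over the per-request KV cache stays on the GPU side, while the stateless FFN is handed off to an LPU side. This is my own illustration, not NVIDIA's software stack; the device placement is only simulated, and the transfer helpers are hypothetical stand-ins for the Spectrum-X hop described above.

```python
import numpy as np

# Illustrative sketch only, not NVIDIA's actual stack: a per-layer decode step in
# which attention runs on the "GPU side" while the FFN is offloaded to an "LPU side".
# ship_to_lpu / ship_to_gpu are hypothetical placeholders for the interconnect hop.

D, FF, CACHED = 64, 256, 128  # toy hidden size, FFN width, cached context length
rng = np.random.default_rng(0)

def gpu_attention(q, kv_cache, w_o):
    """Single-head attention of the new token against the cached keys/values
    (this per-request KV state is why attention stays on the GPU)."""
    k, v = kv_cache
    scores = (k @ q) / np.sqrt(D)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return (probs @ v) @ w_o

def lpu_ffn(x, w1, w2):
    """FFN block; stateless per token, so its weights can sit in LPU on-chip SRAM."""
    return np.maximum(x @ w1, 0.0) @ w2

ship_to_lpu = ship_to_gpu = lambda x: x  # stand-ins for the interconnect transfer

def decode_one_token(hidden, kv_cache, layers):
    for w_o, w1, w2 in layers:
        hidden = hidden + gpu_attention(hidden, kv_cache, w_o)              # GPU side
        hidden = hidden + ship_to_gpu(lpu_ffn(ship_to_lpu(hidden), w1, w2)) # LPU side
    return hidden

layers = [(rng.normal(scale=0.05, size=(D, D)),
           rng.normal(scale=0.05, size=(D, FF)),
           rng.normal(scale=0.05, size=(FF, D))) for _ in range(4)]
kv_cache = (rng.normal(size=(CACHED, D)), rng.normal(size=(CACHED, D)))
print(decode_one_token(rng.normal(size=D), kv_cache, layers).shape)  # (64,)
```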

According to disclosures on NVIDIA's official website, at 400 TPS per user, the combination of the Vera Rubin NVL72 and LPX racks can deliver up to a 35-fold improvement in TPS per megawatt compared with the NVIDIA GB200 NVL72. This not only raises overall system output but also makes the system better suited to Agent application scenarios that require low latency and strong interactivity.
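
For clarity on the metric itself, TPS per megawatt is simply delivered tokens per second divided by power draw. In the minimal sketch below, only the 35x multiple comes from NVIDIA's disclosure; the absolute throughput and power figures are placeholder assumptions used to show how the metric is formed:

```python
# Sketch of the "TPS per megawatt" efficiency metric. The 35x ratio is the figure
# quoted in the report; the absolute numbers below are placeholders, not disclosures.
def tps_per_megawatt(tokens_per_second: float, power_mw: float) -> float:
    return tokens_per_second / power_mw

baseline = tps_per_megawatt(1_000_000, 1.0)   # hypothetical GB200 NVL72 figures
combined = tps_per_megawatt(35_000_000, 1.0)  # hypothetical NVL72 + LPX figures
print(combined / baseline)  # 35.0 -> the efficiency multiple cited at 400 TPS per user
```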

The Vera CPU rack provides support for RL/Agent sandbox environments. It integrates 256 Vera CPUs and uses a high-density liquid-cooling design. A single rack can support more than 22,500 concurrent reinforcement learning or agent sandbox environments, which are used to test, execute, and validate the outputs produced by the Vera Rubin NVL72 and LPX systems.
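
As a rough density check (my arithmetic from the rack figures above, not a separately disclosed number), 22,500 concurrent environments spread across 256 Vera CPUs works out to roughly 88 sandbox environments per CPU on average:

```python
# Back-of-the-envelope density of sandbox environments per Vera CPU
# (derived from the rack figures above, not separately disclosed).
environments = 22_500
vera_cpus_per_rack = 256
print(environments / vera_cpus_per_rack)  # ~87.9 environments per CPU
```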

Potential risks include slower-than-expected development and demand in the AI industry, lower-than-anticipated shipments of AI servers, and slower-than-expected technological and product progress from domestic manufacturers.

