Guolian Minsheng Securities released a research report stating that non-linear growth in token demand in the AI Agent era may push AI computing power requirements well beyond expectations, and that super nodes are set to become a major trend in AI computing development. The rapid advancement of domestic super nodes gives local computing power a crucial opportunity to "overtake on the curve," i.e., leapfrog established players. The report recommends focusing on the following areas:
1) Leading domestic super node companies: Inspur Information (000977.SZ) and Sugon (603019.SH);
2) Huawei's super node industry chain: iSoftStone (301236.SZ), Digital China (000034.SZ), China Greatwall (000066.SZ), Huibo Yunton (301316.SZ), and TOWAY (002261.SZ);
3) Domestic AI chips/CPUs: Cambricon (688256.SH), Hygon (688041.SH), China Greatwall, Intellifusion (688343.SH), and Loongson Technology (688047.SH);
4) Cloud computing: KINGSOFT CLOUD (03896), Wangsu Technology (300017.SZ), UCloud (688158.SH), and QingCloud (688316.SH).
The main viewpoints of Guolian Minsheng Securities are as follows: AI development is driving innovation in computing architecture, with super nodes enhancing computational efficiency. AI computing differs from traditional data center operations, functioning as a continuously online intelligent production system whose core performance depends on inference, context processing, and data movement efficiency rather than peak server computing power alone. AI workloads require multi-step reasoning over ultra-long contexts, placing pressure on platform capabilities at every layer; at the scale of trillions of tokens, even minor efficiency losses significantly affect cost, throughput, and competitiveness. The trajectory of AI computing demand is captured by three scaling laws: pre-training scaling lets models learn inherent knowledge, post-training scaling grants models reasoning abilities through fine-tuning and reinforcement learning, and test-time scaling achieves deep reasoning by generating more tokens during inference.
Large model auto-regressive inference involves two phases with conflicting resource demands: Prefill (compute-intensive) and Decode (memory bandwidth-intensive). Super nodes provide crucial support for P/D separation, making them a potential core form of next-generation AI computing architecture. During the Decode phase, the key performance factor shifts from peak GPU computing power to memory bandwidth, that is, the total amount of data a GPU can read from or write to its memory per unit time. This bandwidth directly determines user-experience metrics such as single-token generation latency, which governs the fluency of streamed text generation. Consequently, the P/D separation architecture has emerged: new super node server architectures physically separate the two phases, using powerful internal interconnection networks to assign Prefill and Decode tasks to different resources within the node.
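Why Decode is bandwidth-bound can be made concrete with a back-of-envelope roofline estimate: each generated token requires streaming roughly all active model weights (plus KV cache) through memory, so per-token latency is bounded below by bytes moved divided by aggregate memory bandwidth. The sketch below is illustrative only; the parameter count, KV-cache traffic, and bandwidth figures are assumptions for demonstration, not numbers from the report.

```python
# Illustrative lower-bound model for Decode-phase per-token latency.
# Assumption: generating one token streams all active weights plus the
# KV cache through memory once, so latency >= bytes_moved / bandwidth.

def decode_token_latency_ms(active_params_b: float,
                            bytes_per_param: float,
                            kv_cache_gb: float,
                            mem_bandwidth_tb_s: float) -> float:
    """Lower-bound per-token Decode latency in milliseconds."""
    bytes_moved = active_params_b * 1e9 * bytes_per_param + kv_cache_gb * 1e9
    return bytes_moved / (mem_bandwidth_tb_s * 1e12) * 1e3

# Hypothetical example: a 37B-active-parameter MoE model in FP8
# (1 byte/param), 50 GB of KV-cache traffic per token, served with
# 2 TB/s of effective memory bandwidth.
latency = decode_token_latency_ms(37, 1.0, 50, 2.0)
print(f"{latency:.1f} ms/token")  # prints "43.5 ms/token"
```

The formula also shows why sharding a model across many accelerators in a super node helps Decode: aggregate bandwidth grows with the number of chips, pushing the latency floor down, provided the interconnect can keep the shards synchronized.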
Simultaneously, advancements in interconnection protocols are enabling more effective utilization of physical bandwidth capabilities. For example, the OISA protocol promoted by China Mobile represents how new-generation interconnection technologies are evolving beyond mere "data pipelines" to become active participants in system management. Taking NVIDIA's Rubin platform as an example, extreme co-design forms its foundation. GPU, CPU, networking, security, software, power supply, and cooling are all built as a coordinated system rather than being optimized independently. This approach treats the entire data center as the computational unit, establishing a new foundation for efficient, secure, and predictable large-scale intelligent generation. It ensures sustained performance and efficiency in actual production deployments, not just in isolated component benchmarks.
The flagship product of the Rubin platform is the Vera Rubin NVL72 rack-scale system, designed to operate as a coordinated machine within larger AI factories. The NVL72 system is optimized not just for peak performance but for sustained intelligent production, featuring predictable latency, high utilization across heterogeneous execution phases, and efficient power conversion into usable intelligence. The Rubin platform is built from six new chips, each designed for specific roles in AI factories and intended to operate as part of a unified rack-scale system from the outset.
Inspur Information's YuanNao SD200 super node is one of the strongest domestic AI super node products for large model inference performance. When running the DeepSeek R1 671B large model with 64 domestic AI chips, the SD200 achieves a single-user token generation speed of 112 tokens/s with an input length of 4096 and output length of 1024. Its single-token generation latency is as low as 8.9ms, making it the first domestic super node product to break the 10ms barrier, leading the industry in end-to-end large model inference experience. The product features native hardware architecture innovation, utilizing a self-developed multi-host low-latency memory semantic communication architecture and a 3D Mesh high-performance interconnection super-expansion system that supports high-density expansion of 64 domestic AI chips.
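The two headline SD200 figures are consistent with each other, since single-user throughput and inter-token latency are reciprocals: a rate of R tokens/s implies an average of 1000/R milliseconds per token. A minimal sanity check on the quoted numbers:

```python
# Single-user generation rate and average inter-token latency are
# reciprocals: latency_ms = 1000 / tokens_per_second.

def inter_token_latency_ms(tokens_per_s: float) -> float:
    """Average time between consecutive generated tokens, in ms."""
    return 1000.0 / tokens_per_s

# 112 tokens/s, as quoted for the SD200 above:
print(f"{inter_token_latency_ms(112):.1f} ms")  # prints "8.9 ms"
```

This matches the 8.9ms single-token latency stated in the report.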
Sugon has launched the world's first cableless box-type super node, scaleX40. It employs an orthogonal cableless primary interconnection architecture, enabling direct plugging between compute nodes and switch nodes, eliminating performance loss and maintenance risks associated with cables. A single scaleX40 node integrates 40 GPUs, with total computing power exceeding 28 PFLOPS (FP8 precision), total HBM memory exceeding 5TB, and total memory access bandwidth exceeding 80TB/s, forming a high-density computing unit suitable for training and inference of trillion-parameter large models.
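Dividing the quoted rack totals by the 40 integrated GPUs gives the implied per-accelerator specs. This is simple arithmetic on the figures above; since the totals are stated as minimums ("exceeding"), the results are lower bounds, not vendor data-sheet values.

```python
# Implied per-GPU floor specs for the scaleX40, derived from the
# quoted rack-level totals (stated as "exceeding", hence lower bounds).
gpus = 40
rack = {"fp8_pflops": 28.0, "hbm_tb": 5.0, "bandwidth_tb_s": 80.0}
per_gpu = {k: v / gpus for k, v in rack.items()}
print(per_gpu)
# prints {'fp8_pflops': 0.7, 'hbm_tb': 0.125, 'bandwidth_tb_s': 2.0}
```

That is, at least 0.7 PFLOPS of FP8 compute, 128GB of HBM, and 2TB/s of memory bandwidth per accelerator.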
Huawei's Ascend portfolio ranges from 384-card super nodes to 10,000-card clusters, establishing a solid computing foundation as a domestic computing power leader. The Atlas 900 AI super node, equipped with 384 Ascend 910C AI chips, delivers 300 PFLOPS of FP8 precision computing power using the self-developed Lingqu 1.0 all-optical interconnection protocol. It is currently a mainstream computing product for domestic AI computing center construction and industry large model training scenarios. The Atlas 950 AI super node is a flagship 10,000-card level super node designed for training and inference of trillion-parameter models, equipped with 8,192 Ascend 950DT AI chips. The Atlas 960 AI super node is an ultra-large-scale flagship product for AGI scenarios, equipped with 15,488 Ascend 960 AI chips. The TaiShan 950 general computing super node targets general computing scenarios like finance and government affairs, supporting up to 32 Kunpeng 950 general-purpose processors.