AI computing clusters are scaling rapidly toward tens of thousands, and even hundreds of thousands, of accelerator cards. The high-speed interconnect network acts as the cluster's "nerve center" and has become a critical factor in how much of the cluster's computing power can actually be used. The interconnect ecosystem for intelligent computing is currently dominated by InfiniBand (IB) and RoCE v2, whose core technologies and ecosystems have long been led by overseas manufacturers. As construction of domestic AI computing infrastructure accelerates, homegrown high-speed interconnect systems are entering a breakthrough period. Domestic RDMA interconnect networks are expected to gain ground starting from the underlying architecture and self-developed hardware, gradually localizing China's computing infrastructure at the interconnect level. We recommend focusing on the domestic computing industry chain.
RDMA (remote direct memory access) is an important technical pathway to high-performance AI networks. Its core characteristics are "kernel bypass" and "zero-copy": an application reads and writes memory on a remote server directly, without going through the host operating system kernel or CPU scheduling. This significantly reduces communication latency and CPU usage. Mainstream RDMA solutions currently include IB, RoCE, and iWARP. IB is a native network designed specifically for RDMA and offers the best end-to-end lossless performance. RoCE brings the RDMA architecture into the Ethernet ecosystem; RoCE v2 relies on mechanisms such as priority-based flow control (PFC) and explicit congestion notification (ECN) to approximate lossless transmission over traditionally lossy Ethernet.
While overseas manufacturers dominate the high-end interconnect ecosystem, domestic high-speed RDMA interconnect networks are gradually achieving breakthroughs, and localization is accelerating. Dawning Information Industry Co., Ltd. has introduced scale Fabric, a domestically developed 400G native lossless RDMA high-speed interconnect architecture. It adopts credit-based flow control and link-layer retransmission, the same kinds of mechanisms InfiniBand uses, and relies on IB-like native RDMA network cards and switch chips to deliver 400Gb/s bandwidth, end-side communication latency below 1 microsecond, and lossless transmission. As the network foundation for domestic ten-thousand-card intelligent computing clusters, scale Fabric has already supported the large-scale deployment of the scaleX ten-thousand-card super cluster, which may accelerate the rollout of large-scale domestic computing clusters.
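To make the credit-based flow control idea concrete, the following is a minimal Python sketch of a single link. It is a toy model for illustration only and does not describe how scale Fabric or InfiniBand is actually implemented: the class `CreditLink`, the function `simulate`, and all parameters (buffer slots, packets sent and drained per tick) are hypothetical. The point it shows is that a sender which transmits only while it holds credits can never overrun the receiver's buffer, so a fast sender is stalled rather than having its packets dropped.

```python
from collections import deque


class CreditLink:
    """Toy model of one link using credit-based flow control (illustrative only)."""

    def __init__(self, rx_buffer_slots):
        self.capacity = rx_buffer_slots
        self.credits = rx_buffer_slots  # sender's count of free receiver slots
        self.rx_buffer = deque()
        self.max_occupancy = 0

    def try_send(self, packet):
        """Transmit only if a credit is available; otherwise the sender stalls."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.rx_buffer.append(packet)
        self.max_occupancy = max(self.max_occupancy, len(self.rx_buffer))
        return True

    def drain(self, n):
        """Receiver consumes up to n packets and returns one credit per packet."""
        drained = 0
        while self.rx_buffer and drained < n:
            self.rx_buffer.popleft()
            self.credits += 1
            drained += 1
        return drained


def simulate(num_packets, rx_buffer_slots, send_per_tick, drain_per_tick):
    """Run until every packet is delivered; drain_per_tick must be at least 1."""
    if drain_per_tick < 1:
        raise ValueError("drain_per_tick must be at least 1")
    link = CreditLink(rx_buffer_slots)
    pending = deque(range(num_packets))
    delivered = stalls = ticks = 0
    while delivered < num_packets:
        for _ in range(send_per_tick):
            if not pending:
                break
            if link.try_send(pending[0]):
                pending.popleft()
            else:
                stalls += 1  # out of credits: wait instead of dropping
                break
        delivered += link.drain(drain_per_tick)
        ticks += 1
    return {
        "delivered": delivered,
        "stalls": stalls,
        "ticks": ticks,
        "max_occupancy": link.max_occupancy,
    }


if __name__ == "__main__":
    # Sender offers 8 packets per tick; receiver has 4 slots and drains 2 per tick.
    print(simulate(num_packets=100, rx_buffer_slots=4,
                   send_per_tick=8, drain_per_tick=2))
```

In this sketch the sender is four times faster than the receiver, yet the receiver buffer never holds more packets than it has slots; the mismatch shows up as sender stalls, not as loss. PFC on RoCE v2 pursues a similar outcome in a different way, by having the receiver send pause frames once its buffer passes a threshold.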
For investment targets, we recommend focusing on Dawning Information Industry Co., Ltd. Risks include slower-than-expected maturation of the domestic computing interconnect ecosystem and potential delays in domestic computing infrastructure construction.