The Beijing Academy of Artificial Intelligence (BAAI) recently announced a major milestone for its open-source unified software stack, FlagOS. Developed in collaboration with ecosystem partners, FlagOS has completed comprehensive end-to-end training validation across AI chips from six manufacturers, three types of large AI models (language, multimodal, and embodied), and five homogeneous and heterogeneous thousand-card clusters. This makes FlagOS the industry's first unified software stack to achieve such results, laying a critical foundation for the development of China's diverse AI computing ecosystem.
The validation process thoroughly tested and strengthened the unified ecosystem for diverse AI computing power. On the hardware-adaptation front, FlagOS completed end-to-end training verification for six mainstream AI chips: Iluvatar CoreX, MetaX, Cambricon, Hygon, Moore Threads, and Kunlun. In the specified language-model training runs, each chip's performance was highly consistent with international mainstream platforms, enabling users to achieve equally high-quality training on diverse hardware and significantly reducing reliance on any single hardware system.
In large-scale training, FlagOS achieved breakthroughs in both homogeneous and heterogeneous thousand-card cluster training. It completed end-to-end large-model training on homogeneous thousand-card clusters built on Hygon, MetaX, and Moore Threads chips. In addition, efficient mixed training was demonstrated on two heterogeneous thousand-card clusters, one combining MetaX with NVIDIA and another combining Iluvatar CoreX with NVIDIA, fully validating the unified software stack's core capability to support large-scale collaborative training across diverse AI computing resources.
Notably, the Hygon homogeneous thousand-card cluster trained a 32-billion-parameter multimodal large model at thousand-card scale, demonstrating strong system scalability and stability. The MetaX homogeneous thousand-card cluster achieved dual breakthroughs in performance and precision across multiple large-model training runs, reaching internationally advanced levels. Meanwhile, the Moore Threads homogeneous thousand-card cluster completed full-process training and optimization of an embodied-intelligence large model, verifying the feasibility and stability of domestic computing power in this field.