BAAI's FlagOS Achieves End-to-End Training Validation on Six AI Chips Including Moore Threads

Deep News · 03-27

The Beijing Academy of Artificial Intelligence (BAAI) recently announced a major technological breakthrough with its open-source unified software stack, FlagOS. Developed in collaboration with ecosystem partners, FlagOS has completed comprehensive end-to-end training validation across AI chips from six manufacturers, three types of large AI models (language, multimodal, and embodied), and five homogeneous and heterogeneous thousand-card clusters. This makes FlagOS the industry's first technical system to accomplish such results with a unified software stack, laying a critical foundation for the development of China's diverse AI computing ecosystem.

The validation process thoroughly tested and strengthened the unified ecosystem for diverse AI computing power. In terms of hardware adaptation, FlagOS completed end-to-end training verification on six mainstream AI chips: Iluvatar CoreX, MetaX, Cambricon, Hygon, Moore Threads, and Kunlun. In training of the specified language models, each chip's performance was highly consistent with that of international mainstream platforms, enabling users to achieve equally high-quality training on diverse hardware and significantly reducing reliance on any single hardware system.

In large-scale training, FlagOS achieved breakthroughs in both homogeneous and heterogeneous thousand-card cluster training. It completed end-to-end large model training on homogeneous thousand-card clusters built on Hygon, MetaX, and Moore Threads chips. In addition, efficient mixed training was demonstrated on two major heterogeneous thousand-card clusters, one combining MetaX with NVIDIA and the other combining Iluvatar CoreX with NVIDIA, fully validating the core capability of the unified software stack to support large-scale collaborative training across diverse AI computing resources.

Notably, the Hygon homogeneous thousand-card cluster completed thousand-card training of a 32-billion-parameter multimodal large model, demonstrating strong system scalability and stability. The MetaX homogeneous thousand-card cluster achieved dual breakthroughs in both high performance and high precision across multiple large model training runs, reaching internationally advanced levels. Meanwhile, the Moore Threads homogeneous thousand-card cluster completed full-process training and optimization of an embodied intelligence large model, verifying the feasibility and stability of domestic computing power in this field.
