Moore Threads Technology Co., Ltd., in collaboration with the Beijing Academy of Artificial Intelligence (BAAI), has successfully completed the full-cycle training of BAAI's self-developed embodied brain model RoboBrain 2.5. This achievement was realized using the FlagOS-Robo framework and the MTT S5000 thousand-card AI computing cluster. It marks the industry's first successful validation of the usability and efficiency of a domestic computing cluster for training large embodied AI models, a critical step forward for domestic AI infrastructure in handling complex multimodal tasks.
As embodied intelligence emerges as the next strategic frontier in artificial intelligence, the autonomy and controllability of the underlying computing foundation have become especially crucial.
Through efficient collaboration between FlagOS, a unified AI system software stack designed for diverse chips, and the MTT S5000 hardware cluster, the solution not only makes such training feasible but achieves stable, fast training, laying a solid foundation for moving embodied intelligence from the laboratory into industrial application.

RoboBrain is a general-purpose embodied brain developed by BAAI for real-world physical scenarios. It uses a unified vision-language multimodal architecture to underpin a robot's core capabilities in perception, cognition, reasoning, and decision-making. Building on the original general-purpose embodied brain, RoboBrain 2.5 adds the ability to understand and reason about the temporal value of actions and about three-dimensional spatial structure, significantly improving the success rate of downstream task execution.
FlagOS-Robo is an integrated framework for embodied AI training and inference, built on the open-source, multi-chip AI software stack FlagOS. It supports multi-scenario deployment from edge to cloud, is compatible with a variety of chips, and enables efficient co-training and inference of both the brain model (VLM) and the cerebellum model (VLA). FlagOS-Robo streamlines the entire pipeline from data collection to real-machine testing, covering data loading, model training, inference, and embodied evaluation, which effectively reduces development complexity. It also offers unified experiment management and automated multi-chip tuning, enabling one-click cross-platform deployment. Through this complete ecosystem, FlagOS-Robo provides computational and systematic support for cutting-edge research and industrial applications in embodied intelligence, accelerating the innovation and deployment of AI technologies.
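The four pipeline stages described above can be pictured as a simple sequence. The sketch below is purely schematic: every function name and data shape is an invented placeholder for illustration, and none of it reflects the actual FlagOS-Robo API.

```python
# Schematic embodied-AI pipeline mirroring the stages named above:
# data loading -> model training -> inference -> embodied evaluation.
# All names are illustrative placeholders, NOT FlagOS-Robo APIs.

def load_data(source: str) -> list[dict]:
    """Stand-in for the data-loading stage (e.g. teleoperation logs)."""
    return [{"obs": f"frame_{i}", "action": i % 3} for i in range(4)]

def train(samples: list[dict]) -> dict:
    """Stand-in for brain (VLM) / cerebellum (VLA) co-training."""
    return {"n_samples": len(samples), "trained": True}

def infer(model: dict, obs: str) -> int:
    """Stand-in for deployment-time inference on one observation."""
    return len(obs) % 3  # dummy action id, deterministic for the demo

def evaluate(model: dict, samples: list[dict]) -> float:
    """Stand-in for embodied evaluation (e.g. a task success rate)."""
    hits = sum(infer(model, s["obs"]) == s["action"] for s in samples)
    return hits / len(samples)

model = train(load_data("demo"))
score = evaluate(model, load_data("demo"))
print(f"trained on {model['n_samples']} samples, success rate {score:.2f}")
```

The point of the sketch is only the stage ordering and the hand-off of artifacts (dataset to model to evaluation score), which is the workflow the framework is described as unifying.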
Multi-dimensional evaluation confirms fully aligned metrics. To test the model's algorithmic effectiveness, the BAAI team ran validation on several authoritative embodied evaluation datasets, including a 2D/3D spatial perception and reasoning leaderboard and a temporal value assessment leaderboard. The results show that RoboBrain 2.5, trained on the domestic MTT S5000 thousand-card cluster, achieved performance parity with models trained on mainstream international GPUs across multiple key metrics; on benchmarks such as CrossPoint, Q-Spatial, and VABench-V, it even performed better. These fully aligned evaluation results indicate that the "embodied brain" trained jointly by the FlagOS-Robo framework and the MTT S5000 cluster has reached industry-leading standards in understanding, planning, and execution.
Loss curves closely aligned, with a relative error below 0.62%. In terms of model precision, the MTT S5000-based KUAE computing cluster demonstrated extremely high stability. The training curves show that the loss trajectory on the MTT S5000 thousand-card cluster closely coincides with the training results from mainstream international GPUs, with a relative error of less than 0.62%. This minimal error not only confirms the training accuracy of the domestic hardware but also shows that BAAI's FlagOS-Robo framework achieves lossless migration across platforms. Developers need not worry about precision degradation when changing hardware, achieving a smooth adaptation with no code changes and no loss of accuracy.
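The alignment claim above boils down to comparing two loss curves point by point. The snippet below is a minimal sketch of such a check with synthetic curves; the actual values would come from the two clusters' training logs, and the deviation injected here is an arbitrary assumption for illustration.

```python
import numpy as np

# Synthetic stand-ins for two loss curves of the same model: one from a
# reference GPU cluster, one from the MTT S5000 cluster. Real curves
# would be read from training logs; the 0.3%-scale wobble is invented.
steps = np.arange(1, 101)
loss_ref = 2.0 * np.exp(-0.02 * steps) + 0.5            # reference run
loss_s5000 = loss_ref * (1 + 0.003 * np.sin(steps))     # deviating run

# Mean relative error between the two curves, as a percentage.
rel_err = np.mean(np.abs(loss_s5000 - loss_ref) / loss_ref) * 100
print(f"mean relative error: {rel_err:.3f}%")
```

A run-level verdict like "relative error below 0.62%" is then just a threshold test on this quantity.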
Near-linear scaling, with a thousand-card speedup efficiency exceeding 90%. The core of large-scale cluster training is efficiency. Measured training data from this exercise shows that the Moore Threads MTT S5000 thousand-card intelligent computing cluster is highly scalable: when scaling from 64 cards to 1,024 cards, the system achieved a linear scaling efficiency of over 90%. The scaling curve shows an excellent near-linear growth trend, meaning that as computing resources increase, training speed rises almost proportionally. This demonstrates the maturity of the domestic cluster in large-scale parallel computing and communication scheduling, and its capability to support ten-thousand-card training.
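Linear scaling efficiency compares measured throughput at each cluster size against the ideal proportional speedup from the smallest configuration. The sketch below illustrates the calculation with invented throughput numbers; the real figures would come from the cluster's training logs.

```python
# Hypothetical throughput measurements (samples/sec) per cluster size.
# These values are synthetic, chosen only to illustrate the formula.
throughput = {64: 1000.0, 256: 3840.0, 1024: 14800.0}

base_cards, base_tp = 64, throughput[64]
for cards, tp in sorted(throughput.items()):
    ideal = base_tp * cards / base_cards   # perfect linear scaling
    efficiency = tp / ideal * 100          # % of ideal actually achieved
    print(f"{cards:5d} cards: {efficiency:.1f}% of linear")
```

With these synthetic numbers the 1,024-card efficiency works out above 90%, which is the shape of the claim made for the MTT S5000 cluster.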
This in-depth collaboration between Moore Threads and BAAI will further accelerate the process of bringing embodied intelligence from the lab to industrial application. It provides the industry with a replicable and scalable "domestic computing power training paradigm," offering China's embodied AI industry an autonomous, open, and highly efficient computing foundation.