JD.com Mobilizes Massive Workforce to Amass Embodied Intelligence Data

Deep News 03-18 16:01

On March 16, JD.com announced the establishment of what it claims is the world's largest and most comprehensive embodied intelligence data collection center. The move puts the spotlight back on the robotics sector, which had recently been overshadowed by other trends.

This initiative can be viewed as a large-scale data production campaign with distinct industrial internet characteristics. It mobilizes over 100,000 internal employees and up to 500,000 external participants from various industries, including more than 100,000 residents in Suqian alone. This unprecedented "human wave" strategy aims to use sheer scale and forceful execution to overcome the most critical weakness in embodied intelligence today: the shortage of data.

As model architectures become more standardized and computing power requirements more transparent, high-quality physical interaction data has emerged as the decisive factor in whether robots can be successfully integrated into countless industries. Dubbed "the largest data collection operation in human history," this effort underscores an industry consensus: while the "cerebellum" responsible for motion control in embodied intelligence is becoming more advanced, the core challenge now is supplying enough high-quality data to build a "brain" that truly understands the physical world. This is becoming the pivotal battle that will shape the industry's future.

Set against the practical realities of the industry, however, JD.com's grand vision leaves an open question: will the data generated by these hundreds of thousands of participants prove to be a goldmine or merely sand?

**The Involved Workforce**

JD.com has both the means and the need to launch this massive data campaign because of its vast and highly complex self-operated physical supply chain. Unlike pure software internet companies, JD.com is itself a massive field of interaction in the physical world, and how quickly embodied intelligence matures will directly shape its fulfillment costs and operational efficiency over the next decade.

This strategy is deeply integrated with the robotics industry ecosystem in Beijing's Yizhuang Economic-Technological Development Area, which hosts over 300 robotics-related companies, boasts an industrial chain scale exceeding ten billion yuan, and has opened more than 40 real-world application scenarios, making it a core hub for humanoid robotics in China. As an anchor enterprise in Yizhuang, JD.com had previously launched a robotics industry acceleration plan. By heavily investing in soft infrastructure like the data collection center, JD.com is addressing a critical missing link in the industrial chain. Yizhuang provides the "body" and testing grounds, while JD.com aims to use its massive array of scenarios to instill common-sense understanding of the real world into robots.

This synergy between software and hardware attempts to create a closed commercial loop, running from a data flywheel to iterative hardware improvements. Coordinating hundreds of thousands of people is no small feat. The plan covers logistics, industrial, and retail scenarios. In practice, it will likely rely on JD.com's existing digital management network, potentially equipping delivery personnel and warehouse workers with wearable devices carrying visual or even force sensors during their daily tasks.

For frontline employees and mobilized Suqian residents, the campaign is a mixed proposition. Employees inadvertently become data tutors for robots whose eventual purpose may be to replace high-intensity human labor. Designing fair compensation, incentives, and benefit-sharing mechanisms that avoid employee resistance is a challenge JD.com must address. So far, however, specific implementation details have not reached many employees. A JD.com employee in Beijing said they had not heard about the initiative, adding that, given appropriate payment, participation would be a matter of personal choice. An employee in Suqian also reported not receiving any notification.

Although JD.com's announcement emphasizes that "all data collection will strictly comply with laws and regulations," the reality is often more complicated. In logistics, warehouse operations are standardized, but last-mile delivery reaches into countless households, and retail scenarios capture consumers' facial features and private habits. In an era of tightening data regulation, the compliance cost of anonymizing and cleaning vast amounts of unstructured data collected by hundreds of thousands of participants could be astronomical.

**Addressing Moravec's Paradox**

In 1988, roboticist Hans Moravec observed: "It is comparatively easy to make computers exhibit adult-level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility." In embodied intelligence today, Moravec's Paradox shows up above all as the industry's data vacuum.

Large language models succeeded by digesting trillions of tokens of high-quality text accumulated over some thirty years of the internet. No equivalent repository exists for the physical world. For embodied intelligence to scale in the real world, it must overcome this massive data barrier, and JD.com's large-scale effort targets exactly this challenge and the underlying difficulties of data collection.

First, the limitations of simulation need to be addressed. The industry's primary methods for data acquisition have diverged, each grappling with its own bottlenecks. Most startups heavily rely on simulation environments like NVIDIA's Isaac Sim or physics engines like MuJoCo, where robots undergo millions of reinforcement learning trials virtually. This approach is low-cost, fast, and avoids hardware damage from errors. However, experienced practitioners increasingly recognize the limitations of "Sim-to-Real" transfer. The complexity of the physical world lies not only in visual changes like lighting but also in subtle physical feedback—the flexible deformation of cables, the non-rigid stretching of fabric, minute friction variations when tightening a screw, or electromagnetic noise from sensors themselves. Current physics engines lack the computational power to perfectly simulate these high-dimensional, nonlinear micro-physical laws, leading to models that perform flawlessly in simulation but suffer from severe functional failures or distorted movements when deployed on real hardware.
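The friction point can be made concrete with a back-of-envelope grasp calculation. The sketch below assumes a simple two-finger pinch grasp with Coulomb friction, in which the object stays put only if 2μF ≥ mg; the object mass, the simulator's friction coefficient, the real coefficient, and the safety margin are all hypothetical values chosen for illustration. A grip force that comfortably holds the object in simulation can let it slip on real hardware when the true friction is lower.

```python
# Minimal sketch: a grip force tuned against a simulator's friction
# coefficient can fail when the real coefficient is lower.
# All numbers below are hypothetical illustrations, not measured values.

G = 9.81  # gravitational acceleration, m/s^2


def min_grip_force(mass_kg: float, mu: float) -> float:
    """Minimum normal force per finger for a two-finger pinch grasp.

    Each finger contributes a friction force of mu * F, so holding the
    object requires 2 * mu * F >= m * g, i.e. F >= m * g / (2 * mu).
    """
    return mass_kg * G / (2.0 * mu)


def holds(mass_kg: float, grip_force_n: float, mu: float) -> bool:
    """True if friction at both fingers can support the object's weight."""
    return 2.0 * mu * grip_force_n >= mass_kg * G


mass = 0.2      # kg, a hypothetical small object such as an apple
mu_sim = 0.8    # friction coefficient assumed by the simulator (hypothetical)
mu_real = 0.5   # actual coefficient on the real surface (hypothetical)
margin = 1.2    # 20% safety margin tuned in simulation (hypothetical)

force = margin * min_grip_force(mass, mu_sim)
print(f"commanded grip force: {force:.2f} N")
print("holds in simulation:", holds(mass, force, mu_sim))
print("holds on real hardware:", holds(mass, force, mu_real))
```

The 20% margin covers small errors, but a friction mismatch of this size exceeds it; real contact involves deformation and texture effects that this rigid-body formula ignores entirely.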

Given the simulation gap, the focus returns to the real world. From Stanford's popular Mobile ALOHA project to leading companies such as Figure AI, Unitree, and Zhiyuan, many teams rely extensively on teleoperation: humans wear motion capture suits or use VR devices to control robots as avatars, while the system records first-person visual data, joint angles, and torque data. This method is currently considered the highest-quality way to acquire data. However, it runs into the second major commercial problem: extremely poor cost-effectiveness. Industry estimates suggest a single full-size humanoid robot can cost hundreds of thousands, even millions. Teleoperation incurs not only heavy hardware depreciation but also the high cost of specialized operators. Acquiring and cleaning data for a single high-quality, complex interaction task can cost hundreds of dollars, with a high failure rate. This artisanal, hand-crafted mode of data production cannot supply enough data to train the models with billions or trillions of parameters that general-purpose embodied intelligence requires.
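The cost-effectiveness problem comes down to simple arithmetic: hardware depreciation and operator wages are paid on every attempt, but only successful attempts yield usable data. The following sketch assumes straight-line depreciation and a fixed success rate; every input is a hypothetical placeholder rather than an industry figure, and the currency is left unspecified.

```python
# Back-of-envelope cost model for teleoperated demonstrations.
# Every figure here is a hypothetical placeholder, not an industry number.


def cost_per_usable_demo(
    hardware_cost: float,
    hardware_life_hours: float,
    operator_wage_per_hour: float,
    minutes_per_attempt: float,
    success_rate: float,
) -> float:
    """Expected cost of one usable demonstration.

    Straight-line hardware depreciation plus operator wages give a cost per
    hour of teleoperation; dividing the per-attempt cost by the success rate
    accounts for the failed attempts that must be discarded.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    hourly = hardware_cost / hardware_life_hours + operator_wage_per_hour
    per_attempt = hourly * minutes_per_attempt / 60.0
    return per_attempt / success_rate


cost = cost_per_usable_demo(
    hardware_cost=100_000.0,      # hypothetical robot price
    hardware_life_hours=5_000.0,  # hypothetical service life
    operator_wage_per_hour=40.0,  # hypothetical operator wage
    minutes_per_attempt=30.0,     # hypothetical length of one attempt
    success_rate=0.25,            # hypothetical share of usable attempts
)
print(f"{cost:.2f} per usable demonstration")
```

With these placeholder inputs, each usable demonstration costs 120 units, four times the 30-unit cost of a single attempt, because three out of four attempts are discarded. Halving the success rate doubles the cost, which is why a high failure rate weighs so heavily on teleoperation.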

To lower the barrier, giants such as Google have launched open-source dataset projects like Open X-Embodiment, pooling data from labs worldwide for industry-wide use. Some domestic companies have also open-sourced millions of real-robot data points. But this exposes a third major engineering challenge: the extreme fragmentation of robot hardware platforms. Quadrupeds, wheeled robots, bipedal humanoids, and even humanoids from different manufacturers differ in degrees of freedom, motor torque, sensor placement, and center-of-mass structure. High-quality grasping data collected on a UR5 robotic arm cannot be transferred directly to a Tesla Optimus or a JD.com logistics robot. Because "cross-platform mapping" is so difficult, most existing open-source data remains in isolated silos and cannot achieve scale.
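Why joint-level data does not transfer across bodies can be seen with a toy example. The sketch below uses forward kinematics of a planar arm, a textbook simplification rather than any real robot's model; the link lengths and joint angles are hypothetical. Replaying the same joint angles on two arms with different link lengths puts the gripper in different places, and on an arm with a different number of joints the recording cannot be replayed at all.

```python
import math
from typing import Sequence, Tuple


def end_effector(link_lengths: Sequence[float],
                 joint_angles: Sequence[float]) -> Tuple[float, float]:
    """Forward kinematics of a planar serial arm: (x, y) of the gripper tip."""
    if len(link_lengths) != len(joint_angles):
        raise ValueError("need exactly one joint angle per link")
    x = y = 0.0
    heading = 0.0
    for length, angle in zip(link_lengths, joint_angles):
        heading += angle  # each joint rotates every link after it
        x += length * math.cos(heading)
        y += length * math.sin(heading)
    return x, y


# Joint angles recorded on hypothetical robot A (radians).
recorded = [math.pi / 4, math.pi / 4]

robot_a = [0.40, 0.30]        # link lengths in meters (hypothetical)
robot_b = [0.50, 0.20]        # same joint count, different geometry (hypothetical)
robot_c = [0.30, 0.30, 0.20]  # one extra joint (hypothetical)

tip_a = end_effector(robot_a, recorded)
tip_b = end_effector(robot_b, recorded)
print(f"robot A tip: ({tip_a[0]:.3f}, {tip_a[1]:.3f}) m")
print(f"robot B tip: ({tip_b[0]:.3f}, {tip_b[1]:.3f}) m")
print(f"gap: {math.dist(tip_a, tip_b):.3f} m")

try:
    end_effector(robot_c, recorded)
except ValueError as err:
    print("robot C:", err)
```

The two tips land roughly 7.7 cm apart, enough to miss a grasp entirely. One common way around this is to recover the intended end-effector motion and re-solve it for the target robot's own kinematics, which is the kind of "cross-platform mapping" the text describes.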

Perhaps because of these three challenges, the competitive logic in embodied intelligence has fundamentally shifted: whoever owns real-world deployment scenarios holds a moat for continuously acquiring cheap, high-quality, closed-loop data. This explains why Tesla and JD.com have chosen paths different from those of pure hardware startups. Tesla uses its vast Gigafactories to let Optimus learn by trial and error on real battery-sorting lines. JD.com aims to use its nationwide logistics network, hundreds of thousands of industrial workers, and extensive retail system to build a semi-automated data pipeline. This strategy turns a company's supply chain barriers into data barriers for the AI era.

In contrast, many robotics startups without their own scenarios are forced to adapt. They either sell hardware at a loss to universities and research institutes to gain access to shared usage data, or spend heavily to rent factory space or hire specialized embodied intelligence data service providers like Jianzhi for custom data collection. JD.com's entry has effectively pulled back the algorithmic curtain on the embodied intelligence industry, dragging it into a capital-intensive, scenario-dependent, and labor-management-heavy commercial battle. Facing the data drought, the moat provided by algorithms is shrinking, while giants controlling the gateways to real-world physical interaction are quietly weaving the net that leads to AGI.

**The Scarcity of High-Quality Data**

Regarding JD.com's plan to "accumulate over 10 million hours of real-scenario data within two years," industry insiders reacted not with uniform enthusiasm but with cautious scrutiny. In embodied intelligence, the quality and modality of data are far more critical than mere duration.
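A rough division shows why duration alone says little about the plan. Assuming, purely for illustration, that all 600,000 participants named in the announcement (100,000 employees plus up to 500,000 external participants) share the 10-million-hour target evenly over two 52-week years, the per-person load is modest:

```python
# Figures from the announcement: over 10 million hours within two years,
# 100,000 internal employees plus up to 500,000 external participants.
# Assumption (illustrative only): everyone contributes equally.
target_hours = 10_000_000
participants = 100_000 + 500_000  # upper bound on participation
weeks = 2 * 52

hours_each = target_hours / participants
minutes_per_week = hours_each * 60 / weeks
print(f"{hours_each:.1f} hours per participant over two years")
print(f"about {minutes_per_week:.1f} minutes per participant per week")
```

That is roughly 16.7 hours per person over two years, or under ten minutes a week. The headline figure is reachable through scale alone; whether those hours carry the right modalities is a separate question.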

Algorithm experts point to the core pain point: what is in short supply is not first-person video from a human perspective, but "state-action pairs" that carry precise physical feedback. For instance, Suqian residents recording supermarket visits or delivery personnel documenting their routes generate massive amounts of general-purpose visual data at internet scale. Such purely visual data is valuable for training a robot's world model to understand concepts like "door" or "apple," but it is almost useless for training a robot's "control policy," which must learn exactly how many newtons of force it takes to pick up an apple without crushing it.

A robotics industry insider said the real shortage is of valuable data, particularly real-robot data. From this perspective, JD.com's initiative resembles a business process outsourcing (BPO) operation that supplies personnel and locations. When humans grasp an object, they make complex tactile, force, and proprioceptive adjustments, a form of high-dimensional tacit knowledge that ordinary wearables cannot capture. If JD.com's massive workforce contributes only video, the rate at which this data converts into executable robot actions would be astonishingly low.
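The distinction between visual data and state-action pairs can be expressed as a data record. In the hypothetical schema below (the field names are illustrative and are not JD.com's format), a frame counts toward control-policy training only if it carries the robot's state, the action taken, and a measured contact force; video-only frames may still help a world model but do not count here. The frame counts in the usage example are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    """One timestep of a recording (hypothetical schema, for illustration only)."""
    has_video: bool
    joint_positions: Optional[Tuple[float, ...]] = None  # state: proprioception
    action: Optional[Tuple[float, ...]] = None           # commanded joint targets
    contact_force_n: Optional[float] = None              # measured grip force, newtons

    def is_state_action_pair(self) -> bool:
        """True if the frame carries state, action, and force feedback."""
        return (
            self.joint_positions is not None
            and self.action is not None
            and self.contact_force_n is not None
        )


def policy_conversion_rate(frames: List[Frame]) -> float:
    """Fraction of frames usable for training a control policy."""
    if not frames:
        return 0.0
    usable = sum(1 for f in frames if f.is_state_action_pair())
    return usable / len(frames)


# Invented mix: 95 video-only frames from wearables, 5 fully instrumented frames.
video_only = [Frame(has_video=True) for _ in range(95)]
instrumented = [
    Frame(True, joint_positions=(0.1, 0.4), action=(0.12, 0.41), contact_force_n=2.5)
    for _ in range(5)
]
rate = policy_conversion_rate(video_only + instrumented)
print(f"usable for control policy: {rate:.0%}")
```

In this invented mix only 5% of frames survive the filter, however many hours of video are recorded.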

Another executive from a leading domestic robotics firm highlighted the primary industry challenge: "the lack of a unified data set definition standard." For example, each robotics company has different joint degrees of freedom, sensor positions, and actuator types. How can the vast amount of human motion data collected by JD.com be remapped to robots with different configurations? Without unified underlying standards, the 10 million hours of data might only nourish JD.com's proprietary robots, failing to become infrastructure that advances the entire industry.
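At a minimum, the missing standard the executive describes would consist of agreed metadata about the body that produced each dataset. The sketch below is a hypothetical embodiment descriptor, not an existing industry standard; the robot descriptions are illustrative (a 6-DoF arm with a parallel gripper, in the UR5 class, and a generic humanoid arm). It lists the reasons joint-space data from one body cannot be replayed as-is on another.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Embodiment:
    """Hypothetical metadata a shared dataset standard might require."""
    name: str
    arm_dof: int             # joint degrees of freedom per arm
    end_effector: str        # e.g. "parallel_gripper", "five_finger_hand"
    sensors: FrozenSet[str]  # e.g. {"wrist_camera", "wrist_force_torque"}


def mismatches(source: Embodiment, target: Embodiment) -> List[str]:
    """Reasons joint-space data recorded on `source` cannot run as-is on `target`."""
    problems = []
    if source.arm_dof != target.arm_dof:
        problems.append(f"arm DoF {source.arm_dof} != {target.arm_dof}")
    if source.end_effector != target.end_effector:
        problems.append(f"end effector {source.end_effector} != {target.end_effector}")
    missing = target.sensors - source.sensors
    if missing:
        problems.append(f"target expects unrecorded sensors: {sorted(missing)}")
    return problems


arm_6dof = Embodiment("6-DoF arm", 6, "parallel_gripper",
                      frozenset({"wrist_camera"}))
humanoid_arm = Embodiment("7-DoF humanoid arm", 7, "five_finger_hand",
                          frozenset({"head_camera", "wrist_force_torque"}))

for reason in mismatches(arm_6dof, humanoid_arm):
    print(reason)
```

A shared descriptor would not remove these mismatches, but it would make them machine-checkable, so that data could be routed to an appropriate retargeting step instead of sitting in a silo.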

This may explain why JD.com's first-year plan specifically emphasizes collecting "1 million hours of robot-body data," that is, data recorded on the robots themselves. The realistic path for the industry likely runs in three stages: pre-training on generalized human video for world cognition, fine-tuning on high-quality robot-specific data for skill learning, and reinforcement learning for self-exploration and evolution.
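The three-stage path can be written down as a simple training-curriculum configuration. The stage names and the stub training callback below are hypothetical; the sketch only encodes the order of stages and the data each one consumes, as stated in the text, and is not any company's actual training recipe.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Stage:
    name: str  # hypothetical stage identifier
    data: str  # data source consumed by this stage
    goal: str  # capability the stage is meant to build


CURRICULUM: List[Stage] = [
    Stage("pretrain", "generalized human video", "world cognition"),
    Stage("finetune", "high-quality robot-specific data", "skill learning"),
    Stage("reinforce", "the robot's own trial-and-error rollouts",
          "self-exploration and evolution"),
]


def run_curriculum(stages: List[Stage], train: Callable[[Stage], None]) -> List[str]:
    """Run each stage in order via a caller-supplied training step."""
    completed = []
    for stage in stages:
        train(stage)  # a real trainer would update one shared model here
        completed.append(stage.name)
    return completed


# Stub training step that only reports what each stage would do.
log = run_curriculum(CURRICULUM, lambda s: print(f"{s.name}: {s.data} -> {s.goal}"))
```

Keeping the stages as data rather than hard-coded steps makes the ordering explicit: the broad but weakly labeled human video comes first, and the scarce robot data is spent later where it matters most.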

JD.com's announcement of building an embodied intelligence data collection center signals that domestic companies are beginning to tackle the robotics industry's data shortage with scaled, engineering-driven approaches. Combining physical scenarios with massive human resources offers a new path for data accumulation. However, achieving true "intelligent emergence" in robots requires more than just stacking data volume. Ensuring high-dimensionality and quality in mass collection, establishing unified data standards, and properly handling privacy and compliance issues in large-scale operations are critical challenges that companies and the industry must resolve on the path to commercialization.

Disclaimer: Investing carries risk. This is not financial advice. The above content should not be regarded as an offer, recommendation, or solicitation to acquire or dispose of any financial products, and any associated discussions, comments, or posts by the author or other users should not be considered as such either. It is for general information purposes only and does not consider your own investment objectives, financial situation, or needs. TTM assumes no responsibility or warranty for the accuracy and completeness of the information; investors should do their own research and may seek professional advice before investing.
