On February 24, Beijing time, three leading Chinese AI companies—DeepSeek, Moonshot AI, and MiniMax—were publicly criticized by Anthropic, the company behind the top-tier AI coding model Claude. Anthropic alleged that these Chinese firms used 24,000 "sock puppet accounts" to conduct 16 million conversations aimed at extracting Claude's capabilities to train their own models, coining a term for the practice: an "industrial-scale distillation attack."
However, the accusation was swiftly met with sharp criticism from Elon Musk, who stated, "How dare they steal what Anthropic stole from human programmers," adding, "It is an indisputable fact that Anthropic engaged in large-scale theft of training data and paid billions in settlements as a result."
On the other hand, the capabilities of domestic Chinese models have been steadily improving, with many developers now targeting Claude's most touted strength: programming prowess. Concurrently with the accusations, metrics such as revenue and API call volume for domestic models like MiniMax and Kimi have reached new highs. Chinese AI companies are demonstrating through tangible results that technological blockades and unsubstantiated allegations cannot hinder the progress of homegrown AI.
Is Distillation Now an "Attack Method"?

The technique of distillation is inherently neutral; the issue lies in who uses it and how. Model distillation is a standard training technique in the AI field, facilitating knowledge transfer and model compression by having a larger model guide a smaller one. For instance, DeepSeek compressed a 175-billion-parameter model down to 7 billion parameters for financial applications, reducing inference costs by 98% while maintaining over 95% of core performance metrics. MiniMax's M2.5 model achieved a score of 80.2% on the SWE-bench Verified benchmark, nearly matching Claude Opus 4.6's 80.8%, but at merely 1/20th of the cost.
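For readers unfamiliar with the mechanics, the classic formulation of distillation (temperature-scaled soft targets, as popularized by Hinton et al.) can be sketched in a few lines. This is a minimal, generic illustration of the loss function involved, not a reconstruction of any company's actual training pipeline; the function names and hyperparameter values here are illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T produces softer probabilities."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend of soft-target KL loss (teacher guidance) and
    hard-label cross-entropy, the standard distillation objective."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student), scaled by T^2 so gradients keep
    # a comparable magnitude as T varies
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)), axis=-1)
    soft_loss = (temperature ** 2) * kl.mean()
    # Ordinary cross-entropy against ground-truth labels
    p_hard = softmax(student_logits)
    hard_loss = -np.log(
        p_hard[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

The smaller "student" model is trained to match the full output distribution of the larger "teacher," which carries far more information per example than a single hard label; this is why distilled models can retain most of the teacher's performance at a fraction of the size and inference cost.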
Currently, distillation is widely employed by AI firms globally, including giants like OpenAI, Google, and Meta. In fact, even Anthropic conceded after its accusations that "distillation is legitimate: AI labs use it to create smaller, more affordable models for clients." However, it immediately followed this by claiming that "after some foreign labs illegally distill US models, they can remove their safety guardrails and apply the model's capabilities to their own country's military, intelligence, and surveillance systems."
Anthropic's logic appears to be that distillation itself is acceptable, but when utilized by Chinese companies, it becomes "illegal theft." The company claims to have traced the "attacks" back to specific labs using information like IP addresses and request metadata, even suggesting links to Chinese firms based on "matches with employees' public profiles." However, these allegations currently remain unsubstantiated claims from Anthropic alone.
Some argue that such traceability methods lack legal standing: distillation transfers functional logic rather than directly copying data, aligning more closely with the legally permissible realm of "reverse engineering." Simply labeling it "theft" cannot conceal the weakness of the evidence. The accusation also smacks of a thief crying "stop thief." Anthropic itself has a history related to "data theft": in September 2025, Anthropic was forced to pay a $1.5 billion settlement to a global collective of authors, led by writer Andrea Bartz, after it was found to have illegally downloaded over 7 million copyrighted books from pirate sites such as LibGen and PiLiMi to train its AI models. As Musk stated, "This is an indisputable fact."
The AI industry is currently in a period with few established rules. Defining the boundaries of distillation technology and the red lines of data usage requires globally coordinated standards. While commercial competition is expected, resorting to labeling and double standards only hinders technological innovation and accessibility. When OpenAI, Google, and Anthropic themselves have extensively used unlicensed data for model training, their accusations regarding "distillation" look more like defensive reactions aimed at protecting vested interests.
Undeterred by Accusations, Chinese AI Models Advance Rapidly

In response to Anthropic's one-sided allegations, Moonshot AI and MiniMax have not issued formal responses, seemingly preferring to let data and results speak for themselves.
Since 2026, DeepSeek has published multiple papers and consistently open-sourced its latest research findings, firmly adhering to its philosophy of "reducing costs and increasing efficiency." Simultaneously, DeepSeek is conducting gray-release (canary) testing of its new model, with the V4 version imminent.
Moonshot AI revealed to reporters that just over a month after completing a previous $500 million funding round, it is set to finalize a new round exceeding $700 million, again oversubscribed. This round is jointly led by Alibaba, Tencent, Morningside Venture Capital, and JiuAn Capital. A subsequent funding round targeting a valuation between $10 billion and $12 billion has already commenced and has received indications of interest from multiple institutions. These consecutive rounds, totaling over $1.2 billion, set the highest fundraising record in the large language model industry in the past year.
Previously, ByteDance took over four years to surpass a $10 billion valuation, and Pinduoduo took over three years. Kimi achieved a more than 30-fold increase in valuation in just over two years. Based on this trajectory, Kimi is poised to set the record for the fastest growth from founding to a valuation exceeding $10 billion among Chinese companies.
Regarding model usage, Moonshot AI's Kimi K2.5 large model, released less than a month ago, has already generated cumulative revenue in its first 20 days that surpasses its total revenue for the entire year of 2025. According to OpenRouter, Kimi K2.5 ranks first in the model call rankings on the OpenClaw platform.
MiniMax disclosed that just before the Spring Festival, it open-sourced its new-generation M2.5 model. Within 12 hours of release, it topped OpenRouter's popularity chart, and within a week, it reached the number one spot in call volume, with weekly calls surging to 3.07 trillion tokens. OpenRouter's overall call volume also climbed simultaneously. Officials later confirmed that M2.5 drove incremental demand for calls in the 100K to 1M long-text range, which is a typical consumption scenario for Agent workflows.
In fact, the rapid progress of Chinese AI companies stems from a vast pool of engineering talent, abundant data resources, a well-developed industrial chain, and a commitment to open-source principles. Breakthroughs by DeepSeek, Kimi, and MiniMax in areas like programming, multimodality, and Agents are the result of technological innovation and deep integration with practical scenarios.