Doubao AI Model 2.1 Launches with Over 10x Token Usage Growth; Seedance 2.5 Set for Early July Release

Deep News, 06-23

Volcano Engine has launched a trio of models, signaling a full-scale push into the production-grade AI market with a rapid product cadence and aggressive pricing strategy.

On Tuesday, Volcano Engine officially released the Doubao large model 2.1 series, including the flagship Doubao-Seed-2.1-Pro and the lightweight Doubao-Seed-2.1-Turbo, with their APIs now fully available on Volcano Ark. Concurrently, the video generation model Seedance 2.5 was announced for an official release in early July, and the audio generation model 1.0 has begun invite-only testing. This marks the Doubao ecosystem's comprehensive expansion from language understanding to multimodal content creation.

The Doubao large model 2.1 Pro is priced at 6 yuan per million input tokens and 30 yuan per million output tokens. In Coding and Agent scenarios, the blended cost falls to just 1.96 yuan per million tokens, directly targeting enterprise-grade production environments. Volcano Engine also introduced a continuously iterating version, Doubao-Seed-Evolving, which will receive rolling updates 2 to 4 times per month, allowing enterprises to access the latest model capabilities without changing their API endpoints.

During the event, Tan Dai, President of Volcano Engine, disclosed the latest data: as of June this year, the Doubao large model's daily token call volume has exceeded 180 trillion, representing growth of over 10 times compared to last year. Meanwhile, in China's public cloud MaaS service market, Volcano Engine ranks first with a 49.5% market share.

This product portfolio will directly impact the domestic enterprise AI procurement landscape. The Doubao large model 2.1 has been integrated with partners including WPS, Dedao, and Unity (Tuanjie Engine), with plans to reach hundreds of millions of Doubao users. In multiple recognized benchmark tests, the performance of Doubao large model 2.1 Pro in Coding and Agent tasks has approached or even surpassed top international models such as OpenAI's GPT-5.5 and Anthropic's Claude Opus 4.7.

Coding Capabilities Surpass Production Threshold

The Doubao large model 2.1 Pro has demonstrated capabilities on par with international flagship models across several industry-recognized programming benchmarks. On the Terminal Bench evaluation, it performs roughly on par with Claude Opus 4.7, capable of end-to-end completion of full engineering tasks in a command-line environment. On the long-range software development benchmark SWE-Pro, its performance is close to that of GPT-5.5.

In the NL2Repo-Bench evaluation, which tests generation of a full code repository from a natural-language description, the Doubao large model 2.1 Pro surpasses GPT-5.5. On the scientific computing code benchmark SciCode, Doubao 2.1 Pro scored 59.8, exceeding both Claude Opus 4.7 and GPT-5.5. This test covers real-world research problems across five major disciplines (mathematics, physics, chemistry, biology, and materials science) and is regarded as one of the most rigorous benchmarks in the AI for Science field.

In developer crowd-testing, over 60% of developers rated the output quality of the Doubao large model 2.1 Pro in real coding tasks as higher than that of Claude Opus 4.6. Volcano Engine also disclosed an RTL chip design case in which Doubao 2.1 Pro ran continuously for nearly 18 hours and went through 9 iterations, generating 1,303 lines of RTL code across 6 core modules. The design passed the complete engineering process, including simulation, testing, and synthesis checks, and finally passed handwritten digit recognition verification, which the company described as production-grade coding delivery.

Agent Capabilities Leap, Cover High-Economic-Value Tasks

In terms of general Agent capability, the Doubao large model 2.1 Pro achieved the highest score on the GDPval benchmark released by OpenAI, a test set covering real-world, high-economic-value tasks across 9 major industries and 44 professions. On the Agents' Last Exam (ALE) benchmark, released in June 2026, Doubao 2.1 Pro outperformed Claude Opus 4.7. This benchmark covers over 1,000 high-economic-value real-world tasks across 13 industry clusters. Because it is newly released and therefore difficult to optimize for specifically, it provides a more authentic measure of a model's generalization ability in novel scenarios.

Regarding tool calling, the Doubao large model 2.1 Pro comprehensively outperformed Claude Opus 4.7 and GPT-5.5 on the MCP-Atlas evaluation set, demonstrating greater stability when using real MCP Servers and multiple tool types. Volcano Engine showcased a typical application case: a developer used this model to orchestrate over 500 Agents working in coordination, cumulatively triggering tool calls thousands of times, ultimately completing the 3D construction of a city with over 100 uniquely styled buildings on a single large map.

Multimodal Understanding Maintains Global Leadership

In image understanding, the Doubao large model 2.1 comprehensively outperformed GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro on multiple leaderboards including MMMU-Pro, achieving global state-of-the-art (SOTA) status. In video temporal understanding, Doubao 2.1 Pro significantly led Gemini 3.1 Pro on two industry-authoritative benchmarks, TOMATO and LVBench.

In GUI Agent capabilities, the desktop performance of Doubao large model 2.1 Pro is close to that of Claude Opus 4.7, while its mobile performance significantly leads and comprehensively surpasses GPT-5.5, achieving global SOTA. Volcano Engine demonstrated an end-to-end video editing case: Doubao 2.1 Pro processed a video over two hours long in a single pass, automatically generating colloquial commentary, precisely locating segments, synthesizing audio, and outputting subtitles, all without manual intervention.

Seedance 2.5 and Audio Model Expand the Portfolio

It is understood that the Doubao video generation model Seedance 2.5 is currently in the final stages of internal testing and is expected to be officially released in early July. The new model can generate a single video of up to 30 seconds with significantly improved shot coherence. It also accepts up to 50 reference materials of mixed modalities as joint input, which the company claims is the highest number globally. Additionally, it features more flexible and controllable video editing capabilities, aimed at further enhancing creator efficiency and output quality.

On the same day, Volcano Engine officially released the Doubao audio generation model 1.0 (Doubao-Seed-Audio 1.0). It supports multimodal inputs like text and reference audio, enabling end-to-end generation of complete audio works containing multi-character dialogue, background music, and environmental sound effects, eliminating traditional post-production steps like multi-track editing, alignment, and mixing. The model supports single-session audio creation of up to 2 minutes, and can extend audio while maintaining timbre consistency through reference input. Its API is now available for invite-only testing on Volcano Ark, with plans for integration into products like Jianying, Jimeng, and Fanqie.

Pricing Strategy and Scaled Commercial Deployment

The pricing for Doubao large model 2.1 is designed to balance flagship performance with the needs of scaled deployment. The Pro version costs 6 yuan per million input tokens and 30 yuan per million output tokens, with the input price dropping to just 1.2 yuan per million tokens on cache hits. The Turbo version offers capabilities similar to the Pro version at half the price, making it more suitable for high-frequency calling scenarios. Blended across input, cached input, and output in Coding and Agent scenarios, the effective cost of the Pro version comes to just 1.96 yuan per million tokens.
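For readers who want to see how a blended per-million-token figure arises from the three list prices, the following is a minimal sketch in Python. The Pro prices (6, 1.2, and 30 yuan) come from the article; the `Pricing` class, the function name, and the token mix in the example are hypothetical and chosen only for illustration. The article does not disclose the workload mix behind the 1.96 yuan figure, so this example does not attempt to reproduce it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List prices in yuan per million tokens."""
    input_per_million: float
    cached_input_per_million: float
    output_per_million: float


# Pro list prices as quoted in the article.
PRO = Pricing(input_per_million=6.0,
              cached_input_per_million=1.2,
              output_per_million=30.0)


def blended_cost_per_million(pricing: Pricing,
                             uncached_input: int,
                             cached_input: int,
                             output: int) -> float:
    """Return the average cost in yuan per million tokens for a workload.

    Each token count is weighted by its list price, and the total is
    divided by the total number of tokens processed.
    """
    total_tokens = uncached_input + cached_input + output
    if total_tokens <= 0:
        raise ValueError("workload must contain at least one token")
    weighted = (uncached_input * pricing.input_per_million
                + cached_input * pricing.cached_input_per_million
                + output * pricing.output_per_million)
    return weighted / total_tokens


# Hypothetical agent-style workload: mostly cached context, little output.
example = blended_cost_per_million(PRO,
                                   uncached_input=80_000,
                                   cached_input=900_000,
                                   output=20_000)
print(round(example, 2))  # 2.16
```

The sketch illustrates why agent and coding workloads, which tend to resend large cached contexts and emit comparatively few output tokens, can land far below the 6 and 30 yuan list prices: the blended figure is dominated by the 1.2 yuan cache-hit rate.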

Regarding product integration, the Doubao large model 2.1 is fully compatible with mainstream agent harnesses such as Claude Code and Codex, and it is already available in development tools such as TRAE, TRAE WORK, and Kouzi. Partners have provided feedback: WPS noted the model has established a stable and usable pipeline for core office tasks like PPT generation and spreadsheet delivery; Dedao reported zero violations in business rule adherence and core prohibition execution; Unity (Tuanjie Engine) observed that the model's single-task capability ceiling for script logic tasks is higher than that of top-tier models. Volcano Engine stated that Doubao products will soon integrate the Doubao large model 2.1 Pro to serve the office and productivity scenarios of hundreds of millions of users.

