Six Releases in One Week! Kunlun Tech Co.,Ltd. Elevates Multimodal AI to New Heights

Deep News, 2025-08-17

Incredible momentum! Six models released in one week. Kunlun Tech Co.,Ltd. is operating at full capacity, pushing multimodal AI to unprecedented heights.

From August 11 to 15, the company rolled out new models daily, covering hot areas including video generation, world models, unified multimodal systems, intelligent agents, and AI music creation, which together span virtually every core scenario in multimodal AI applications.

Most remarkably, the majority of these models have been open-sourced by Kunlun Tech Co.,Ltd.!

Just before this technology week began, Kunlun Tech Co.,Ltd. entered China's "AI Open Source Top 16," joining major internet companies such as Tencent and Alibaba.

The timing of this technology week is intriguing: on the surface it is a high-profile technical showcase, but it also reveals Kunlun Tech Co.,Ltd.'s comprehensive AI strategy.

Let's examine what was released over the past week (in chronological order):

**SkyReels-A3: Making Live-streaming Commerce Easy with Just One Image**

Kunlun Tech Co.,Ltd. opened the week with the SkyReels-A3 model, which specifically targets digital human live-streaming commerce (China's live-streaming market alone is approaching 10 trillion yuan).

The model offers three main approaches, and official demos show that it is becoming increasingly difficult to tell real people from digital humans in everyday videos: the hand movements, speech tone, rhythm, and lip-sync are all remarkably natural.

Beyond powerful commerce capabilities, this model intentionally incorporates "camera language" - eight preset common camera movements including fixed shots, push-in, pull-out, left pan, right pan, tilt up, tilt down, and handheld shots.

This enables smooth handling of scenarios requiring higher artistic aesthetics (such as music MVs, film clips, or presentation videos), unlike traditional digital humans limited to "fixed shots" that appear somewhat rigid.

Official evaluations show that across different audio-driven scenarios, SkyReels-A3 outperforms mainstream open-source models like OmniAvatar and closed-source models like OmniHuman across most metrics, particularly excelling in lip synchronization (Sync-C and Sync-D).

The core technology behind SkyReels-A3 includes DiT video diffusion models, which use Transformer architecture instead of traditional U-Net to better capture long-range dependencies.

Compared with the earlier SkyReels-V1 (released February 2025) and SkyReels-V2 (released April 2025), SkyReels-A3 brings four new user experiences:

1. Text prompt input that supports scene changes
2. More natural action interactions, including product interactions and gestures during speech
3. Advanced camera movement application and control for higher artistic aesthetics in music/MV scenarios
4. Single-shot minute-level video generation supporting up to 60-second output, with multi-shot generation supporting unlimited duration

**Domestic Open-Source Genie 3: Matrix Meets Reality**

On the second day, the company released Matrix-Game 2.0, an upgraded interactive world model from its self-developed Matrix series.

Google DeepMind's Genie 3 garnered attention just over a week earlier, but it was not open-sourced. Kunlun Tech Co.,Ltd. has made its model open source, with Matrix-Game-Turbo described as China's first model to benchmark itself against Genie 3.

This 2.0 version shows qualitative improvements in real-time generation and long-sequence capabilities. Previous models typically generated only 10-20 seconds of video, while this version starts at minute-level sequences and supports real-time forward, backward, left, and right interactions.

Matrix-Game 2.0 offers three core advantages over the previous version through optimizations in data and architecture:

First, addressing data bottlenecks in existing interactive world models, they built scalable data production pipelines based on Unreal Engine and GTA 5, producing approximately 1,350 hours of high-quality interactive video data.

Second, targeting real-time performance issues, they designed action-conditional control modules on top of a small 1.3B-parameter model, supporting frame-level keyboard and mouse interaction inputs.

Third, addressing short generation sequences, they employed few-step autoregressive diffusion models for real-time long-sequence video generation, achieving 25 FPS generation speed on single GPUs.
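To put the 25 FPS figure in perspective, here is a minimal back-of-the-envelope sketch. Only the 25 FPS rate comes from the article; the 60-second clip length and the four denoising steps are illustrative assumptions, not published specifications of Matrix-Game 2.0.

```python
# Back-of-the-envelope latency budget for real-time video generation.
FPS = 25              # generation speed quoted in the article
CLIP_SECONDS = 60     # assumed: one "minute-level" sequence
DENOISING_STEPS = 4   # assumed: a hypothetical "few-step" sampler setting

# Time available to produce each frame if generation must keep up with playback.
frame_budget_ms = 1000 / FPS

# Number of frames needed for one clip of the assumed length.
frames_per_clip = FPS * CLIP_SECONDS

# Time available per denoising step if each frame were sampled independently.
step_budget_ms = frame_budget_ms / DENOISING_STEPS

print(f"Per-frame budget: {frame_budget_ms:.0f} ms")
print(f"Frames per {CLIP_SECONDS}-second clip: {frames_per_clip}")
print(f"Per-step budget at {DENOISING_STEPS} steps: {step_budget_ms:.0f} ms")
```

The estimate treats each frame independently, whereas a real system may generate several frames per model pass and amortize the cost. Even so, it suggests why few-step sampling matters for interactivity: every additional denoising step eats directly into a window of roughly 40 ms per frame.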

Simultaneously, Kunlun Tech Co.,Ltd. released and open-sourced the 3D scene generation model Matrix-3D - a unified framework integrating panoramic video generation and 3D reconstruction that generates high-quality, trajectory-consistent panoramic videos from single images and directly restores explorable 3D spaces.

**New Framework Achieves SOTA in Image Generation/Editing**

On the third day, Kunlun Tech Co.,Ltd. focused on unified multimodal systems, officially open-sourcing the Skywork UniPic 2.0 model as an efficient training and inference framework for unified multimodal modeling, enabling one model to handle image understanding, generation, and editing.

Rather than following the industry's typical "more power, better results" approach of adding parameters, GPUs, and computing power, Kunlun Tech Co.,Ltd. demonstrated that optimized training strategies can replace pure model expansion, reducing training costs and hardware requirements for high-performance image generation/editing models.

Through an improved SD3.5-Medium architecture and their proprietary "progressive dual-task reinforcement strategy," they produced a 2B model that outperforms BAGEL (7B) and Flux-Kontext (12B) in image generation and editing.

When combining this 2B model with Qwen2.5-VL-7B through joint training, the resulting unified multimodal model UniPic2-Metaquery achieved new SOTA records across understanding, generation, and editing tasks.

**Skywork Deep Research Agent Core Engine Upgraded Again**

Kunlun Tech Co.,Ltd. officially released Skywork Deep Research Agent v2, the core engine of their super intelligent agent platform, producing high-density, high-quality documents, presentations, spreadsheets, and other deliverables for platform users.

This upgrade primarily focuses on multimodal capabilities, with three key improvements:

1. Launched the "Multimodal Deep Research" Agent, integrating multimodal retrieval, understanding, and generation for the first time
2. Introduced the "Multimodal Deep Browser Agent," revolutionizing social media content analysis and data insights
3. Enhanced deep information search and complex task execution capabilities, achieving SOTA across multiple evaluation benchmarks

**Music Model with Better Chinese Song Understanding**

On the final day, Kunlun Tech Co.,Ltd. concluded with a strong finish in music models, officially launching Mureka V7.5, elevating Chinese song interpretation to new levels.

Through deep understanding of Chinese music diversity and cultural characteristics, the model more precisely conveys Chinese music's artistic essence and emotions. Optimized ASR technology enhances vocal authenticity and emotional depth, making AI singing more natural, especially in Chinese song rhythm and breathing processing.

A direct comparison with Suno v4.5, a leading international music generation model, shows Mureka V7.5 delivering stronger rock characteristics and better prompt adherence.

Simultaneously, Kunlun Tech Co.,Ltd.'s voice team launched MoE-TTS, the first character-description voice synthesis framework based on MoE (Mixture of Experts), enabling users to precisely control voice characteristics and styles through natural language descriptions.

**Strategic Analysis: The Comprehensive Chess Game**

Kunlun Tech Co.,Ltd.'s achievements stem from three core strategic aspects:

**Strategic Determination**: As early as 2023, when ChatGPT triggered the current AI wave, Kunlun Tech Co.,Ltd. established an "All in AGI & AIGC" strategy at the top design level. This forward-looking strategic decision demonstrated deep insights into AI's future development.

**Vertical Domain Focus**: Unlike the industry's pursuit of "super applications" and general agents, CEO Fang Han believes general agents are logically unsound and that deep optimization in vertical domains is the future. Only applications integrated into users' daily workflows with high-frequency usage can generate significant commercial value and user stickiness.

**Open Source Ecosystem**: Rather than choosing closed-source approaches like some peers, Kunlun Tech Co.,Ltd. consistently maintains open-source commitments at key junctures, continuously contributing high-quality models and tools. This helps establish technical discourse power while attracting more developers and partners, forming a positive cycle of "technology-community-application."

Kunlun Tech Co.,Ltd. has successfully entered "China's AI Open Source Top 16," with its ecosystem position steadily rising.

As a first-tier member of China's AI enterprises, Kunlun Tech Co.,Ltd. is accelerating its AI strategy implementation, demonstrating strong technical capabilities and commercial potential worthy of capital attention.

The conclusion of this technology week represents not an endpoint, but a new starting point for Kunlun Tech Co.,Ltd.'s AI journey.

