On December 3, Kling launched its video generation 2.6 model, featuring a new "audio-visual sync" capability. This changes the traditional AI video generation workflow, in which silent visuals were generated first and then dubbed manually. The new model can produce a complete video in a single generation pass, including natural speech, action sound effects, and ambient audio, which significantly improves creative efficiency.
The upgraded model introduces two major functions: text-to-audio-visual and image-to-audio-visual generation. It currently supports voice generation in Chinese and English, with videos of up to 10 seconds. By closely aligning its semantic understanding of real-world sounds with dynamic visuals, the Kling 2.6 model excels in audio-visual synchronization, audio quality, and semantic comprehension. Notably, it maintains a globally leading position in Chinese voice generation quality.