The Rise of Kling: From Obscurity to a Leading Player in Global Video Generation

Stock News 09:35

The AI-generated short film "Paper Phone," which has amassed over 100 million views online, tells a story of daring to break through. It features a young boy who wants to burn a "paper phone" for his deceased grandmother. This simple narrative allowed many viewers to see genuine human emotion in AI-generated content for the first time. Recently, Gai Kun, Senior Vice President of KUAISHOU-W (01024) and head of the Kling AI project, discussed in an interview the product's journey from an unknown entity to a top contender in the global video generation field. His account reveals an intriguing conclusion: in the era of large models, "daring to try" is often more critical than "being able to."

The first major decision was the leap from being a "nobody" to achieving a global first. In early 2024, the release of OpenAI's Sora demo sent shockwaves through the industry, but the product itself remained unavailable. Gai Kun made a decision that initially stunned his team: to develop the world's first user-accessible video generation model, aiming to surpass Sora's capabilities. At that time, Kling had minimal internal resources and even relied on "non-mainstream" computing power for training. However, Gai Kun predicted that after countering Google, OpenAI would shift its focus back to language models, creating a window of opportunity lasting five to six months. "If we didn't take this gamble, we risked falling into a vicious cycle of mediocrity, leading to a lack of resources, and ultimately, being eliminated," Gai Kun stated. "We were nobodies to start with. If we lost the bet, we'd still be nobodies. But if we won, it would completely change our destiny." He set a strict internal deadline: the entire pipeline from model to product had to be ready by May. On June 6, 2024, Kling 1.0 launched, becoming the world's first user-accessible DiT video generation model, while Sora wasn't officially released until the end of that year.

Following the initial success, a more difficult choice emerged. After version 2.0, the team faced two paths: continue enhancing clarity and stability, refining the existing technology, or pivot towards multimodality, treating images, videos, and even motion as AI's "language" to reinvent interaction. The first option offered high certainty; the second had almost no precedent. "Maintaining core metrics is essential, but solving new problems is the key to the future," Gai Kun said, ultimately choosing the latter path. The subsequent launch of the Motion Control feature validated this direction—users could upload a reference video to make a character replicate the actions. This feature gained rapid popularity in overseas markets with little promotional effort. Gai Kun later used an analogy: OpenAI is like an aloof goddess whose astonishing creations are admired from afar, but when Kling, the "neighbor," succeeded, it made everyone realize that this technology could actually be implemented.

These decisions reflect an evolving organizational methodology. Over the past two decades, the core competencies of internet companies have evolved twice: from being driven by product and operations to being driven by algorithms and A/B testing. However, in the era of large models, this logic is becoming obsolete, as a single experiment can cost tens of millions of dollars, making innovation a process of finding a path through the unknown. "When you can't compete on sheer muscle, you have to act like a 'mage,' using judgment to gain a local advantage," Gai Kun explained. In his view, companies now rely less on large-scale trial and error and more on a few critical decisions. From adopting the DiT architecture to pursuing multimodality and then an integrated model, Kling's key pivots were not discovered through testing but were driven by vision, placing significant bets on the right direction amidst vast possibilities.

Once the direction is set, execution becomes paramount. Gai Kun emphasizes the "Disagree and Commit" principle: major decisions allow for thorough debate and even opposition initially, but once a goal is set, everyone must commit 120% to its execution. "Many teams agree verbally but execution deviates. We place greater importance on unified action after the direction is set," he noted.

Looking at the end goal, Gai Kun believes that as generative capabilities and controllability continue to improve, AI video will evolve from a tool into an infrastructure. "When high-quality content becomes abundant and diverse enough, a new content platform will emerge," he said. In his view, the significance of the technology lies not just in efficiency gains but in unleashing expressive power—enabling more people to bring the stories in their minds to life. This has been Kling's vision from day one: "To let everyone become a director and allow everyone to film the good stories in their hearts." It may sound distant, but Gai Kun suggests that given the accelerated evolution of AI, this vision could become a reality within one to three years. A tangible signal of progress is commercial traction. In the fourth quarter of 2025, Kling AI's revenue reached 340 million yuan. In December of the same year, monthly revenue surpassed 20 million US dollars, corresponding to an annualized run rate of approximately 240 million US dollars.

