Reported by New Vision
Editor: Rhino KingHZ
[New Vision Summary] Looking back from the spring of 2026, in the wake of the Sora wave, SkyReels V4 has ascended to the top of the global rankings with its four-in-one top-tier capabilities (multimodal reference, audio-video joint generation, unified task framework, and full-modal reinforcement learning)! The era of unification for AI video creation has arrived, and it truly belongs to China!
For the first time, a domestic video generation model has reached the very pinnacle of the world.
Just moments ago, in the latest rankings from the third-party agency Artificial Analysis, SkyReels V4 claimed the number one spot globally for "Text-to-Video (with Audio)"!
It surpassed Google's Veo 3.1 and also exceeded Kling 3.0.
Crucially, this ranking isn't based on vendor-run benchmarks. It relies on blind evaluation results from a large number of real users.
This signifies that in the most difficult and valuable track of "Text-to-Video + Audio," domestic models have surged to the forefront.
On February 27, when the SkyReels V4 Preview first debuted, it had already secured second place globally.
In less than a month, SkyReels V4 took another step forward, jumping directly to first place.
SkyReels V4 isn't just stronger; it has begun to rewrite the global ranking of video models.
It marks the moment when Chinese AIGC video technology officially leads the world.
At the 2026 Zhongguancun Forum, SkyReels-V4 is set for its official launch, and the API is already open (skyreels.ai).
Link: https://www.skyreels.ai/api-platform
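For readers who want to try the API, here is a minimal sketch of how a text-to-video-with-audio request might be assembled. The endpoint, field names, and values are illustrative assumptions only (the article confirms 1080p, 32 FPS, 15-second clips with joint audio); consult the official documentation at skyreels.ai/api-platform for the real schema.

```python
import json

# Placeholder URL from the article; the actual endpoint path is not documented here.
API_URL = "https://www.skyreels.ai/api-platform"

def build_t2v_request(prompt: str, duration_s: int = 15,
                      resolution: str = "1080p", fps: int = 32,
                      with_audio: bool = True) -> str:
    """Assemble a JSON body for a hypothetical generation endpoint.

    All field names below are assumptions for illustration, not the
    official SkyReels-V4 API schema.
    """
    payload = {
        "task": "text_to_video",
        "prompt": prompt,
        "duration_seconds": duration_s,  # article cites 15-second clips
        "resolution": resolution,        # article cites 1080p output
        "fps": fps,                      # article cites 32 FPS
        "audio": with_audio,             # joint audio-video generation
    }
    return json.dumps(payload, ensure_ascii=False)

body = build_t2v_request("Guan Yu and Qin Qiong duel in a sandstorm")
print(body)
```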
In other modalities, SkyReels V4 also performed excellently, ranking second in "Text-to-Video (without Audio)."
Rather than reciting numbers, let's look directly at the capabilities and see just how formidable the world's number one video AI really is.
Real-World Test: The King of AI Short Dramas
Within Kunlun Wanwei's Tiangong AI lineup, SkyReels is completing its transition into a full multimodal video generation system, supporting text, image, video, and audio input.
It is the world's first video foundation model that simultaneously supports multimodal input, joint audio-video generation, and unified generation/restoration/editing tasks.
The following six directions of real-world testing will each let you feel the formidable power of this model.
AI Short Drama Generation: Two Images + One Line of Dialogue = Direct Cinema-Grade Short Drama Output.
You just need to throw in two character images and write a line of dialogue.
SkyReels-V4 can directly spit out a 1080p, 32FPS, 15-second video.
The texture of the visuals, character expressions, and lip-syncing have almost no "AI flavor."
Whether it's Eastern or Western faces, the effect is extremely natural.
Amidst thunder and lightning, with wind and sand sweeping across the wasteland, Guan Yu and Qin Qiong engage in an epic showdown—
From simple text to complete video + audio, even those with zero foundation can easily create cinema-grade content, truly achieving "shoot whatever you want to shoot!"
The key is that the "AI flavor" is almost gone.
More critically, this time it's not "generate the picture first, then hard-paste the sound."
SkyReels-V4 is specifically designed to process visuals and sound simultaneously.
Multi-frame Reference: Nine images finally lock down both the character and the plot.
One of the heaviest upgrades in this SkyReels-V4 launch is multi-frame reference.
You can give it up to 9 keyframes.
It will fill in the intermediate actions, camera shots, and transitions based on these 9 images.
This is very important and very practical.
In the past, making AI short dramas had two main failure points:
One second it's this face, the next second the face has "changed";
Just now it was in this scene, and suddenly it jumped to another world.
SkyReels-V4's most practical improvement this time is eliminating exactly these two pitfalls, making it the undisputed king of AI comic dramas.
Prompt Example: "The bare-backed youth from @Image-1 keeps running forward, encountering several corners with the camera tracking; then the camera switches to @Image-2, the youth is bare-chested, continues running forward then makes a sharp turn; then switches to @Image-3, he reveals a surprised expression; finally switches to @Image-4, he twists the dial to the right, and a large cloud of thick smoke fills the screen."
This level of video control ability is simply amazing.
The style is also completely unified. For this kind of comic drama, there is not a trace of "AI flavor."
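The `@Image-N` reference syntax shown in the prompt above can be sketched as a small validation step: extract the references from the prompt and check them against the supplied keyframes (the article states a maximum of nine). This helper is purely illustrative, not part of any official SkyReels SDK.

```python
import re

# Cap taken from the article: up to 9 reference keyframes.
MAX_KEYFRAMES = 9

def validate_references(prompt: str, num_keyframes: int) -> list[int]:
    """Parse "@Image-N" style references and verify each one maps to a
    supplied keyframe. Raises ValueError on out-of-range references."""
    if not 1 <= num_keyframes <= MAX_KEYFRAMES:
        raise ValueError(f"keyframe count must be 1..{MAX_KEYFRAMES}")
    # Accept both "@Image-1" and "@image_1" spellings seen in the article.
    refs = [int(n) for n in re.findall(r"@[Ii]mage[-_](\d+)", prompt)]
    missing = [r for r in refs if not 1 <= r <= num_keyframes]
    if missing:
        raise ValueError(f"prompt references undefined keyframes: {missing}")
    return refs

prompt = ("The bare-backed youth from @Image-1 keeps running forward; "
          "then the camera switches to @Image-2; then @Image-3; "
          "finally @Image-4.")
print(validate_references(prompt, num_keyframes=4))  # [1, 2, 3, 4]
```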
Take for example this animation resembling the "No-Face" monster.
Prompt: "Based on the anime plot in @Image-1, naturally transition and expand from top to bottom and left to right to generate an animated short film."
The fighting scenes are incredibly smooth, and the close-up shot transitions are very reasonable.
This kind of fantasy-style animation is also no problem.
Thanks to SkyReels-V4's audio-visual synchronous generation capability, lip-syncing for speaking characters is no longer a challenge.
All-in-One Video Editing: Edit Video with Your Voice.
What's even fiercer is that it doesn't just generate video; it edits video directly, acting as a god-tier post-production tool.
You can ask it to do three types of things:
First, add things to the scene.
Put a hat on a character, place flowers in a room, or insert a new character into the original scene.
"Add the blue ribbed knit beanie from @image_1 onto the head of the central dancer in @video_1."
With one sentence, the hat is added to the young lady's head.
Even more shockingly, it looks perfect from every angle.
It's stunning.
Second, change character actions.
Make the newly added character dance along with the original character, or re-bind the movements.
"Add the colorful fursuit character from @image_1 into the urban dance scene in @video_1, placing them on the dance floor next to the dancer. The character should mirror the dancer's movements with a playful, exaggerated dance style."
Not only was the character added, but even more impressively, they can dance in coordination with the original person.
This video generation understanding ability is amazing.
Third, perform direct cleanup.
Remove subtitles, remove watermarks, remove station logos, delete passersby, delete animals, delete any unwanted interference.
This editing capability, built upon the model's full understanding of the video, is simply too strong.
Work that previously required you to switch back and forth between Premiere, AE, and various AI tools can now be done entirely by the SkyReels-V4 model.
In other words, video generation, element implantation, character editing, and scene cleanup are being converged into a single universal editing framework.
A major breakthrough this time is unifying video generation, frame interpolation, extension, and editing into the same interface, letting text-to-video, image-to-video, video extension, start-end frame interpolation, and local/global editing all fall under one processing framework.
Hard Technical Breakthrough: How Does It Go Head-to-Head with Seedance 2.0?
After seeing the effects, let's look at where the technology behind SkyReels-V4 is truly hard.
Last month, when SkyReels V4 Preview landed at number two in the global active model rankings, we published a detailed analysis: "After Seedance 2.0 shook the scene, another Chinese dark horse tops the AA list! The AI flavor is gone."
In less than a month, going from Preview version global second place to the upgraded version topping the first place — this speed is called "cheating" in games, but in the AI circle, it's called "SkyReels-V4."
SkyReels-V4's charge forward this time doesn't rely on minor tweaks.
It mainly cured two old ailments of video AI.
The first old ailment is "The visuals look good, but the logic doesn't hold."
For example, water flowing upwards, cups hanging in mid-air (and it's not anime); a person turns around and the movement glitches.
To solve this problem, SkyReels-V4, during training, no longer just focuses on "does it look like it," but also judges "is it correct."
To put it bluntly, a stricter scoring system was added to the model:
Visuals must look good, actions must be reasonable, and sound must match lip-sync and rhythm.
Wherever it goes wrong, the output is sent back and the model is retrained, again and again.
This process is called Full-Modal Reinforcement Learning in the paper.
In addition, the team introduced a step-wise curriculum reinforcement learning mechanism: focusing on three key dimensions (resolution and duration, task complexity, and data difficulty), it pushes the model to advance gradually from simple tasks to complex ones, continuously improving its control over high-difficulty generation scenarios.
You can understand it as: Previously, the teacher only looked at whether the test paper was pretty; now the teacher starts watching the logic, actions, and expression simultaneously.
Previously, the teacher only cared if the exam score was good; now the teacher starts paying attention to the student's learning process and improves teaching methods.
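The two training ideas above can be captured in a toy sketch: (1) a multi-dimensional "scoring system" that judges visuals, motion plausibility, and audio sync together, and (2) a step-wise curriculum that only advances to harder settings once the current stage is mastered. The weights, thresholds, and stage definitions below are invented for illustration; they are not from the SkyReels-V4 report.

```python
# Curriculum stages, easy to hard: (resolution, clip length in seconds).
STAGES = [("480p", 5), ("720p", 10), ("1080p", 15)]

def full_modal_reward(visual: float, motion: float, audio_sync: float) -> float:
    """Combine per-modality scores (each in [0, 1]) into one training signal.
    The weights are illustrative assumptions."""
    weights = {"visual": 0.4, "motion": 0.3, "audio_sync": 0.3}
    return (weights["visual"] * visual
            + weights["motion"] * motion
            + weights["audio_sync"] * audio_sync)

def next_stage(stage_idx: int, avg_reward: float, threshold: float = 0.8) -> int:
    """Advance the curriculum only when the model masters the current stage."""
    if avg_reward >= threshold and stage_idx < len(STAGES) - 1:
        return stage_idx + 1
    return stage_idx

r = full_modal_reward(visual=0.9, motion=0.85, audio_sync=0.8)
print(round(r, 3))       # 0.855
print(next_stage(0, r))  # 1  (promoted from 480p/5s to 720p/10s)
```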
The second old ailment is "The character can't be remembered."
You give it a few keyframes, and SkyReels-V4 can fill in the intermediate process. You give it nine plot images, and SkyReels-V4 can try its best to lock in the character's face, clothing, and scene style throughout.
This is crucial for AI short dramas.
In the past, the most immersion-breaking thing about AI short dramas was the character looking different in every shot: the male lead had a pointed chin in episode one and a square face by episode two, instantly pulling the audience out of the story.
Now with the nine-grid reference, the character remains consistent throughout the whole series, and the scene is coherent throughout. AI short dramas have finally upgraded from "watching for fun" to a level where you can "seriously follow."
These two capabilities have pulled video generation consistency and controllability up to the industry ceiling, evolving SkyReels-V4 from a "video generation tool" into an "industrial production engine for short dramas."
The SkyReels-V4 technical report has also been made public.
Technical Report: https://arxiv.org/pdf/2602.21818
Facing the Test of Practice
The Domestic AI Version of Netflix is Here
What is truly worth noting is not just the ranking, but that this model has already been put into business operations.
DramaWave: Kunlun Wanwei's AI version of Netflix.
SkyReels-V4 technology directly supports Kunlun Wanwei's short drama platform, DramaWave.
As of January 2026, the Kunlun short drama platform, centered on DramaWave and FreeReels, has broken through the 80 million MAU mark, with an Annualized Run Rate (ARR) revenue exceeding $480 million, and monthly revenue reaching $40 million.
These aren't numbers on a PPT; they are real users paying real money to watch AI-participated content.
Recently, DramaWave launched the "Million Dollar • Drama Starter AI" creator support plan, welcoming high-quality creators globally. Kunlun Wanwei's newly self-developed AI short drama agent tool, SkyAnime, also launched simultaneously, empowering creators from the tool side and comprehensively improving creation efficiency.
Nearly a thousand works have gone online in the AI drama module on DramaWave, with AI self-produced drama monthly capacity exceeding 30 titles.
Take the self-produced AI short drama "Plundering Terms! I Transformed into a Necromancer Catastrophe" as an example: produced with the SkyAnime tool, it cost less than $20,000. After launch, daily ad spend exceeded $100,000, and cumulative views reached several million.
This is a perfect closed-loop verification of "Technology → Product → Commercialization."
Upgrading from "Fragment Generation" to Industrial Full-Link Video Production.
The significance of SkyReels-V4 goes far beyond "being able to generate a good-looking video clip."
For the AI short drama industry, SkyReels-V4 solves the core pain point: Character Consistency.
In the past, AI-generated short dramas would have characters "change faces" when the camera angle changed, making it impossible for the audience to get into the story.
SkyReels-V4's nine-grid reference capability keeps the character consistent throughout the entire series, bringing the quality of AI short dramas to a level of "worth watching seriously" for the first time.
For the entire AI film and television industry, this is a qualitative leap.
Providing a Unified Video Generation Base for Games, Music, and Content Ecosystems.
It is worth noting that SkyReels-V4 is not an isolated product.
Kunlun Wanwei also owns the AI music creation platform Mureka — its O1 model is the world's first music reasoning large model introducing Chain-of-Thought (CoT) technology. The V8 version continues to break through in timbre, performance techniques, and emotional expression, with users in over 100 countries and regions globally.
SkyReels-V4's video capabilities + Mureka's music capabilities constitute a full-link creation closed loop from visuals to sound, from backing music to vocals.
A company possessing both top-tier global video large models and music large models is rare worldwide.
A brand can generate a complete video ad with one sentence; an independent musician can turn a song directly into a high-quality MV; an educational institution can automatically convert courses into teaching videos complete with narration, background music, and dynamic visuals — these are not fantasies, but things happening right now.
All in AGI
Looking back at the development trajectory of Kunlun Tiangong in the field of video large models, you will find that the rise of SkyReels-V4 is by no means accidental, but a strategic-level explosion of careful layout.
February 2025: Open-sourced SkyReels-V1 — China's first video generation model oriented towards AI short drama creation, trained on tens of millions of film and television data, supporting 33 micro-expressions and over 400 action combinations.
April 2025: Released SkyReels-V2 — The world's first infinite-duration movie generation model using the Diffusion Forcing framework.
January 2026: Open-sourced SkyReels-V3 — Supports 1-4 reference image inputs, achieving multi-subject video generation.
February 2026: SkyReels-V4 Preview released — Ranked second globally on Artificial Analysis.
March 2026: SkyReels-V4 officially tops the global chart.
From V1 to V4, it's not simply adding parameters. Each generation fixes a key shortcoming.
A major upgrade every 3-4 months on average; this iteration rhythm is almost unmatched in the global AI video field.
This rhythm of continuous innovation, coupled with Mureka's leading position in the AI music field, the Skywork series' breakthroughs in large language models and multimodal reasoning, and the commercial landing of the DramaWave short drama platform, means Kunlun Wanwei is building a complete AI ecosystem loop covering "Compute — Model — Application."
This is the most convincing demonstration of results since Kunlun Wanwei established the core strategy of "All in AGI and AIGC" in early 2023.
The "Unification" Moment of AI Video Creation
Looking back from the spring of 2026, the field of AI video generation has undergone earth-shaking changes in the past year.
From the first wave of trends set off by Sora, to the contention of hundreds of schools like Veo, Kling, and Seedance, to SkyReels-V4 topping the globe with its four-in-one capability of "Full-Modal Reference + Audio-Video Joint Generation + Unified Task Framework + Full-Modal Reinforcement Learning" — we are witnessing the opening of a new era.
In this era, video creation is no longer the exclusive privilege of professional teams, but a form of expression accessible to everyone with creativity.
And the technical direction represented by SkyReels-V4 — using one model and one operation to complete the entire process from text conception to audio-video finished product — is the clearest path to that future.
Kunlun Wanwei revealed three major future directions in the technical report: expanding video generation capabilities to longer durations (30s+), enhancing real-time interactive editing functions, and opening model API interfaces for integration with more creation tool ecosystems.
Each of these directions will further narrow the distance between AI video creation and professional film production.
The AI video race is far from over. But SkyReels-V4 has already proven one thing with its global number one result:
On this track, the voice from China's Kunlun Wanwei is not only worth listening to by the whole world — it has already stood at the peak of the world.