Cover image of AI video technology.

AI Video Innovation: Hands-Free Card Drawing Revolution

Not satisfied after watching “Squid Game”? Create your own ending.

Image from Squid Game
GIF of a custom Squid Game ending

Can’t wait for “Dune Part Three”? Make your own version.

Image from Dune
GIF of a custom Dune scene

Previously, maintaining consistent character appearances required significant time. Now, with just a screenshot, AI can start making movies.

This is thanks to Hailuo AI’s “Subject Reference” feature (the product is also known as Conch AI), powered by the new S2V-01 model. It accurately identifies the subject in an uploaded image and sets it as the character in the generated video. The rest is simple: create freely with basic prompts.

GIF showing precise facial information retention
Creation by X user @KarolineGeorges, with precise facial information retention
GIF showing diverse subjects
Creation by X user @Apple_Dog_Sol, showcasing diverse subjects

Advantages of the “Subject Reference” Feature

Many companies are developing “Subject Reference” features, but not all of them have solved the challenges of stability and coherence, particularly keeping the subject consistent while it moves.

While others may struggle, Conch AI excels. With just one image, it accurately understands character traits, identifies them as subjects, and places them in various scenes.

One moment Spider-Man is saving the world, the next he’s riding a motorcycle.

Spider-Man on a web

Spider-Man swinging on a web

The Mother of Dragons, who should be training dragons in “Game of Thrones,” is now playing with a little wolf.

Mother of Dragons with a wolf
Mother of Dragons playing with a wolf

The breakthrough in “subject reference” lies in achieving a balance between creative freedom and fidelity. It’s like giving creators a “universal actor” whose appearance doesn’t distort but naturally changes with actions and poses, performing any action in any scene as required by the director.

Not Just a New Feature, But a Unique Technical Solution

Hands-on testing shows that subject reference is a distinct capability, with technical challenges and requirements different from those of text-to-video or image-to-video generation.

Traditional image-to-video generation merely animates a static image, mostly through localized changes. For example, with this still of Song Hye-kyo, image-to-video only brings the frame to life within a limited range, with no significant movement.

Original still of Song Hye-kyo
Original still
Animated still of Song Hye-kyo
Video generated from image-to-video

With the same photo, “subject reference” can create a complete segment based on text prompts, allowing free movement while maintaining stable facial features.

Generated video of Song Hye-kyo
Warm indoor lighting, in a theater audience, the protagonist in a black suit, sitting mid-row left. Her expression is focused, occasionally smiling lightly, clapping naturally and rhythmically. The camera starts from her side, capturing silhouettes of other audience members and the dim seat textures, emphasizing the environment’s depth. As the camera moves in, the protagonist stands up.

There are currently two technical routes for generating videos around a fixed subject. One is based on LoRA, which fine-tunes a pre-trained generative model with small low-rank adapters. This route requires a per-subject fine-tuning pass before any new video can be generated, so users must upload multiple angles of the same subject, sometimes even specifying different elements for each segment to ensure quality. It also consumes many tokens and means a long wait.
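
The LoRA idea is easy to sketch: instead of retraining a full weight matrix W, a small low-rank product B·A is trained per subject and added onto the frozen weights. The sketch below is a generic, pure-Python illustration of that arithmetic, not code from any particular video model:

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(A, s):
    return [[s * a for a in row] for row in A]

def lora_weights(W, A, B, alpha=1.0):
    """Frozen weights W plus a low-rank LoRA update (alpha/r) * B @ A."""
    r = len(A)                        # rank of the adapter
    delta = scale(matmul(B, A), alpha / r)
    return add(W, delta)

# Toy 3x3 layer with a rank-1 adapter: 3 + 3 trained numbers instead of 9.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]                 # frozen pre-trained weights
A = [[1.0, 2.0, 3.0]]                 # trainable down-projection (1 x 3)
B = [[0.0], [0.0], [0.0]]             # trainable up-projection, zero-init => no-op

assert lora_weights(W, A, B) == W     # zero-init adapter leaves the model unchanged
```

At real model scale this parameter saving is what makes per-subject fine-tuning feasible at all, but the training pass still has to happen for every new subject, which is where the upload burden and wait time come from.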

After extensive technical exploration, MiniMax chose a route based on image reference: an image carries the most accurate visual information, which matches the creative logic of physical filming. On this route, the protagonist in the image is the model’s top recognition priority; whatever the subsequent scenes or plot, the subject must remain consistent.

Other visual information is left more open and is controlled by the text prompt. This approach achieves the goal of “precise reproduction + high freedom.”
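
One way to make “precise reproduction + high freedom” concrete is to measure it: embed the reference image and each generated frame with a subject encoder and check that identity similarity stays high even as everything else changes. The sketch below is a generic consistency check with toy embeddings, not MiniMax’s published method; the encoder, threshold, and embedding values are all illustrative assumptions:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def identity_consistency(ref_embedding, frame_embeddings, threshold=0.8):
    """Return per-frame similarity to the reference subject, plus whether
    every frame stays above the identity threshold.

    In practice the embeddings would come from a face/subject encoder;
    here they are plain lists of floats for illustration.
    """
    sims = [cosine(ref_embedding, f) for f in frame_embeddings]
    return sims, all(s >= threshold for s in sims)

# Toy example: frame 1 matches the reference subject, frame 2 drifts away.
ref = [1.0, 0.0, 0.0]
frames = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
sims, ok = identity_consistency(ref, frames)
assert not ok  # the drifting frame falls below the identity threshold
```

A high-freedom, high-fidelity model is one where the subject similarity stays above threshold in every frame while the rest of the scene is free to follow the prompt.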

Mother of Dragons with a dragon
Character stands before a dragon, hair and dress blowing in the wind.
In a clearing in the valley, the protagonist stands before a dragon, their long hair flowing in the wind. The camera gradually zooms out, capturing the protagonist turning to look into the distance. The dragon’s wings spread, blowing the protagonist’s hair and dress, and the scene ends with an overhead shot.

In this video, the model was given only a single picture of the Dragon Queen. The generated video accurately reproduced the camera language and visual elements in the prompt, demonstrating strong prompt comprehension.

Compared to the LoRA solution, this approach drastically reduces the material users need to upload, replacing dozens of video clips with a single image. Waiting time drops to seconds, comparable to generating text or images, combining the accuracy of image-to-video with the freedom of text-to-video.

A Highlight of Chinese Engineering, Meeting Multiple Needs

Meeting multiple needs at once is not an unreasonable demand. Only a model that delivers both an accurate, consistent character image and free movement can move beyond simple entertainment and offer broader value in industry applications.

For example, in product advertising, a single model image can generate a variety of product videos simply by changing the prompt.

Runner in motion, showcasing dynamic video generation.
Glass product video, highlighting detailed visual generation.

With image-to-video methods, the current mainstream workaround is to set the first and last frames, which limits the result to what the supplied images contain. It also takes repeated attempts to collect different angles and then stitch the material together into a sequence of shots.
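
The limitation of the first-and-last-frame approach can be seen in the simplest possible stand-in for it: linear blending between two keyframes. Real image-to-video models perform learned, non-linear in-betweening, but the same constraint applies: every generated frame is anchored to the supplied images. This is a toy illustration, not any product’s actual algorithm:

```python
def interpolate_frames(first, last, n_frames):
    """Linearly blend two keyframes (flat lists of pixel values) into a
    sequence of n_frames, endpoints included. A crude stand-in for the
    learned in-betweening that real image-to-video models perform."""
    if n_frames < 2:
        raise ValueError("need at least the two keyframes")
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1)      # 0.0 at the first frame, 1.0 at the last
        frames.append([(1 - t) * a + t * b for a, b in zip(first, last)])
    return frames

# Two 2-pixel "images": every in-between frame lies between the endpoints,
# which is why this family of methods cannot introduce genuinely new content.
seq = interpolate_frames([0.0, 100.0], [100.0, 0.0], n_frames=5)
assert seq[0] == [0.0, 100.0] and seq[-1] == [100.0, 0.0]
assert seq[2] == [50.0, 50.0]       # the midpoint frame
```

Subject reference sidesteps this bound by anchoring only the subject’s identity to the image and leaving the rest of each frame to the text prompt.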

The advantage of “subject reference” is that it combines the strengths of different techniques to fit the video creation workflow. In the future, over 80% of marketing professionals are expected to use generative tools at various stages, focusing only on story and plot conception while freeing their hands.

According to Statista, the market for generative AI products in advertising and marketing exceeded $15 billion in 2021 and is projected to reach $107.5 billion by 2028. In earlier workflows, pure text-to-video had too many uncontrollable factors and suited only the early stages of creation. In the European and American advertising and marketing industries, generative AI is already common: 52% of use cases are drafts and planning, and 48% are brainstorming.
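
Those Statista figures imply a compound annual growth rate of roughly 32-33%, which is easy to verify:

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1

# Figures cited above: $15B in 2021, projected $107.5B in 2028.
rate = cagr(15.0, 107.5, years=2028 - 2021)
print(f"Implied CAGR: {rate:.1%}")  # -> Implied CAGR: 32.5%
```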

Currently, Hailuo AI opens the reference capability for a single character first. It will later expand to multiple characters, objects, scenes, and more, further unleashing creativity, in line with Hailuo’s slogan: “Every idea is a blockbuster.”

Since MiniMax released its video model in August 2024, it has steadily attracted a large international user base, earning positive feedback and professional recognition for the quality and smoothness of its output as well as its consistency and stability.

Hailuo AI logo
Techhalla logo.

Over the past year of technological competition, the landscape of the AI video generation field has begun to take shape. Sora’s debut demonstrated the potential of video generation and prompted major tech companies to invest heavily in the field.

But Sora’s product launch was delayed until the end of 2024 and drew middling user reviews, failing to meet market expectations and giving other players an opening to seize the market.

Now, as generative video enters the second half, only three companies truly demonstrate technical strength and development potential: MiniMax’s Hailuo AI, Kuaishou’s Keling AI, and ByteDance’s Jimeng AI.

As a startup founded just three years ago, MiniMax has delivered products and technology that compete at the top level despite its lean size. From the I2V-01-Live image-to-video model released in December 2024 to the new S2V-01 model, the team has been steadily solving the long-standing challenges of video generation.

As technology continues to mature and application scenarios expand, video generation AI will spark a new revolution in content creation, film production, marketing, and communication. These companies, representing the highest level of China’s video generation AI field, are not only leading the Chinese market but are also expected to compete globally with international giants. Meanwhile, ensuring product stability and controllability while maintaining technological innovation will be a continuous challenge for these enterprises.

Source from ifanr

Disclaimer: The information set forth above is provided by ifanr.com, independently of Chovm.com. Chovm.com makes no representation and warranties as to the quality and reliability of the seller and products. Chovm.com expressly disclaims any liability for breaches pertaining to the copyright of content.
