Wan2.1 Infinite Talk - Longer AI Talking Video Generation


Infinite Talk with Wan2.1 for longer video generation

Dubbing videos has always been tricky. The usual methods manage to sync the lips with the audio, but the rest of the body often tells a different story. People notice when the lips move in sync but the head, facial expressions, or body movements stay stiff. Infinite Talk delivers stable and consistent results, even for unlimited video durations, without the awkward distortions that distract viewers.

Traditional dubbing models such as MultiTalk were built with lip synchronization as the main focus. That was useful but incomplete, since it ignored the other non-verbal cues that bring authenticity to a video.


Infinite Talk architecture (ref: Infinite Talk official page)

Viewers pay attention not only to what is being said but to how the speaker moves while saying it. Sparse-frame video dubbing changes the game by synchronizing audio with a fuller range of body language. By generating from sparse keyframes and stretching that across long timelines, InfiniteTalk maintains consistency while reducing distortions. For the details, you can refer to their research paper.
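Conceptually, the unlimited duration comes from generating the video in chunks and carrying a few frames from one chunk into the next, so identity and motion stay coherent across the whole timeline. Here is a minimal sketch of that streaming loop; `generate_chunk` and the context size are hypothetical stand-ins for the model's actual sampling call, not the real InfiniteTalk code:

```python
# Conceptual sketch of chunked streaming generation (not the actual
# InfiniteTalk implementation): each chunk is conditioned on its audio
# segment plus a few frames carried over from the previous chunk.

CONTEXT_FRAMES = 5  # frames reused as conditioning for the next pass (assumed)

def generate_long_video(ref_image, audio_chunks, generate_chunk):
    """generate_chunk(ref_image, context, audio) -> list of frames
    is a hypothetical stand-in for the diffusion sampling call."""
    video, context = [], None
    for audio in audio_chunks:
        frames = generate_chunk(ref_image, context, audio)
        video.extend(frames)
        # Carry the tail of this chunk forward so the next chunk
        # starts from consistent identity and motion.
        context = frames[-CONTEXT_FRAMES:]
    return video
```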


Installation

1. Set up ComfyUI if you are a new user. If it is already running, update it from the Manager by selecting Update All.

2. Install ComfyUI Wan Video Wrapper by Kijai from the Manager by selecting Custom Nodes Manager. You also need to download the Wan 2.1 Image to Video model.

You can follow the Kijai Wan2.1 Video Wrapper installation tutorial if you do not know how to do it. If it is already installed, just update it from the Manager.

Download Infinite Talk models

3. Now, download the Infinite Talk models from MeiGen-AI's Hugging Face repository. There are two variants available:

(a) Infinite Talk for a single person (infinitetalk_single.safetensors)

(b) Infinite Talk for multiple people (infinitetalk_multi.safetensors)

Choose whichever fits your requirements. Save it into the ComfyUI/models/diffusion_models folder.
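If you prefer fetching the weights from a script instead of the browser, a minimal sketch with the huggingface_hub library follows. The repo id and filenames are taken from this article and may differ from the current repository layout, so adjust them to match what you see on the model page:

```python
# Minimal download sketch using huggingface_hub (pip install huggingface_hub).
# Repo id and filename are assumptions based on the article; check the
# actual repository layout on Hugging Face before running.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="MeiGen-AI/InfiniteTalk",            # assumed repo id
    filename="infinitetalk_single.safetensors",  # or infinitetalk_multi.safetensors
    local_dir="ComfyUI/models/diffusion_models", # target folder from step 3
)
print("saved to", path)
```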

4. Restart ComfyUI and refresh it. 

Workflow

1. After installing Kijai's Wan Video Wrapper, you will find the example workflows inside your ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/example_workflows folder.


(a) wanvideo_I2V_InfiniteTalk_example_03.json (Image to Video workflow)

(b) wanvideo_InfiniteTalk_V2V_example_02.json (Video to Video workflow)


2. Drag and drop the workflow into ComfyUI.

3. Follow the workflow setup:

(a) Upload your image into Load Image node.

(b) Load Wan2.1 Image to Video (FP16/FP8/GGUF) model into Wan Video Model Loader node.

(c) Load the Infinite Talk model (Single/Multi) into the Multi/Infinite Talk Model Loader node.

(d) Load the text encoder and VAE models into their respective nodes.

(e) Load your audio (see the frame-count note after this list).

(f) Click the Run button.
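One practical detail for step (e): the number of video frames the sampler must produce scales with the length of the audio clip. A quick sketch for estimating it, assuming a 25 fps output rate (the rate this family of talking-head models typically targets; check the fps set in your workflow):

```python
# Estimate how many frames the sampler needs for a given audio clip,
# assuming a 25 fps output (check the fps configured in your workflow).
import math
import wave

FPS = 25  # assumed output frame rate

with wave.open("speech.wav", "rb") as wav:  # placeholder file name
    duration_s = wav.getnframes() / wav.getframerate()

num_frames = math.ceil(duration_s * FPS)
print(f"{duration_s:.1f} s of audio -> about {num_frames} frames")
```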



This is an improved version of the older MultiTalk avatar model. Body movements such as waving hands, blinking eyes, breathing, and lip syncing add more realism to the generated video.

Well, there are still some defects, like odd facial expressions, eye flickering, and jittery body motion, but they are bearable and can be smoothed out by doubling the FPS with frame interpolation.
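Inside ComfyUI this is usually done with a frame-interpolation custom node, but as a quick post-process you can also double the frame rate with ffmpeg's motion-compensated minterpolate filter. This is just one way to do the doubling mentioned above, and the file names are placeholders:

```python
# Double the frame rate of the generated clip with ffmpeg's
# motion-compensated interpolation (requires ffmpeg on PATH).
# Input/output file names are placeholders.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "infinitetalk_25fps.mp4",
    "-vf", "minterpolate=fps=50:mi_mode=mci",  # 25 fps -> 50 fps
    "infinitetalk_50fps.mp4",
], check=True)
```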