Try FLOAT to generate your AI Talking Avatar


Talking-face avatar tools are nothing new, but this one goes further than the earlier ones. FLOAT (Flow Matching for Audio-driven Talking Portrait) by DeepBrainAI generates a talking avatar video from two inputs: an audio clip and a portrait image.


FLOAT analyzes the pitch of the audio and adds matching emotion to the generated output, which makes the result look more expressive and realistic. When you are animating a talking portrait, you do not really need to regenerate every pixel from scratch.

What you need is consistent, believable motion that can be applied to your source image. By learning a compact representation of motion patterns, FLOAT can generate temporally consistent animations much more efficiently.

It generates faster than diffusion-based models, using fewer sampling steps and less memory. You can find more in-depth information in their research paper. The model and script are released under a non-commercial license.
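To get a feel for why a flow-matching sampler can get away with only a few steps, here is a minimal toy sketch. It is not FLOAT's code: the velocity function stands in for the learned network, the three-number vectors stand in for a motion latent, and all names (`euler_sample`, `make_toy_velocity`, `noise`, `target`) are made up for illustration. The sampler simply follows a velocity field from noise at t=0 to a clean sample at t=1 using plain Euler steps.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, num_steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                      # t stays below 1, so no division by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> Callable[[Vector, float], Vector]:
    """Toy stand-in for a learned network: always points straight at `target`."""
    def velocity(x: Vector, t: float) -> Vector:
        # Velocity of the straight-line path that reaches `target` at t=1.
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


noise = [0.8, -1.2, 0.3]     # made-up starting point ("noisy" motion latent)
target = [0.1, 0.5, -0.4]    # made-up end point ("clean" motion latent)

result = euler_sample(make_toy_velocity(target), noise, num_steps=10)
print([round(r, 6) for r in result])
```

In this toy case the path is a straight line, so even a handful of steps lands on the target. A real model learns the velocity from data and works on a much larger latent, so treat this only as an intuition aid.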


Installation

1. Install ComfyUI and learn the basics from our ComfyUI beginners' guide.

2. Move into the "ComfyUI/custom_nodes" folder and clone the repository from the command prompt:

git clone https://github.com/deepbrainai-research/float.git

3. Install the required dependencies using the command provided below.

For standard ComfyUI users:

pip install -r requirements.txt

For ComfyUI portable users:

cd ./ComfyUI-FLOAT

pip install -r requirements.txt

4. The FLOAT model is downloaded automatically from its Hugging Face repository the first time you run the workflow. It is saved into your "ComfyUI/models/float" directory. You can follow the download progress in real time in your ComfyUI terminal.

5. Restart ComfyUI and refresh the browser page.
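If you want to confirm that the automatic download from step 4 actually landed in "ComfyUI/models/float", a small script like the one below can list what is there. This is only a convenience sketch: the helper name `list_float_model_files` is made up, and the `"ComfyUI"` path passed at the bottom is an assumption that you should change to wherever your ComfyUI folder lives.

```python
from pathlib import Path
from typing import List


def list_float_model_files(comfy_root: str) -> List[str]:
    """Return sorted names of files found under <comfy_root>/models/float."""
    model_dir = Path(comfy_root) / "models" / "float"
    if not model_dir.is_dir():
        # Nothing downloaded yet, or ComfyUI is installed somewhere else.
        return []
    return sorted(
        str(p.relative_to(model_dir))
        for p in model_dir.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    # Assumed location of the ComfyUI folder; adjust for your setup.
    files = list_float_model_files("ComfyUI")
    print(files if files else "No FLOAT model files found yet.")
```

An empty result before the first run is expected, since the model is only fetched when the workflow runs for the first time.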


Workflow

1. Find the example workflow inside the "ComfyUI/custom_nodes/ComfyUI-FLOAT" folder.

2. Drag and drop it into ComfyUI.

3. Set up the workflow:





(a) Load the target image and reference audio.



(b) Load Float Model




(c) Set the configuration:

FPS: 25 (default)

Emotion: none, angry, disgust, fear, happy, neutral, sad, surprise.

Seed: random, fixed, increment 

4. Hit the queue button to initiate the generation process.
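To make the configuration options above a bit more concrete, here is a small sketch that models them in plain Python. It does not call the FLOAT node or any ComfyUI API. The class `FloatSettings` and its methods are hypothetical, the frame count is simple arithmetic (audio length times FPS, rounded up, which is an assumption about rounding), and the seed behavior only illustrates what "random", "fixed", and "increment" usually mean between runs.

```python
import math
import random
from dataclasses import dataclass

# Option values as listed in the configuration step above.
EMOTIONS = ("none", "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise")
SEED_MODES = ("random", "fixed", "increment")


@dataclass
class FloatSettings:
    """Hypothetical container for the workflow settings (not the node's API)."""
    fps: int = 25
    emotion: str = "none"
    seed_mode: str = "fixed"

    def __post_init__(self) -> None:
        if self.fps <= 0:
            raise ValueError("fps must be positive")
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion!r}")
        if self.seed_mode not in SEED_MODES:
            raise ValueError(f"unknown seed mode: {self.seed_mode!r}")

    def frame_count(self, audio_seconds: float) -> int:
        """Approximate number of video frames for an audio clip of this length."""
        return math.ceil(audio_seconds * self.fps)

    def next_seed(self, current_seed: int) -> int:
        """Seed to use for the next run, depending on the seed mode."""
        if self.seed_mode == "fixed":
            return current_seed          # same seed, repeatable result
        if self.seed_mode == "increment":
            return current_seed + 1      # step through neighboring seeds
        return random.randrange(2**32)   # "random": fresh seed every run


settings = FloatSettings(fps=25, emotion="happy", seed_mode="increment")
print(settings.frame_count(4.0))   # frames for a 4-second clip at 25 FPS
print(settings.next_seed(41))
```

A fixed seed is handy when you want to compare emotions on the same image and audio, because the seed is then not a source of difference between runs.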