Try FLOAT to generate your AI Talking Avatar


Plenty of talking-face avatar tools already exist, but this one goes further than the earlier ones. FLOAT (Flow Matching for Audio-driven Talking Portrait), developed by DeepBrainAI, takes an audio clip and a portrait image as input and generates a talking video.

Figure: FLOAT process overview. Reference: FLOAT official page

FLOAT analyzes the pitch of the audio and adds matching emotion to the generated output, which makes it look more expressive and realistic. When you are animating a talking portrait, you do not really need to regenerate every pixel from scratch.

What you need is consistent, believable motion that can be applied to your source image. By learning a compact representation of motion patterns, FLOAT can generate temporally consistent animations much more efficiently.

It generates faster than diffusion-based models, using fewer sampling steps and less memory. You can find more in-depth information in their research paper. The model and scripts are released under a non-commercial license.
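To give a feel for why flow matching can work with few sampling steps, here is a toy sketch in Python. It is not FLOAT's code: the two-value "motion latent", the `ideal_velocity` function, and the step counts are all assumptions made for illustration. In a real model, a trained network would predict the velocity from the audio and other conditions.

```python
import random


def ideal_velocity(x, t, target):
    """Velocity along the straight path x_t = (1 - t) * noise + t * target.

    Hypothetical stand-in for a learned network: on that path the
    velocity equals (target - x_t) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def euler_sample(noise, target, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = list(noise)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt  # always below 1, so the division above is safe
        v = ideal_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2)]  # random starting point
target = [0.5, -0.25]                            # made-up motion latent

few = euler_sample(noise, target, num_steps=4)
many = euler_sample(noise, target, num_steps=100)
print(few, many)
```

Because the path from noise to target is a straight line in this toy setup, 4 steps land on the same point as 100 steps, up to floating-point error. A learned velocity field is only approximately straight, so a real sampler still needs several steps.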


Installation

1. Install ComfyUI. If you are new to it, our ComfyUI beginners' guide covers the basics.

2. Move into the "ComfyUI/custom_nodes" folder and clone the repository from the command prompt with the following command:

git clone https://github.com/deepbrainai-research/float.git

3. Install the required dependencies using the command provided below.

For normal ComfyUI users:

pip install -r requirements.txt

For ComfyUI portable users:

cd ./ComfyUI-FLOAT

pip install -r requirements.txt

4. The FLOAT model is downloaded automatically from its Hugging Face repository the first time you run the workflow, and it is saved into your "ComfyUI/models/float" directory. You can track the download progress in real time in your ComfyUI terminal.

5. Restart ComfyUI and refresh the interface.
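If you want to confirm that the automatic download in step 4 has happened, a small Python check like the one below can list what is in the model folder. This is only a sketch: `list_float_model_files` is a hypothetical helper, and the `"ComfyUI"` root path is an assumption that you should adjust to your own install location.

```python
from pathlib import Path


def list_float_model_files(comfy_root):
    """Return the files found under <comfy_root>/models/float, sorted.

    Returns an empty list when the folder does not exist yet, for
    example before the workflow has been run for the first time.
    """
    model_dir = Path(comfy_root) / "models" / "float"
    if not model_dir.is_dir():
        return []
    return sorted(
        p.relative_to(model_dir).as_posix()
        for p in model_dir.rglob("*")
        if p.is_file()
    )


# Assumed install location; change this to where your ComfyUI folder lives.
print(list_float_model_files("ComfyUI"))
```

An empty list simply means nothing has been downloaded yet, or that the root path points to the wrong place.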


Workflow

1. Find the workflow file inside the "ComfyUI/custom_nodes/ComfyUI-FLOAT" folder.

2. Drag and drop it into ComfyUI.

3. Set up the workflow:



(a) Load the target image and reference audio.

(b) Load the FLOAT model.

(c) Set the configuration:

FPS: 25 (default)

Emotion: none, angry, disgust, fear, happy, neutral, sad, surprise.

Seed: random, fixed, increment 

4. Hit the queue button to initiate the generation process.
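The configuration options from step 3(c) can be sketched in Python to show how they fit together. This is an illustration, not the node's actual API: `FloatSettings`, `next_seed`, and `frame_count` are hypothetical names, the seed behaviour is an assumption based on the usual meaning of "fixed", "increment", and "random", and the frame count assumes one video frame per 1/FPS seconds of audio.

```python
import math
import random
from dataclasses import dataclass

# Option values as listed in the configuration step above.
EMOTIONS = ("none", "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise")
SEED_MODES = ("random", "fixed", "increment")


@dataclass
class FloatSettings:
    fps: int = 25            # default from the workflow
    emotion: str = "none"
    seed: int = 0
    seed_mode: str = "fixed"

    def __post_init__(self):
        # Reject values that are not in the lists above.
        if self.fps <= 0:
            raise ValueError("fps must be positive")
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion}")
        if self.seed_mode not in SEED_MODES:
            raise ValueError(f"unknown seed mode: {self.seed_mode}")


def next_seed(settings):
    """Seed for the following run (assumed semantics of each mode)."""
    if settings.seed_mode == "fixed":
        return settings.seed          # same seed, repeatable result
    if settings.seed_mode == "increment":
        return settings.seed + 1      # step through seeds one by one
    return random.randrange(2**32)    # "random": a fresh seed each run


def frame_count(audio_seconds, fps):
    """Frames needed to cover the audio (assumption: duration times fps, rounded up)."""
    return math.ceil(audio_seconds * fps)


settings = FloatSettings(emotion="happy", seed=7, seed_mode="increment")
print(next_seed(settings), frame_count(4.0, settings.fps))
```

A fixed seed is useful when you want to compare emotions on the same image and audio, since only the emotion setting changes between runs.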