Forget ElevenLabs: Try Chatterbox, a Free Text-to-Speech Model

Install Chatterbox open-source TTS

When it comes to video creation, gaming, memes, or AI agents, there have not been many free text-to-speech options; most tools are paid and enjoy a near monopoly. Now, however, you can try Chatterbox, a free and open-source text-to-speech model developed by Resemble AI and released under the MIT license.

Chatterbox generates speech from text and offers emotion exaggeration control. It can also clone a voice from just a 5-second audio sample. Currently it supports only English, though support for other languages may follow.

Resemble AI claims that Chatterbox outperforms premium closed-source TTS models such as ElevenLabs. You can explore their research and performance benchmarks in their analysis project.

Another option is Bark from Suno AI. It supports multiple languages and emotions, but it does not auto-detect emotional tone, so you have to specify the emotion style manually. Bark was originally released under a non-commercial license (it has since been relicensed under MIT), and it can generate only up to about 14 seconds of audio at a time.


Installation

1. Install ComfyUI if you are a new user. Existing users should update it from the Manager tab.

2. Navigate to your ComfyUI/custom_nodes folder and clone the repository with the following command:

git clone https://github.com/filliptm/ComfyUI_Fill-ChatterBox.git

3. Install the required dependencies from a command prompt.

For standard ComfyUI installations, run this from inside the ComfyUI_Fill-ChatterBox folder:

pip install -r requirements.txt

For ComfyUI portable users, move into the ComfyUI_windows_portable folder and run:

python_embeded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI_Fill-ChatterBox\requirements.txt

4. Restart ComfyUI and refresh the browser page so the changes take effect.

5. The Chatterbox TTS model is downloaded automatically from Resemble AI's Hugging Face repository the first time you run the workflow, so you do not need to download it manually. You can follow the download progress in ComfyUI's terminal.
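If you would rather script step 3, the sketch below wraps the same `python -m pip install -r requirements.txt` call in Python. It is a minimal illustration and not part of the ComfyUI_Fill-ChatterBox repository; the function names are hypothetical, and it assumes the node was cloned into `custom_nodes/ComfyUI_Fill-ChatterBox` as in step 2. Passing the interpreter explicitly matters most for the portable build, where packages have to go into `python_embeded` rather than a system-wide Python.

```python
import subprocess
import sys
from pathlib import Path

# Folder name created by the git clone in step 2
NODE_DIR_NAME = "ComfyUI_Fill-ChatterBox"


def build_install_command(comfyui_dir: Path, python_exe: str = sys.executable) -> list[str]:
    """Build the pip command that installs the node's requirements.

    Using `python -m pip` with the interpreter that runs ComfyUI (for the
    portable build, python_embeded\\python.exe) installs the packages into
    ComfyUI's own environment instead of some other Python on the system.
    """
    requirements = comfyui_dir / "custom_nodes" / NODE_DIR_NAME / "requirements.txt"
    return [python_exe, "-m", "pip", "install", "-r", str(requirements)]


def install_requirements(comfyui_dir: Path, python_exe: str = sys.executable) -> None:
    """Run pip for the node's requirements.txt, failing early if it is missing."""
    cmd = build_install_command(comfyui_dir, python_exe)
    requirements = Path(cmd[-1])
    if not requirements.is_file():
        raise FileNotFoundError(f"requirements.txt not found at {requirements}")
    # check=True raises CalledProcessError if pip exits with a non-zero status
    subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python install_chatterbox_deps.py <path-to-ComfyUI-folder>
    install_requirements(Path(sys.argv[1]))
```

For a portable install you would pass both paths explicitly, for example `install_requirements(Path(r"C:\ComfyUI_windows_portable\ComfyUI"), r"C:\ComfyUI_windows_portable\python_embeded\python.exe")`; these example paths are placeholders to adjust to your own setup.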


Workflow

1. After installation, get the workflow files from the ComfyUI/custom_nodes/ComfyUI_Fill-ChatterBox/web folder.

2. Drag and drop them into ComfyUI. You can also build these workflows yourself by searching for the "FL Chatterbox" nodes. There are two workflows:


Text to Speech

(a) Text to Speech workflow

LoadAudio: Uploads or loads an existing audio file.

FL Chatterbox TTS node:

exaggeration: 0.50 (Controls how expressive the voice is; higher values sound more dramatic.)

cfg_weight: 0.50 (Classifier-free guidance weight; balances creativity against adherence to the prompt.)

temperature: 0.80 (Controls sampling randomness; higher values give more diverse, creative outputs.)

use_cpu: false (Runs the model on the GPU for faster inference.)

keep_model_loaded: true (Keeps the model in memory to avoid reloading it on every run.)
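To make the temperature and cfg_weight descriptions more concrete, here is a toy Python sketch of the two general techniques those names refer to: temperature-scaled softmax sampling and classifier-free guidance (CFG). It works on made-up scores for three candidate tokens and is not Chatterbox's code. The function names are hypothetical, and the CFG formula shown, cond + w * (cond - uncond), is one common formulation that we assume is close to, but have not verified against, what the model does internally.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, scaled by temperature.

    Dividing by a temperature > 1 flattens the distribution (more varied picks);
    a temperature < 1 sharpens it (more predictable picks).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def apply_cfg(cond: list[float], uncond: list[float], cfg_weight: float) -> list[float]:
    """Classifier-free guidance (assumed form): push conditioned scores away from
    unconditioned ones. cfg_weight = 0 leaves the conditioned scores unchanged;
    larger weights follow the conditioning (text/voice prompt) more strongly."""
    return [c + cfg_weight * (c - u) for c, u in zip(cond, uncond)]


def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats; higher means a more spread-out distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # toy scores for three candidate tokens
    for t in (0.5, 0.8, 1.5):
        probs = softmax_with_temperature(logits, t)
        print(f"temperature={t}: probs={[round(p, 3) for p in probs]}, entropy={entropy(probs):.3f}")
    print(apply_cfg([2.0, 1.0], [1.0, 1.0], 0.5))
```

Running it shows the top token's probability falling and the entropy rising as temperature goes from 0.5 to 1.5, which is the "more diverse output" effect the temperature setting describes.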


Voice Cloning

(b) Voice Cloning workflow

LoadAudio nodes: This workflow uses two of them.
The first loads the source audio, the recording whose voice you want to convert.
The second loads the target speaker's voice, which the converted audio should sound like (i.e., the clone target).

FL Chatterbox VC node:

use_cpu: false (Runs the model on the GPU for better performance.)
keep_model_loaded: true (Caches the model in memory to reduce reload times.)
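Since voice cloning relies on a short sample of the target speaker (the introduction mentions about 5 seconds), it can help to check the clip's length before loading it. The sketch below uses Python's standard `wave` module, so it only handles uncompressed WAV files; convert other formats first. The function names and the 5-second threshold are assumptions for illustration, not settings of the FL Chatterbox VC node.

```python
import tempfile
import wave
from pathlib import Path

# Assumed minimum, taken from the "5-second sample" figure in the introduction
MIN_REFERENCE_SECONDS = 5.0


def wav_duration_seconds(path: Path) -> float:
    """Return the length of an uncompressed WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(path: Path, min_seconds: float = MIN_REFERENCE_SECONDS) -> float:
    """Return the clip's duration, or raise ValueError if it is too short."""
    duration = wav_duration_seconds(path)
    if duration < min_seconds:
        raise ValueError(
            f"{path.name} is {duration:.2f}s long; use at least {min_seconds:.0f}s of speech"
        )
    return duration


def write_silent_wav(path: Path, seconds: float, sample_rate: int = 16000) -> None:
    """Write a mono, 16-bit silent WAV file (handy for trying out the check)."""
    n_frames = int(seconds * sample_rate)
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        clip = Path(tmp) / "target_voice.wav"
        write_silent_wav(clip, 6.0)
        print(f"{check_reference_clip(clip):.1f}s: long enough")
```

This only measures length: a 6-second silent file passes, so it says nothing about whether the clip contains clean, single-speaker speech.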