LTX 2.3 Video + Audio Local Generation (GGUF/FP8/BF16)

 

LTX-2.3 is a video-with-audio generation model released by Lightricks. It is a major upgrade to the LTX-2 model that improves both audio-visual quality and prompt adherence.

LTX-2.3 is a DiT-based audio-video foundation model capable of generating synchronized video and audio within a single system. It combines the key components of modern video generation, offers open weights, and is designed to run efficiently on local machines for practical use. For in-depth insights, you can read their research paper.

They also released a desktop variant. LTX Desktop is an open-source desktop application, released under the Apache 2.0 license, designed to generate videos using LTX models directly on your computer. It supports local video generation on Windows systems with compatible NVIDIA GPUs, while also providing an API-based mode for unsupported hardware and macOS users. This allows creators to experiment with LTX models even if their system cannot run them locally.

The application brings multiple video generation capabilities into a single interface. Users can create videos from text prompts, images, or audio inputs, making it flexible for different creative workflows. It also includes a Retake-based video editing feature, which allows users to regenerate or modify parts of a video without starting from scratch. 

Along with this, LTX Desktop provides a built-in video editor and project-based workflow, helping users organize and manage their video editing projects more efficiently.  For local video generation on Windows, the system requires Windows 10 or 11 (x64) along with an NVIDIA GPU that supports CUDA and has at least 32GB of VRAM, although more VRAM is recommended for better performance. 

The system should also have at least 16GB of RAM, with 32GB recommended, and sufficient disk space to store the model weights and generated video files.  On macOS, LTX Desktop works through API mode only. It requires an Apple Silicon (arm64) device running macOS 13 Ventura or later, along with a stable internet connection to communicate with the remote API for video generation.

Installation

Update ComfyUI


1. First, install ComfyUI if you are a new user. Existing users should update it from the Manager by selecting Update All.

2. From the Manager, select the Install Custom Nodes option. Search for the "LTXVideo" custom node and install it. If it is already installed, just update it from the Manager's Custom Nodes Manager.

3. There are different model variants: Text to Video, Image to Video, and Control to Video. Download them from the LTX-2.3 Hugging Face repository. For an overall overview, you can also follow the LTX-2 tutorial if you are new to it.


Download the models (BF16/FP8/GGUF) as described below:

| Sl No | Model Name | Description |
|-------|------------|-------------|
| 1 | ltx-2.3-22b-dev.safetensors | The full LTX-2.3 model with complete capabilities. It is designed for flexible usage and can also be trained or fine-tuned in bf16 precision, making it suitable for development and advanced experimentation. |
| 2 | ltx-2.3-22b-dev-fp8.safetensors | The FP8 variant of LTX-2.3 for low-VRAM users, at the cost of some quality. |
| 3 | ltx-2.3-22b-distilled-lora-384.safetensors | A LoRA version of the distilled model, meant to be applied on top of the full LTX-2.3 model to enhance or modify its behavior without loading an entirely separate model. |
| 4 | ltx-2.3-22b-distilled.safetensors | A distilled version of the full model optimized for faster generation. It typically runs in about 8 steps with CFG set to 1, making it more efficient while still maintaining good output quality. |
| 5 | ltx-2.3-spatial-upscaler-x1.5-1.0.safetensors | Increases the spatial resolution of LTX-2.3 latents by 1.5×. Commonly used in multi-stage or multiscale pipelines to generate videos at higher resolution. |
| 6 | ltx-2.3-spatial-upscaler-x2-1.0.safetensors | Another spatial upscaler that enlarges LTX-2.3 latents by 2×. Useful in multi-stage pipelines when you want sharper, higher-resolution outputs. |
| 7 | ltx-2.3-temporal-upscaler-x2-1.0.safetensors | Doubles the frame rate of generated videos. Typically used in multi-stage pipelines to create smoother, higher-FPS videos. |
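To make the upscaler factors concrete, the sketch below chains them the way a multi-stage pipeline would. The function name and stage labels are our own shorthand; only the 1.5×/2× spatial and 2× temporal factors come from the table above:

```python
def apply_upscalers(width: int, height: int, fps: int, stages: list[str]):
    """Track output resolution and FPS through LTX-2.3 upscaler stages."""
    for stage in stages:
        if stage == "spatial_x1.5":
            width, height = int(width * 1.5), int(height * 1.5)
        elif stage == "spatial_x2":
            width, height = width * 2, height * 2
        elif stage == "temporal_x2":
            fps *= 2
        else:
            raise ValueError(f"unknown stage: {stage}")
    return width, height, fps

# Example: a base 768x512 @ 25 fps generation, followed by 2x spatial
# and 2x temporal upscaling -> (1536, 1024, 50)
print(apply_upscalers(768, 512, 25, ["spatial_x2", "temporal_x2"]))
```

This also shows why the upscalers belong in later stages: the base model generates a cheap low-resolution pass, and the upscalers refine it.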


Save diffusion models into ComfyUI/models/checkpoints folder, not the diffusion_models folder.
Save upscaler models into ComfyUI/models/latent_upscale_models folder.
Save lora models into ComfyUI/models/loras folder.
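A common mistake is dropping files into the wrong folder, so here is a small sanity check you can run against your ComfyUI root. It is a convenience sketch of our own (not part of ComfyUI); the file names and destination folders match the table and paths above:

```python
from pathlib import Path

# Expected destinations from this guide: checkpoints for the diffusion models,
# latent_upscale_models for the upscalers, loras for the distilled LoRA.
EXPECTED = {
    "checkpoints": [
        "ltx-2.3-22b-dev.safetensors",
        "ltx-2.3-22b-distilled.safetensors",
    ],
    "latent_upscale_models": [
        "ltx-2.3-spatial-upscaler-x2-1.0.safetensors",
    ],
    "loras": [
        "ltx-2.3-22b-distilled-lora-384.safetensors",
    ],
}

def missing_models(comfy_root: str) -> list[str]:
    """Return relative paths of expected model files that are not present."""
    root = Path(comfy_root)
    return [
        f"models/{folder}/{name}"
        for folder, names in EXPECTED.items()
        for name in names
        if not (root / "models" / folder / name).is_file()
    ]
```

Trim `EXPECTED` to the variants you actually downloaded; an empty return value means everything is where the workflows expect it.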
    

GGUF

If you want the LTX-2.3 GGUF models for low-VRAM systems, download them and set them up accordingly. Make sure you have already installed the ComfyUI-GGUF custom node by City96 from the Manager. If already done, update this custom node from the Manager.


(a) LTXV-2.3 GGUF by Quantstack

(b) LTXV-2.3 GGUF by Unsloth

(c) LTX2.3 by Kijai

Save them into the ComfyUI/models/unet folder. For a detailed overview, you can follow our tutorial on quantized models.
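As a rough rule of thumb for picking a quantization level, the sketch below maps available VRAM to common GGUF quant names. The thresholds are our own assumption, not official Lightricks guidance, and the exact quants available differ per repository, so check each repo's file list:

```python
def pick_gguf_quant(vram_gb: float) -> str:
    """Heuristic VRAM -> GGUF quant mapping (assumption; verify per repo)."""
    if vram_gb >= 24:
        return "Q8_0"    # near-lossless, largest file
    if vram_gb >= 16:
        return "Q5_K_M"  # balanced size/quality
    return "Q4_K_M"      # smallest commonly used, most quality loss
```

Lower-bit quants trade output quality for memory, so pick the highest quant that fits alongside the text encoders and VAE.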


4. Restart ComfyUI and refresh the browser page for the changes to take effect.


Workflow


1. Download the workflows from our Hugging Face repository. Alternatively, you can get them from the Workflow Templates section of ComfyUI.

(a) LTX2.3_T2V.json (Text to Video workflow)

(b) LTX2.3_I2V.json (Image to Video workflow)

(c) LTX-2.3_ICLoRA_Motion_Track_Distilled.json

(d) LTX-2.3_ICLoRA_Union_Control_Distilled.json

(e) LTX-2.3_T2V_I2V_Single_Stage_Distilled_Full.json

(f) LTX-2.3_T2V_I2V_Two_Stage_Distilled.json

2. Drag and drop into ComfyUI.

3. The rest of the models (text encoders, etc.) are the same as in the LTX-2.0 model setup. If you are new to it, just follow our older LTX-2.0 tutorial.