LTX 2.3 Video+Audio Local Gen (GGUF/FP8/BF16)

 

ltx2.3 gguf/fp8/bf16 in comfyui

LTX-2.3 is a video-with-audio generation model released by Lightricks. It is a major upgrade to the LTX-2 model that improves both audio-visual quality and prompt adherence.

LTX-2.3 is a DiT-based audio-video foundation model capable of generating synchronized video and audio within a single system. It combines the key components of modern video generation, offers open weights, and is designed to run efficiently on local machines for practical use. For in-depth insights, you can read their research paper.

It delivers improved results: sharper fine detail, better prompt adherence, cleaner audio, more stable image-to-video generation, and support for vertical portrait-style videos for social media posting.

They also released a desktop variant. LTX Desktop is an open-source desktop application, released under the Apache 2.0 license, designed to generate videos using LTX models directly on your computer. It supports local video generation on Windows systems with compatible NVIDIA GPUs, while also providing an API-based mode for unsupported hardware and macOS users. This allows creators to experiment with LTX models even if their system cannot run the models locally.

The application brings multiple video generation capabilities into a single interface. Users can create videos from text prompts, images, or audio inputs, making it flexible for different creative workflows. It also includes a Retake-based video editing feature, which allows users to regenerate or modify parts of a video without starting from scratch. 

Along with this, LTX Desktop provides a built-in video editor and project-based workflow, helping users organize and manage their video editing projects more efficiently.  For local video generation on Windows, the system requires Windows 10 or 11 (x64) along with an NVIDIA GPU that supports CUDA and has at least 32GB of VRAM, although more VRAM is recommended for better performance. 

The system should also have at least 16GB of RAM, with 32GB recommended, and sufficient disk space to store the model weights and generated video files.  On macOS, LTX Desktop works through API mode only. It requires an Apple Silicon (arm64) device running macOS 13 Ventura or later, along with a stable internet connection to communicate with the remote API for video generation. 

That covers the desktop variant; in this guide we will stick to ComfyUI and see how to run LTX-2.3 there.

Installation

Update ComfyUI


1. First, install ComfyUI if you are a new user. Existing users should update it from the Manager by selecting Update All.

2. From the Manager, select the Install Custom Nodes option. Search for the "LTXVideo" custom node and install it. If it is already installed, update it from the Manager via the Custom Nodes Manager option.

3. There are different model variants: text-to-video, image-to-video, and control-to-video. Download them from the LTX-2.3 Hugging Face repository. For an overall overview, you can also follow the LTX-2 tutorial if you are new to it.

download ltx2.3 models

Download the models (BF16/FP8/GGUF) as described below:

| Sl No | Model Name | Model Type | Description |
|---|---|---|---|
| 1 | ltx-2.3-22b-dev.safetensors | BF16 (diffusion model) variant | The full LTX-2.3 model with complete capabilities. It is designed for flexible usage and can also be trained or fine-tuned in bf16 precision, making it suitable for development and advanced experimentation. |
| 2 | ltx-2.3-22b-dev-fp8.safetensors | FP8 (diffusion model) variant | The FP8 variant of LTX-2.3 for low-VRAM users, with some quality trade-off. |
| 3 | ltx-2.3 by Kijai | FP8/BF16 (diffusion model) variant | LTX-2.3 optimized by Kijai for low-VRAM users. |
| 4 | ltx-2.3-22b-distilled-lora-384.safetensors | Distilled LoRA | A LoRA version of the distilled model, meant to be applied on top of the full LTX-2.3 model to enhance or modify its behavior without loading an entirely separate model. |
| 5 | ltx-2.3-22b-distilled.safetensors | Distilled (diffusion model) variant | A distilled version of the full model optimized for faster generation. It typically runs in about 8 steps with CFG set to 1, making it more efficient while still maintaining good output quality. |
| 6 | ltx-2.3-spatial-upscaler-x1.5-1.0.safetensors | Upscaler | Increases the spatial resolution of LTX-2.3 latents by 1.5×. Commonly used in multi-stage or multiscale pipelines to generate videos at higher resolution. |
| 7 | ltx-2.3-spatial-upscaler-x2-1.0.safetensors | Upscaler | Enlarges the resolution of LTX-2.3 latents by 2×. Useful in multi-stage generation pipelines when you want sharper, higher-resolution video outputs. |
| 8 | ltx-2.3-temporal-upscaler-x2-1.0.safetensors | Upscaler | Doubles the frame rate of generated videos. Typically used in multi-stage pipelines to create smoother videos with higher FPS. |


Save diffusion models into ComfyUI/models/checkpoints folder, not the diffusion_models folder.
Save upscaler models into ComfyUI/models/latent_upscale_models folder.
Save lora models into ComfyUI/models/loras folder.
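The folder layout described above can be sketched as follows (a minimal sketch, assuming a default ComfyUI checkout in the current directory; `mkdir -p` is harmless if the folders already exist):

```shell
# Create the model folders used by the LTX-2.3 setup (safe if they already exist).
mkdir -p ComfyUI/models/checkpoints            # diffusion models (BF16/FP8), NOT diffusion_models
mkdir -p ComfyUI/models/latent_upscale_models  # spatial and temporal upscalers
mkdir -p ComfyUI/models/loras                  # distilled LoRA
ls ComfyUI/models
```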

The rest of the models (text encoders, etc.) are the same as for the LTX-2.0 model setup. If you are new to it, just follow our older LTX-2.0 tutorial.


GGUF

If you want the LTX-2.3 GGUF models for low-VRAM systems, download them and set them up accordingly. Make sure you have already installed the ComfyUI-GGUF custom node by city96 from the Manager. If it is already installed, update it from the Manager.

(a) LTXV-2.3 GGUF by Quantstack

(b) LTXV-2.3 GGUF by Unsloth

Save them into the ComfyUI/models/unet folder. For a detailed overview, you can follow our tutorial on quantized models.
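For reference, a GGUF download can also be scripted with the `huggingface-cli` tool; the repo id and filename below are placeholders, not the real names, so check the QuantStack or Unsloth Hugging Face pages for the exact entries:

```shell
# Placeholder repo id and filename -- replace with the actual entries from the
# QuantStack or Unsloth Hugging Face repositories before running the download.
mkdir -p ComfyUI/models/unet
# huggingface-cli download <repo-id> <file>.gguf --local-dir ComfyUI/models/unet
ls ComfyUI/models/unet
```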


4. Restart ComfyUI and refresh the browser page for the changes to take effect.


Workflow


1. Download the workflows from our Hugging Face repository. Alternatively, you can get them from the Workflow Templates section of ComfyUI.

If you are using a GGUF model variant, make sure to replace the Load Diffusion Model node with the Unet Loader node in the workflows.

(a) LTX2.3_T2V.json (Text to Video workflow)

(b) LTX2.3_I2V.json (Image to Video workflow)

(c) LTX-2.3_ICLoRA_Motion_Track_Distilled.json

(d) LTX-2.3_ICLoRA_Union_Control_Distilled.json

(e) LTX-2.3_T2V_I2V_Single_Stage_Distilled_Full.json

(f) LTX-2.3_T2V_I2V_Two_Stage_Distilled.json

2. Drag and drop into ComfyUI.

Text To Video Workflow-

(a) Add your prompt into the prompt box.

(b) Set the resolution (width and height) and the video length.

(c) Load the checkpoint, text encoder, LoRA, and upscaler models.

(d) Hit Run to start generation.

Text To Video node

Image To Video Workflow-

(a) Add your image into the Load Image node.

(b) Add your prompt into the prompt box.

(c) Set the resolution (width and height) and the video length.

(d) Load the checkpoint, text encoder, LoRA, and upscaler models.

(e) Hit Run to start generation.

load image node

Image To Video node
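If you prefer scripting generations instead of clicking Run, a running ComfyUI instance also exposes an HTTP endpoint for queueing workflows. A minimal sketch using `curl`, assuming the default server address `127.0.0.1:8188` and a workflow exported via ComfyUI's Save (API Format) option (the filename and the `jq` wrapping step are illustrative):

```shell
# Wrap an API-format workflow JSON in the body shape ComfyUI's /prompt
# endpoint expects: {"prompt": <workflow graph>}.
echo '{"1": {"class_type": "KSampler"}}' > workflow_api.json   # stand-in workflow
jq '{prompt: .}' workflow_api.json > request_body.json
cat request_body.json

# Queue it on a running ComfyUI server (commented out: needs a live server):
# curl -X POST http://127.0.0.1:8188/prompt \
#      -H "Content-Type: application/json" \
#      -d @request_body.json
```

Note that the workflow must be exported in API format; the regular UI save produces a graph-format JSON that the `/prompt` endpoint does not accept directly.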


Below are some of the videos generated using LTX-2.3 in ComfyUI.