LTX 2.3 Video + Audio Local Generation (GGUF/FP8/BF16)

 

LTX-2.3 is a video-with-audio generation model released by Lightricks. It is a major upgrade to the LTX-2 model that improves both audio-visual quality and prompt adherence.

LTX-2.3 is a DiT-based audio-video foundation model capable of generating synchronized video and audio within a single system. It combines the key components of modern video generation, offers open weights, and is designed to run efficiently on local machines for practical use. For in-depth insights, you can read their research paper.

They also released a desktop variant. LTX Desktop is an open-source desktop application, released under the Apache 2.0 license, designed to generate videos using LTX models directly on your computer. It supports local video generation on Windows systems with compatible NVIDIA GPUs, while also providing an API-based mode for unsupported hardware and macOS users. This allows creators to experiment with LTX models even if their system cannot run them locally.

The application brings multiple video generation capabilities into a single interface. Users can create videos from text prompts, images, or audio inputs, making it flexible for different creative workflows. It also includes a Retake-based video editing feature, which allows users to regenerate or modify parts of a video without starting from scratch. 

Along with this, LTX Desktop provides a built-in video editor and project-based workflow, helping users organize and manage their video editing projects more efficiently.  For local video generation on Windows, the system requires Windows 10 or 11 (x64) along with an NVIDIA GPU that supports CUDA and has at least 32GB of VRAM, although more VRAM is recommended for better performance. 

The system should also have at least 16GB of RAM, with 32GB recommended, and sufficient disk space to store the model weights and generated video files.  On macOS, LTX Desktop works through API mode only. It requires an Apple Silicon (arm64) device running macOS 13 Ventura or later, along with a stable internet connection to communicate with the remote API for video generation.

Installation

Update ComfyUI


1. First, install ComfyUI if you are a new user. Existing users should update it from the Manager by selecting Update All.

2. From the Manager, select the Install Custom Nodes option. Search for the "LTXVideo" custom node and install it. If it is already installed, just update it from the Manager's Custom Nodes Manager.

3. There are different model variants: Text to Video, Image to Video, and Control to Video. Download them from the LTX-2.3 Hugging Face repository. For an overall overview, you can also follow the LTX-2 tutorial if you are new to it.


Download the models (BF16/FP8/GGUF) as described below:

| Sl No | Model Name | Description |
|-------|------------|-------------|
| 1 | ltx-2.3-22b-dev.safetensors | The full LTX-2.3 model with complete capabilities. It is designed for flexible usage and can also be trained or fine-tuned in bf16 precision, making it suitable for development and advanced experimentation. |
| 2 | ltx-2.3-22b-dev-fp8.safetensors | The FP8 variant of LTX-2.3 for low-VRAM users, at the cost of some quality. |
| 3 | ltx-2.3-22b-distilled-lora-384.safetensors | A LoRA version of the distilled model, meant to be applied on top of the full LTX-2.3 model to enhance or modify its behavior without loading an entirely separate model. |
| 4 | ltx-2.3-22b-distilled.safetensors | A distilled version of the full model optimized for faster generation. It typically runs in about 8 steps with CFG set to 1, making it more efficient while still maintaining good output quality. |
| 5 | ltx-2.3-spatial-upscaler-x1.5-1.0.safetensors | Increases the spatial resolution of LTX-2.3 latents by 1.5×. Commonly used in multi-stage or multiscale pipelines to generate videos at higher resolution. |
| 6 | ltx-2.3-spatial-upscaler-x2-1.0.safetensors | Another spatial upscaler that enlarges LTX-2.3 latents by 2×. Useful in multi-stage pipelines when you want sharper, higher-resolution outputs. |
| 7 | ltx-2.3-temporal-upscaler-x2-1.0.safetensors | Doubles the frame rate of generated videos. Typically used in multi-stage pipelines to create smoother, higher-FPS videos. |
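To make the upscaler factors concrete, the sketch below chains them the way a multi-stage pipeline would. The function name and stage labels are our own shorthand; only the 1.5×/2× spatial and 2× temporal factors come from the table above:

```python
def apply_upscalers(width: int, height: int, fps: int, stages: list[str]):
    """Track output resolution and FPS through LTX-2.3 upscaler stages."""
    for stage in stages:
        if stage == "spatial_x1.5":
            width, height = int(width * 1.5), int(height * 1.5)
        elif stage == "spatial_x2":
            width, height = width * 2, height * 2
        elif stage == "temporal_x2":
            fps *= 2
        else:
            raise ValueError(f"unknown stage: {stage}")
    return width, height, fps

# Example: a base 768x512 @ 25 fps generation, followed by 2x spatial
# and 2x temporal upscaling -> (1536, 1024, 50)
print(apply_upscalers(768, 512, 25, ["spatial_x2", "temporal_x2"]))
```

This also shows why the upscalers belong in later stages: the base model generates a cheap low-resolution pass, and the upscalers refine it.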


Save diffusion models into ComfyUI/models/checkpoints folder, not the diffusion_models folder.
Save upscaler models into ComfyUI/models/latent_upscale_models folder.
Save lora models into ComfyUI/models/loras folder.
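A common mistake is dropping files into the wrong folder, so here is a small sanity check you can run against your ComfyUI root. It is a convenience sketch of our own (not part of ComfyUI); the file names and destination folders match the table and paths above:

```python
from pathlib import Path

# Expected destinations from this guide: checkpoints for the diffusion models,
# latent_upscale_models for the upscalers, loras for the distilled LoRA.
EXPECTED = {
    "checkpoints": [
        "ltx-2.3-22b-dev.safetensors",
        "ltx-2.3-22b-distilled.safetensors",
    ],
    "latent_upscale_models": [
        "ltx-2.3-spatial-upscaler-x2-1.0.safetensors",
    ],
    "loras": [
        "ltx-2.3-22b-distilled-lora-384.safetensors",
    ],
}

def missing_models(comfy_root: str) -> list[str]:
    """Return relative paths of expected model files that are not present."""
    root = Path(comfy_root)
    return [
        f"models/{folder}/{name}"
        for folder, names in EXPECTED.items()
        for name in names
        if not (root / "models" / folder / name).is_file()
    ]
```

Trim `EXPECTED` to the variants you actually downloaded; an empty return value means everything is where the workflows expect it.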
    

GGUF

If you want the LTX-2.3 GGUF models for low-VRAM systems, download them and set them up accordingly. Make sure you have already installed the ComfyUI-GGUF custom node by City96 from the Manager. If already done, update this custom node from the Manager.


(a) LTXV-2.3 GGUF by Quantstack

(b) LTXV-2.3 GGUF by Unsloth

(c) LTX2.3 by Kijai

Save them into the ComfyUI/models/unet folder. For a detailed overview, you can follow our tutorial on quantized models.
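As a rough rule of thumb for picking a quantization level, the sketch below maps available VRAM to common GGUF quant names. The thresholds are our own assumption, not official Lightricks guidance, and the exact quants available differ per repository, so check each repo's file list:

```python
def pick_gguf_quant(vram_gb: float) -> str:
    """Heuristic VRAM -> GGUF quant mapping (assumption; verify per repo)."""
    if vram_gb >= 24:
        return "Q8_0"    # near-lossless, largest file
    if vram_gb >= 16:
        return "Q5_K_M"  # balanced size/quality
    return "Q4_K_M"      # smallest commonly used, most quality loss
```

Lower-bit quants trade output quality for memory, so pick the highest quant that fits alongside the text encoders and VAE.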


4. Restart ComfyUI and refresh the browser page for the changes to take effect.


Workflow


1. Download the workflows from our Hugging Face repository. Alternatively, you can get them from the Workflow Templates section of ComfyUI.

(a) LTX2.3_T2V.json (Text to Video workflow)

(b) LTX2.3_I2V.json (Image to Video workflow)

(c) LTX-2.3_ICLoRA_Motion_Track_Distilled.json

(d) LTX-2.3_ICLoRA_Union_Control_Distilled.json

(e) LTX-2.3_T2V_I2V_Single_Stage_Distilled_Full.json

(f) LTX-2.3_T2V_I2V_Two_Stage_Distilled.json

2. Drag and drop into ComfyUI.

3. The rest of the models (text encoders, etc.) are the same as in the LTX-2.0 model setup. If you are new to it, just follow our older LTX-2.0 tutorial.