LTX 2.3 Prompt Relay: Controlled Multi-Event Video Generation

 


Video diffusion models have become very good at generating high-quality visuals, but when it comes to telling a coherent story over time, they start to fall apart. Here is the issue: if you give a model a single long prompt describing multiple events, say "a man enters a room, sits down, then a dog runs in", the model often mixes everything together.

Instead of clean transitions, you get events overlapping awkwardly, concepts appearing at the wrong time, and poor alignment between text and visuals. This happens because the model does not really understand when something should happen, how long it should last, or in what order events should unfold. That is exactly what LTX 2.3 with Prompt Relay aims to solve, without making the model heavier or more complex.

Prompt Relay explanation

Instead of dumping everything into one messy paragraph, imagine being able to:

- Control the exact timing of events

- Ensure clean transitions

- Keep each scene focused and consistent

The problem is not just generation, it is attention. In standard diffusion models, every part of the video can attend to every part of the prompt. That is why concepts bleed into each other (this is called semantic entanglement). Prompt Relay fixes this by dividing the video into temporal segments, assigning each segment its own prompt, and restricting attention so that each segment focuses only on its relevant prompt. Detailed insights can be found in the research paper.
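The segment-restricted attention can be pictured as a block-structured mask over (video frame, prompt token) pairs. Below is a minimal NumPy sketch of the idea; the shapes and the `segment_attention_mask` helper are hypothetical, and the real node operates on LTX's latent tokens rather than raw frames:

```python
import numpy as np

def segment_attention_mask(num_frames, num_tokens, segments):
    """Mask so each temporal segment of the video attends only to the
    prompt tokens of its own segment (additive mask: 0 = allow, -inf = block).

    segments: list of ((frame_start, frame_end), (tok_start, tok_end)),
    with exclusive ends. Purely illustrative shapes.
    """
    mask = np.full((num_frames, num_tokens), -np.inf)  # block everything first
    for (f0, f1), (t0, t1) in segments:
        mask[f0:f1, t0:t1] = 0.0  # open attention within the segment pair
    return mask

# Two segments: frames 0-3 read tokens 0-4, frames 4-7 read tokens 5-9
mask = segment_attention_mask(8, 10, [((0, 4), (0, 5)), ((4, 8), (5, 10))])
```

Adding this mask to the attention logits before the softmax means a frame in segment one simply cannot see the prompt for segment two, which is what prevents the semantic entanglement described above.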

It also uses an additional mechanism called Boundary Attention Decay: a soft Gaussian penalty that reduces attention across segment boundaries. It smoothly prevents concepts from leaking into adjacent segments, giving cleaner, more controlled storytelling.
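A minimal sketch of that decay, assuming a Gaussian weight on each frame's distance to a segment boundary (the paper's exact formulation may differ, and `sigma` here is a made-up tuning knob):

```python
import numpy as np

def boundary_decay(num_frames, boundary, sigma=1.5):
    """Soft cross-boundary weight: instead of a hard cut at `boundary`,
    a frame's attention to the *other* segment is scaled by a Gaussian
    of its distance to the boundary. Illustrative form only.
    """
    frames = np.arange(num_frames)
    dist = np.abs(frames + 0.5 - boundary)          # distance to the boundary
    return np.exp(-(dist ** 2) / (2 * sigma ** 2))  # weight in (0, 1]

w = boundary_decay(8, boundary=4)
# Frames adjacent to the boundary keep some cross-segment attention;
# frames far from it are attenuated toward zero.
```

The point of the soft falloff is that transitions stay smooth: a frame right at the cut can still blend context from both prompts, while frames deep inside a segment stay focused on their own prompt.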


Installation


1. First, install ComfyUI. If you are on an older version, update ComfyUI to the latest version from the Manager by selecting Update ComfyUI.



Install the Prompt Relay custom node from Kijai's repository, or alternatively use the Manager to install it.

To install it manually, clone the repository inside the ComfyUI/custom_nodes folder using the following command:
git clone https://github.com/kijai/ComfyUI-PromptRelay.git

2. Make sure you have completed the basic LTX 2.3 setup before running the workflow. If you have not done it yet, follow our LTX 2.3 tutorial.



The same models are also listed here:
(a) Download the LTX 2.3 22B dev model (ltx-2.3-22b-dev.safetensors) and save it inside the ComfyUI/models/checkpoints folder.

You can also use the LTX 2.3 GGUF model if you have low VRAM, and save it into the ComfyUI/models/unet folder.

(b) Next, download the Gemma 3 12B text encoder and save it into the ComfyUI/models/text_encoders folder.

(c) Download the VAE and save it into the ComfyUI/models/vae folder.

(d) For audio handling and generation, download Kijai's LTX 2.3 audio VAE and save it into the ComfyUI/models/vae folder.

(e) Download the LTX 2.3 Spatial Upscaler for upscaling your output and save it into the ComfyUI/models/latent_upscale_models directory.

(f) Download the LTX 2.3 distilled LoRA for fast video generation and save it into the ComfyUI/models/loras folder.

(g) Optional, for detail enhancement at the 2x and 4x stages: download the IC LoRA detailer and save it into the ComfyUI/models/loras folder.

(h) Optional, for editing: download the Edit Anything LoRA and save it into the ComfyUI/models/loras folder. Disable the edit and inpainting section if not required.

3. Next, download the LTX 2.3 VBVR LoRA and put it into the ComfyUI/models/loras folder.


4. Restart ComfyUI and refresh the browser.




Workflow

1. Download the workflow (LTX-2.3_PromptRelay.json) from our Hugging Face repository.

2. Drag and drop it into ComfyUI. Install any missing (red-outlined) custom nodes from the Manager by selecting the Install missing custom nodes option.

3. Load all the necessary models into their respective nodes.

4. Add your prompts into the prompt box, following the required prompting instructions. For more details, see Kijai's Prompt Relay repository.

5. Load an input image into the Load Image node if using the I2V pipeline.

6. Hit Run to start generation.
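To make the prompting in step 4 concrete, the relay essentially maps time ranges to per-segment prompts. The snippet below is a purely hypothetical illustration of that structure; the actual syntax is defined by the Prompt Relay node, so follow Kijai's repository for the real format:

```python
# Hypothetical per-segment prompt schedule (illustrative only; the real
# node defines its own input format and timing units).
segments = [
    (0.0, 2.0, "a man enters a room"),
    (2.0, 4.0, "the man sits down on a chair"),
    (4.0, 6.0, "a dog runs into the room"),
]

for start, end, prompt in segments:
    print(f"[{start:.1f}s - {end:.1f}s] {prompt}")
```

Each event gets its own time window and its own focused prompt, which is exactly the structure Prompt Relay uses to keep segments from bleeding into each other.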
 

 


 

 


This is one of those ideas that feels obvious once you see it. Instead of making models bigger or more complex, it focuses on how prompts are used over time, which is where most failures in video generation actually happen.

The biggest win here is practicality: no retraining, no extra compute, and an immediate improvement. If video diffusion is moving toward filmmaking and storytelling, methods like Prompt Relay are not just helpful but essential. Because at the end of the day, generating pretty frames is easy; telling a coherent story is the hard part.