Before starting, make sure you have a basic understanding of the LTX 2.3 video generation model and its workflow. You can train locally using a high-end GPU (like RTX 5090 or higher) or use cloud services such as Runpod, which typically cost around $2/hour.
We use an RTX 6000 Pro on Runpod. All the settings needed for training the video LoRA are the same for local and cloud systems.
Table of Contents
Requirements
1. An NVIDIA RTX 5090 (or better) if you want to train locally. If you do not have a suitable GPU or you are a Mac user, try Runpod Cloud, which costs only around $2/hr for an RTX 5090 card.
2. Operating System- Windows/Linux
3. Python 3.10 or newer, Git, PyTorch
4. Node.js installed
Installing AI Toolkit
Before we start training, we need to install the AI Toolkit UI on a Windows/Linux system. You can install it to any drive location.
If you already have AI Toolkit installed, update it: move into the root folder (ai-toolkit), open a command prompt, and run git pull.
1. New users need to install the AI Toolkit UI. Open a terminal and use the following commands.
(a) For Windows:
Use the AI-Toolkit automatic installation .bat setup file from the GitHub repository. It handles auto-updates and downloads all the required components (Python, CUDA, Git, Node.js, etc.) automatically.
Just download the AI-Toolkit-Easy-Install.bat file and double-click it to start the installation. After installation it will open AI Toolkit in your browser at http://localhost:8675.
(b) For Linux:
- Clone the AI Toolkit repo: git clone https://github.com/ostris/ai-toolkit.git
- Move into the folder: cd ai-toolkit
- Create a virtual environment: python3 -m venv venv
- Activate the virtual environment: source venv/bin/activate
- Install torch: pip3 install --no-cache-dir torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126
- Install the remaining dependencies: pip3 install -r requirements.txt
2. After installation, move into the UI folder and build/start it: cd ui, then npm run build_and_start
3. Open the AI Toolkit UI in your browser at: http://localhost:8675
(c) On Runpod
Select your preferred GPU and configure the pod with the following settings:
Training process
We are going to train a LoRA on a movie character, Harley Quinn, chosen at random from Google and social media platforms. Make sure you have permission for whatever material you use as a dataset.
Download clips from any platform that contain only the specific character. If you want to use your own face, just record portrait/landscape videos of yourself.
1. Preparing Dataset
(a) Collect and trim the clips:
- A total of 15-20 video clips in High Definition (HD) quality works best; we are uploading 15 clips.
- Include clips from different angles.
- Trim each clip down to 2-8 seconds (see the trimming sketch after this list). Tip: if you have a lower-end GPU, go for shorter clips.
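If you prefer to trim clips from the command line rather than a video editor, here is a minimal Python sketch that calls ffmpeg to cut a short segment out of each source video. It assumes ffmpeg is installed and on your PATH; the file names, start times, and durations are placeholders you should replace with your own.

    # trim_clips.py - rough sketch: cut short dataset clips out of longer source videos with ffmpeg
    import os
    import subprocess

    # (source file, start time in seconds, clip length in seconds) - placeholders, adjust to your footage
    cuts = [
        ("raw/interview.mp4", 12, 5),
        ("raw/photoshoot.mp4", 40, 6),
    ]

    os.makedirs("dataset", exist_ok=True)
    for i, (src, start, length) in enumerate(cuts, start=1):
        out = f"dataset/video{i}.mp4"
        subprocess.run(
            ["ffmpeg", "-y",
             "-ss", str(start),    # seek to the segment start
             "-i", src,
             "-t", str(length),    # keep only this many seconds
             "-c:v", "libx264",    # re-encode for a frame-accurate cut
             "-c:a", "aac",
             out],
            check=True,
        )
        print("wrote", out)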
(b) After chopping the clips, head over to AI Toolkit and upload them by clicking the "Dataset" option in the left panel, then selecting "New Dataset" in the top right corner. Give the dataset any relevant name.
(c) Caption each clip. Name the video clips serially, and give every caption file the same name as its .mp4, with a .txt extension.
Ex-
video1.mp4 >> video1.txt
video2.mp4 >> video2.txt
video3.mp4 >> video3.txt
video4.mp4 >> video4.txt
and so on...
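To avoid creating all the .txt files by hand, a small sketch like the one below generates an empty caption file next to every .mp4 in the dataset folder (the dataset path is a placeholder); you then open each file and write the actual caption.

    # make_caption_stubs.py - sketch: create an empty <name>.txt next to every <name>.mp4
    from pathlib import Path

    dataset = Path("dataset")  # placeholder - point this at your dataset folder

    for clip in sorted(dataset.glob("*.mp4")):
        caption = clip.with_suffix(".txt")
        if not caption.exists():
            caption.touch()        # empty stub - fill it with the clip's description afterwards
            print("created", caption.name)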
Write a descriptive, detailed caption for each video clip in its respective .txt file. This makes it easier for the model to understand what is actually happening in the video. If the subject says something, make sure to include the spoken line as well.
For example: Harley is doing a professional photoshoot holding a perfume bottle, saying "This is what you need to enhance your lifestyle".
The more detailed your dataset captions are, the better the output you will get from your LoRA model.
If manual captioning feels too tedious, use Microsoft's vision model in ComfyUI. You can also use LLMs such as ChatGPT, Claude, or Gemini (with a pro subscription) to get the best results, as in the sketch below.
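As one hedged example of automating caption drafts, the sketch below grabs a single frame from each clip with ffmpeg and asks an OpenAI-compatible vision model to describe it; the model name, prompt text, and the idea of captioning from one frame are all assumptions rather than part of the official workflow, so review every generated caption (spoken lines in particular) by hand.

    # auto_caption.py - rough sketch: draft captions with a vision LLM, then edit them manually
    # Assumes ffmpeg on PATH, `pip install openai`, and the OPENAI_API_KEY environment variable set.
    import base64
    import subprocess
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()
    dataset = Path("dataset")  # placeholder path

    for clip in sorted(dataset.glob("*.mp4")):
        frame = clip.with_suffix(".jpg")
        # Grab one frame about 1 second in as a cheap stand-in for the whole clip.
        subprocess.run(["ffmpeg", "-y", "-ss", "1", "-i", str(clip), "-frames:v", "1", str(frame)], check=True)
        b64 = base64.b64encode(frame.read_bytes()).decode()
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder - any vision-capable model works
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this video frame in one detailed sentence for a LoRA training caption."},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ],
            }],
        )
        clip.with_suffix(".txt").write_text(response.choices[0].message.content.strip())
        print("captioned", clip.name)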
(d) Now create a "New Job" and fill in the details as described below.
2. Setting Parameters
JOB
Training name-choose anything relevant, like- "harley_ltx2.3"
MODELS
Model Architecture- Ltx2.3
Options- Enable Low VRAM (default), since this is a video model, and enable Layer Offloading (if using an RTX 5090)
QUANTIZATION
Transformer- float8
Text encoder- float8
TARGET
Target type-Lora
Linear Rank-32
SAVE
Data type- BF16; use FP8 if you are low on VRAM. Higher-precision save types mean fewer hallucinations and better quality.
Save every-250
Max step saves to keep- 100
TRAINING
Learning rate- 0.0001
Steps- Use the formula (200 min to 250 max) x number of video clips. For example, (200 to 250) x 15 clips = 3000 to 3750 steps; you will start seeing good results after ~3000 steps, topping out around ~3750, so set roughly 3750 steps (see the small calculation sketch after this list).
Cache Text Embeddings- Enable this if you are not using a trigger word. It will load/unload the text encoder from memory.
Timestep Bias- High Noise (for faster training)
Leave rest as default.
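As a quick sanity check on the step count discussed above, here is a tiny sketch of the rule of thumb; the clip count of 15 simply matches this tutorial's dataset.

    # steps.py - sketch of the step-count rule of thumb: (200 to 250) x number of clips
    num_clips = 15                                      # this tutorial's dataset size - change to yours
    low, high = 200 * num_clips, 250 * num_clips
    print(f"train for roughly {low} to {high} steps")   # -> 3000 to 3750 for 15 clips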
ADVANCED
Do differential Guidance-Enable
Differential Guidance scale-3
DATASETS
Target dataset- Choose the dataset you created above, e.g. HarleyQuinn
Cache Latents- Enable
Do Audio- Enable
Auto Frame Count-Enable
Resolutions- Enable only 512 (for RTX 5090); enable 512, 768, 1024 (for an RTX 6000 Pro or other high-VRAM cards)
SAMPLE
Sample Every-250
Sampler-Flow Match
Guidance scale-4
Sample steps-25
Num Frames- 121 (video length in seconds x FPS + 1); for example, 5 x 24 + 1 = 121. You can set any length, but stick to this formula (see the sketch after this section).
FPS-24
Seed-42
Sample Prompts (10)- Add prompts that describe the subject from different perspectives. Since sampling is a compute-intensive task, 3-5 prompts is a good number. You can add any type of prompt.
For example:
Harley in a swimsuit getup saying "hello lovely fans I am back".
Do the same for the rest of the prompts.
During training, AI Toolkit will periodically generate videos from these prompts using the LoRA as it is being trained. Comparing these results helps you judge how well the LoRA has learned and decide where to stop training.
Leave the rest as default. Finally, select the Create Job option at the top right.
All of the generated samples can be found in real time under the "Samples" tab in the top menu.
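For the Num Frames value described above, the same seconds x FPS + 1 formula can be written as a one-line check; the 5-second length and 24 FPS match the example in this guide.

    # num_frames.py - sketch of the sample-length formula: seconds x fps + 1
    seconds, fps = 5, 24          # values used in this guide - adjust as needed
    print(seconds * fps + 1)      # -> 121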
3. Start Your training process
After you set up the parameters, the new job appears in the list. Hit the play button (at the top right) to start execution. On the first run this will download the models (LTX 2.3 model, text encoders, etc.) once from their official Hugging Face repositories.
You can track the real-time status on the dashboard, and control training (play/pause/stop) whenever required. After completion, you will find the LoRA files (.safetensors) inside the AI-Toolkit/output folder.
Test the LoRA in ComfyUI
1. After training, put your trained LoRA (.safetensors) file inside the ComfyUI/models/loras folder. Download the LTX 2.3 LoRA workflow from our Hugging Face repository.
2. Drag and drop the workflow into ComfyUI. Load all the models (LTX 2.3, text encoders, LoRA, VAE, etc.) into their respective nodes.
3. Enter your text prompt. Use the trigger word if you used one during training; otherwise leave it out. For example: Harley looking intensely at the camera, waving her right hand, saying "hello boys wanna do some chit-chat".
4. Set the values in the KSampler settings.
5. Hit Run to start generation.





