Z Image Turbo is the distilled version of Z Image and light enough to be trained on average consumer-grade GPUs. After training multiple LoRAs ourselves, we are here to share the sweet spot.
We just need to make sure a few parameters are selected correctly; the rest works the same as any normal LoRA training. Here, we will be using AI-Toolkit, a well-maintained web UI for training a wide range of diffusion-based models.
Requirements
1. NVIDIA RTX GPU with at least 16 GB VRAM (without memory offloading), or 10-12 GB VRAM (with offloading)
2. Operating System- Windows/Linux
3. Python 3.10 or newer, Git, PyTorch
4. Node.js installed
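Optionally, you can verify these prerequisites before installing with a short Python snippet (standard library only; the tool names are just the ones listed above):

# pre-flight check for the tools the installer expects on PATH
import shutil
import sys

print("Python 3.10+:", sys.version_info >= (3, 10))
for tool in ("git", "node", "npm"):
    print(f"{tool}:", shutil.which(tool) or "NOT FOUND")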
Installing AI Toolkit
Before we start training, we need to set up the environment on our Windows or Linux system. Install on any drive you like. If you already have AI-Toolkit installed, you can skip this step.
If AI-Toolkit is already installed, just update it: move into the root folder (ai-toolkit), open a command prompt, and run git pull.
1. New users need to install the AI-Toolkit UI. Open a terminal and use the following commands.
(a) For Windows:
Use the AI-Toolkit automatic installation .bat setup file from the GitHub repository. It handles updates and downloads all the required components (Python, CUDA, Git, Node.js, etc.) automatically.
Just download the AI-Toolkit-Easy-Install.bat file and double-click it to start the installation. Afterwards it will open AI-Toolkit in your browser at http://localhost:8675.
(b) For Linux:
-Clone AI toolkit repo:
git clone https://github.com/ostris/ai-toolkit.git
-Move into folder:
cd ai-toolkit
-Create virtual environment:
python3 -m venv venv
-Activate virtual env:
source venv/bin/activate
-Install torch:
pip3 install --no-cache-dir torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126
-Install the remaining dependencies:
pip3 install -r requirements.txt
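Optionally, run a quick sanity check to confirm the CUDA build of PyTorch can see your GPU before launching the UI (plain torch calls, nothing AI-Toolkit specific):

# confirm the CUDA build of PyTorch detects the GPU
import torch

print(torch.__version__)            # expect 2.7.0+cu126
print(torch.cuda.is_available())    # should print True
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")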
2. After installation, move into the ui folder and execute the following commands:
cd ui
npm run build_and_start
3. Open the AI-Toolkit UI in your browser at the following address:
http://localhost:8675
Training process
You can't just train this model directly, as it is the distilled variant and not the base one. Normal diffusion models need CFG and roughly 20-50 steps to get good results, and training directly on a distilled model tends to break that distillation: the model loses its original capabilities and does not properly learn the new data. In simple words, the model starts producing artifacts in the end results.
So Ostris built a de-distillation training adapter that assists the training process without breaking the model's distillation. You do not need to do anything new; it is embedded inside AI-Toolkit, and you just have to select the related parameters. With 5,000-20,000 steps you can train a style LoRA, character LoRA, etc. We are using an NVIDIA RTX 4090 with 24 GB VRAM.
1. Preparing Dataset
You can skip this step if you already know AI-Toolkit data preparation. Here, we want to train a character LoRA, so create three new folders: before, after, and test.
(a) Before - Save your training images here, each with a matching .txt caption file. Around 15-25 images is great for a LoRA. Write a caption for each image and store it in a .txt file with the same filename as the image (whether .png or .jpg). For example:
image1.png >> image1.txt
image2.png >> image2.txt
image3.png >> image3.txt
and so on.
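For larger folders, a small script can catch missing or empty captions before you upload anything. A minimal Python sketch, assuming the before folder layout described above:

# check every image in "before" has a matching, non-empty .txt caption
from pathlib import Path

dataset = Path("before")
for img in sorted(dataset.iterdir()):
    if img.suffix.lower() not in (".png", ".jpg", ".jpeg"):
        continue
    caption = img.with_suffix(".txt")
    if not caption.exists():
        print(f"missing caption: {caption.name}")
    elif not caption.read_text(encoding="utf-8").strip():
        print(f"empty caption: {caption.name}")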
You can do auto-captioning using LLMs (GPT, Gemini) to save time on bulk images. Keep the captions simple and relevant. Do not go for overly detailed, long captions, as these can disturb your end result.
Make sure you add the trigger word inside each caption, wrapped in [ ] brackets. We are using Kariiina as the trigger word. For example: [Kariiina] dancing at a Halloween party gathering.
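If your auto-captioner left the trigger word out, you can prepend it in bulk. A minimal sketch, assuming the same before folder and the [Kariiina] trigger from above:

# prepend the bracketed trigger word to any caption that lacks it
from pathlib import Path

trigger = "[Kariiina]"
for caption in Path("before").glob("*.txt"):
    text = caption.read_text(encoding="utf-8").strip()
    if trigger not in text:
        caption.write_text(f"{trigger} {text}", encoding="utf-8")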
(b) After - Leave this empty. This is where your LoRA's sample results are generated every 500 steps.
(c) Test - 5-6 input images for testing your LoRA's output during training. This will help you identify where you want to stop training.
Now head over to the AI-Toolkit dashboard. Select the "Datasets" option, click New Dataset, and create three new datasets with the same names as above. Drag and drop the files from your folders into the respective datasets.
2. Setting Parameters
JOB
Training Name- Z_Image_Turbo_kariiina (choose anything relevant)
Trigger Word- kariiina (choose your own trigger word)
MODEL
Here, add the Hugging Face model repo path/ID. This will download the model along with the training adapter.
Model Architecture- Select Z Image Turbo w/ Training Adapter (for normal training), or Z Image De-Turbo De-distilled (for a faster process).
Name or Path- Tongyi-MAI/Z-Image-Turbo (the path from the official repo)
Training Adapter Path- ostris/zimage_turbo_training_adapter or ostris/zimage_turbo_training_adapterV2 (choose adapter V2 for more refined results; the path is from the repo)
Options- Enable Low VRAM if you are using a low-VRAM GPU (10-12 GB)
Layer Offloading- Disabled by default; enable if using low VRAM
QUANTIZATION
Transformer- Float 8 (default); None (for 24 GB or more VRAM)
Text Encoder- Float 8 (default)
TARGET
Target Type- LoRA
Linear rank-32
SAVE
Data Type- BF16; use FP8 for low VRAM. Higher precision reduces hallucinations and improves quality.
Save Every- 250
Max Step Saves to Keep- 4
TRAINING
Learning rate- 0.0001
Steps- 3000 (normal); 5000 (works great with good captioning and a good dataset)
Cache Text Embeddings- Enable if you have low VRAM; this loads/unloads the text encoder from memory.
Timestep Bias- Balanced (for character LoRAs); High Noise (for style LoRAs)
Leave rest as default.
ADVANCED
Differential Guidance- Enable/disable; it's experimental. Disable it if you want to use the traditional training method. This is a newer approach that moves training closer to the actual result, faster.
Diff Guidance Scale- 3 (the default if enabled)
DATASETS
Target Dataset- Choose the relevant dataset created above.
Resolutions- Enable 512 only (for low VRAM); enable 512, 768, and 1024 (for high VRAM)
Leave rest as default.
SAMPLE
Sample Prompts- Add prompts covering different perspectives; you can add any type of prompt. To speed things up, we simply replaced the word "woman" with our trigger word in every prompt.
For example, our trigger word is "kariiina", so the prompt becomes "kariiina with red hair playing chess at the park...". Do this for the rest of the prompts; see the sketch below for a scripted version.
During training, AI-Toolkit generates images from these prompts using your in-progress LoRA. This helps you judge how well the LoRA has trained and where to stop the training process.
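If you have many default prompts, that word swap is easy to script. A minimal sketch (the second prompt is just an illustrative placeholder):

# swap the default subject word for the trigger word across sample prompts
trigger = "kariiina"
default_prompts = [
    "a woman with red hair playing chess at the park",
    "a woman holding a coffee mug in a cozy cafe",
]
sample_prompts = [p.replace("woman", trigger) for p in default_prompts]
print("\n".join(sample_prompts))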
Leave rest as default. Finally select Create Job option available on top right.
3. Start the Training Process
After setting the parameters, a new job is created in the job list. Hit the play button (at the top right) to start execution. This downloads the models (Z Image Turbo, adapters, etc.) from their official Hugging Face repositories. You can watch the real-time status on the dashboard.
Time taken to train a LoRA:
RTX 3090- 1 Hour (approx)
RTX 4090 - 45 minutes (approx)
RTX 5090- 25 minutes (approx)
After training, you can push the trained LoRA to your Hugging Face repository and share it with the community.
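One way to do this is with the huggingface_hub Python library; the repo ID and file path below are placeholders, and you need a Hugging Face account plus an access token (via huggingface-cli login or the HF_TOKEN environment variable):

# upload the trained LoRA weights to a Hugging Face model repo
from huggingface_hub import HfApi

api = HfApi()  # reads your saved token automatically
repo_id = "your-username/z-image-turbo-kariiina-lora"  # placeholder repo ID

api.create_repo(repo_id, repo_type="model", exist_ok=True)
api.upload_file(
    path_or_fileobj="Z_Image_Turbo_kariiina.safetensors",  # placeholder local path
    path_in_repo="Z_Image_Turbo_kariiina.safetensors",
    repo_id=repo_id,
)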
Use and Test the LoRA in ComfyUI
1. After training, put the LoRA inside the ComfyUI/models/loras folder. Download the Z Image Turbo LoRA workflow from our Hugging Face repository.
2. Drag and drop the workflow into ComfyUI.
3. Enter your text prompt with the trigger word.
4. Set the KSampler values:
Sampler- Euler
Steps- 20
CFG- 2 or 3
5. Hit Run to start the generation.





