Z Image Turbo (BF16/FP8/GGUF) in ComfyUI

Another photorealistic model is here. Z-Image is a powerful and highly efficient image generation model with 6 billion parameters, built by the Z-Image Team, Tongyi MAI, Alibaba Group. It currently comes in three versions, and one of the most impressive among them is Z Image Turbo.

This variant is a distilled, faster version of the Z Image Base model. It still manages to match, and in some cases even beat, the top competitors while using only 8 NFEs (Number of Function Evaluations). You can now also train Z Image Turbo to create your own style or character LoRA.

Z-Image-Turbo showcase (Ref-official page)

Z-Image-Turbo focuses heavily on photorealistic quality, producing sharp, clean visuals along with strong overall aesthetics. It also includes a Prompt Enhancer that adds reasoning capabilities, helping the model understand deeper context instead of just following surface-level descriptions. More detailed insights can be found in their research paper.

Z-Image-Turbo showcase (Ref-official page)

The Prompt Enhancer allows the model to generate richer, more accurate results. Another standout feature is its ability to handle bilingual text extremely well.

Z-Image-Turbo showcase (Ref-official page)

Whether it's complex English or Chinese characters, Z-Image-Turbo can render them cleanly and accurately, something many models still struggle with.

Installation

1. Install ComfyUI if you are a new user. If you already use it, update ComfyUI from the Manager by selecting Update All.

2. There are multiple Z Image Turbo model variants released by the community (choose any of them according to your system resources):

(a) BF16 Variant

Download Z Image turbo BF16 (z_image_turbo_bf16.safetensors) optimized by ComfyUI.

(b) FP8 Variant

Download Z Image turbo FP8 (z-image-turbo-fp8-e4m3fn.safetensors or z-image-turbo-fp8-e5m2.safetensors) optimized by T5B. If you want better image quality, use E4M3FN. If you want maximum speed, use E5M2.

(c) GGUF Variant

Download Z Image turbo GGUF by Jayn7 (from Q2 for fast inference up to Q8 for better quality). If you have limited VRAM and system RAM, use the GGUF variants and pick one whose file size fits your hardware.

Save it into the ComfyUI/models/diffusion_models folder.

For GGUF models, make sure you have the ComfyUI-GGUF custom node by city96. If it is not installed yet, install it from the Manager by selecting the Custom Nodes Manager option. If you already use it, update it.

If you do not know what the FP8/BF16/GGUF model variants are, follow our quantization tutorial for a more in-depth overview.
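To make the E4M3FN versus E5M2 choice a little more concrete, here is a minimal Python sketch that works out the largest finite value and the relative precision of each 8-bit format from its bit layout. It only illustrates the numeric trade-off (more mantissa bits means finer precision, more exponent bits means wider range). The function name `fp8_summary` is made up for this example, and nothing here describes how the linked checkpoints were actually produced.

```python
def fp8_summary(exp_bits, man_bits, reserves_top_exponent):
    """Largest finite value and relative step of a small float format.

    reserves_top_exponent=True  -> IEEE-like layout (E5M2): the highest
                                   exponent code is kept for inf/NaN.
    reserves_top_exponent=False -> "FN" layout (E4M3FN): no infinities, only
                                   the all-ones bit pattern is NaN.
    """
    bias = 2 ** (exp_bits - 1) - 1
    top_code = 2 ** exp_bits - 1
    if reserves_top_exponent:
        max_exponent = (top_code - 1) - bias
        max_mantissa = 2 - 2 ** -man_bits       # all mantissa bits set
    else:
        max_exponent = top_code - bias
        max_mantissa = 2 - 2 * 2 ** -man_bits   # all-ones mantissa is NaN
    return {
        "max_finite": max_mantissa * 2.0 ** max_exponent,
        "relative_step": 2.0 ** -man_bits,      # gap between neighbours near 1.0
    }


e4m3fn = fp8_summary(exp_bits=4, man_bits=3, reserves_top_exponent=False)
e5m2 = fp8_summary(exp_bits=5, man_bits=2, reserves_top_exponent=True)
print("E4M3FN:", e4m3fn)
print("E5M2:  ", e5m2)
```

E4M3FN has the smaller step between neighbouring values, which fits the advice above to pick it when image quality matters most.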

(d) AIO (All in one model)

Download the Z Image Turbo all-in-one model (z-image-turbo-bf16-aio.safetensors or z-image-turbo-fp8-aio.safetensors). This includes the text encoder and VAE, so you do not need to download them separately. Use FP8 with ~10GB of VRAM and BF16 with ~20GB of VRAM. Save it into the ComfyUI/models/checkpoints folder.

3. Download the VAE (ae.safetensors) and save it into the ComfyUI/models/vae folder. This is the same VAE that we use with Flux1. If you already have it, you do not need to download it again.

4. Download the text encoder (qwen_3_4b.safetensors). Save it into the ComfyUI/models/text_encoders folder.

5. Restart and refresh ComfyUI for the changes to take effect.
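Before restarting, it can help to confirm that every file landed in the right folder. The following is a small sketch, not an official tool: the function name `find_missing` and the `EXPECTED_FILES` table are made up for this example, and the table assumes the BF16 diffusion model with a separate VAE and text encoder. Adjust the file names if you chose the FP8, GGUF, or AIO variant.

```python
from pathlib import Path

# Folder -> file names, taken from the installation steps above.
# Assumption: BF16 variant with a separate VAE and text encoder.
EXPECTED_FILES = {
    "diffusion_models": ["z_image_turbo_bf16.safetensors"],
    "vae": ["ae.safetensors"],
    "text_encoders": ["qwen_3_4b.safetensors"],
}


def find_missing(comfy_root, expected=EXPECTED_FILES):
    """Return 'folder/file' entries not found under <comfy_root>/models."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for folder, names in expected.items():
        for name in names:
            if not (models_dir / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing


# Example usage (the path to your ComfyUI folder is an assumption):
# for entry in find_missing("ComfyUI"):
#     print("Missing:", entry)
```

An empty list means all three files were found where the steps above say they should be.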

Workflow

1. Download the workflows from our Hugging Face repository.

(a) Z_Image_Turbo_Workflow.png (Basic BF16/FP8 workflow)

(b) Z_Image_Turbo (GGUF).json (GGUF variant workflow)

2. Load a workflow in ComfyUI and configure it:

(a) Load the Z Image Turbo FP8 model in the Load Diffusion Model node.

(b) Load the text encoder and VAE into their respective nodes.

(c) Set the KSampler settings:

CFG: 0 for turbo mode, 1.0 for normal use

Steps: 9

Resolution: 1024 x 1024

Sampler & Scheduler (use one of these combinations):

- dpmpp_sde & ddim_uniform

- dpmpp_sde & beta

- euler_ancestral & ddim_uniform

- euler_ancestral & beta

(d) Hit Run to start the execution.
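If you want to try all four sampler and scheduler pairings in turn, the KSampler settings above can be written out as a list of presets. This is a sketch only: `ksampler_presets` is a made-up helper, and the dictionary keys are assumed to mirror the KSampler node's input names, so check them against your own workflow JSON before reusing them.

```python
from itertools import product

# Sampler and scheduler names from the recommended combinations above.
SAMPLERS = ["dpmpp_sde", "euler_ancestral"]
SCHEDULERS = ["ddim_uniform", "beta"]


def ksampler_presets(seed=0, steps=9, cfg=0.0):
    """Build one settings dict per sampler/scheduler pairing.

    Keys are assumed to match the KSampler node inputs; cfg=0.0 is the
    turbo-mode value from the tutorial.
    """
    return [
        {
            "seed": seed,
            "steps": steps,
            "cfg": cfg,
            "sampler_name": sampler,
            "scheduler": scheduler,
            "denoise": 1.0,  # plain txt2img runs at full denoise
        }
        for sampler, scheduler in product(SAMPLERS, SCHEDULERS)
    ]


for preset in ksampler_presets(seed=42):
    print(preset["sampler_name"], "+", preset["scheduler"], "->", preset)
```

Keeping the seed fixed across the four presets makes it easier to compare what each pairing does to the same prompt.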

The results shown here are not cherry-picked. To show the model's real performance, we are presenting what we got on our first attempt.

Image Generation Testing

the girl is partying hard (z image turbo testing)

Prompt used- This is a realistic, analog-style photo of Karina. It captures a scene where she is attending a secret Illuminati party. She is winking, raising one hand above her head making a V-sign, and holding a highball glass in the other hand. The party features dazzling lasers and lights, and an LED screen displaying the Illuminati symbol. Aliens, Elon Musk, Donald Trump, and famous celebrities are dancing. The photo looks raw and unedited, characterized by visible film grain and the intense lighting from a flash. Cool lighting. Analog style. A candid, honest photo.

A woman with fair skin and blonde hair(z image turbo testing)

Prompt used: A woman with fair skin and blonde hair styled in a high, neat bun is sitting in a brightly colored retro kitchen. She wears a glossy peach-colored vinyl trench dress with structured shoulders and matching peach buttons. Under the dress, she has long, metallic turquoise gloves that extend past her elbows. Her makeup is bold, featuring vivid turquoise eyeshadow, thick eyelashes, sharply defined brows, and bright pink lipstick.She is seated at a glossy pink countertop. In front of her is a halved grapefruit on a white plate. Nearby is a vintage rotary telephone in bright orange, a cylindrical bottle lying on its side with spilled pink liquid forming a puddle, and a metallic toaster with orange accents.The woman holds a cereal box in her right hand. The cereal box is white and pink with colorful fruit graphics and has the word “HAPPINESS” printed on the front in bold letters.The kitchen environment has vibrant, saturated colors. The lower cabinets are turquoise with silver handles. The backsplash consists of alternating pink and white squares. The upper cabinets are turquoise, with one open shelf revealing an orange interior. On the shelf are three white canisters with metal lids labeled “JOY,” “JOY,” and “BLISS” in turquoise and pink lettering. There is also an orange mug displayed on the shelf.At the back of the kitchen is a stainless steel sink and faucet. Additional objects, such as small containers and a pink bottle, sit on the countertop beside the toaster. Soft pink and violet under-cabinet lighting illuminates the backsplash and contributes to the vibrant color palette. The overall composition is clean, bold, and highly stylized, with strong color contrast and precise product placement.

Prompt used-A stylish young woman standing confidently on a rainy New York street, wearing a fitted white tank top and a short red skirt. Her pose is elegant and natural, with one leg slightly forward and a relaxed yet confident expression. Reflections of city lights shimmer on the wet pavement around her. Yellow taxis, neon signs, and blurred pedestrians in the background create an authentic urban atmosphere. Raindrops gently fall, and her hair appears slightly damp from the rain. The overall scene is cinematic and photorealistic, with soft lighting, shallow depth of field, and a moody, vibrant color tone.

Prompt used- very pretty caucasian girl at age 18(with subtle alternative-style makeup and short, curly brown hair with soft layers and see-through side bangs), her hair is styled as a wolf cut, she is sitting in the selfie, which was taken at night in paris, with an average looking tenement visible in the background, its in the rural part of the city with a park in the back. The angle is messy, with slight motion blur and overexposure. The overall vibe is that of a casually taken, mediocre or even failed selfie as if snapped without much thought or effort

Prompt used- a girl making her eyes like this ♥.♥ , anime style

The model also understands emojis, but do not add the word 'emoji' to your prompts; use just the symbol instead.

The model performs very well even with low VRAM. Its prompt adherence and level of detail are really good. To get the best results, it is officially recommended to use long, detailed prompts.

Important points to remember:

1. The model exhibits limited output variation when only the seed is changed in standard txt2img workflows, often producing nearly identical faces due to its strong prompt adherence.

2. The primary cause of low diversity is that txt2img generation implicitly uses a denoising strength of 1.0, leaving little room for stochastic variation.

3. Reducing the denoising strength below 1.0 (e.g., ~0.7) significantly increases visual diversity and facial variation, even with very short prompts.

4. A two-stage workflow, with an initial low-resolution generation followed by an img2img pass, effectively balances diversity and image quality. The first stage introduces variation by using a lower denoising strength, while the second stage refines structure and reduces noise.

5. Simple prompts (e.g., 'Face', 'Person', 'Avenger Movie Scene') are sufficient to achieve substantial diversity when denoising strength is adjusted appropriately.

6. Resolution staging (low resolution for variation, higher resolution for refinement) improves speed and scalability without sacrificing output quality.

7. The approach can be adapted easily by modifying resolutions and denoising strengths based on hardware capacity and desired output characteristics.

8. Compared to standard denoising strength (1.0), the proposed method produces markedly higher variation, as demonstrated through side-by-side comparisons.
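The two-stage idea from the points above can be sketched as a small planning helper. This is an illustration under stated assumptions, not the exact workflow: the `Stage` class and the `two_stage_plan` and `snap` helpers are made up for this example, the 0.5 scale factor, the second-stage denoise of 0.5, and the rounding to multiples of 16 are all assumptions, and only the ~0.7 first-stage denoise comes from the list above.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    width: int
    height: int
    denoise: float


def snap(value, multiple=16):
    """Round to the nearest multiple (assumed to be a safe latent size)."""
    return max(multiple, round(value / multiple) * multiple)


def two_stage_plan(final_width=1024, final_height=1024, scale=0.5,
                   first_denoise=0.7, refine_denoise=0.5):
    """Plan a low-resolution variation pass, then a full-resolution refine pass.

    first_denoise follows the ~0.7 example above; scale and refine_denoise
    are placeholder assumptions to tune for your hardware and taste.
    """
    for value in (first_denoise, refine_denoise):
        if not 0.0 < value <= 1.0:
            raise ValueError("denoise must be in the range (0, 1]")
    return [
        Stage("variation (low resolution)",
              snap(final_width * scale), snap(final_height * scale),
              first_denoise),
        Stage("refine (img2img)", final_width, final_height, refine_denoise),
    ]


for stage in two_stage_plan():
    print(f"{stage.name}: {stage.width}x{stage.height}, denoise={stage.denoise}")
```

Lowering `scale` makes the first pass faster, while raising `refine_denoise` lets the second pass change more of the image.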

You can clearly see the results. Of course, it cannot always generate good results on its first attempt, but it is producing far better results than other models, and do not forget that it is just a 6 billion parameter model. They also released ControlNet Union, which covers Canny, MLSD, HED, Pose and Depth. You can definitely give it a try.

Posted by Administrator
