We have all seen AI image generation models do amazing things, but let's be honest: they stumble when it comes to the finer details. Think about generating text inside an image: letters often look warped, words become unreadable, and for non-Latin scripts like Chinese, it's even messier.
Qwen Image is built on a massive 20B-parameter MMDiT architecture. Experiments show that it outperforms other models in text rendering, especially in Chinese, while also excelling in general image generation, editing, and even image understanding tasks like object detection, segmentation, and depth estimation. On top of that, most existing editing tools are either too basic (crop, filter, resize) or too complex for everyday users. You can find more detailed findings in their research paper.
Now imagine an AI that not only generates stunning visuals in different artistic styles but also integrates text with near-perfect accuracy, whether it's English or Chinese. Imagine editing at a professional level: swapping objects, tweaking styles, and enhancing details, all with natural, intuitive inputs. This model improves on all of these fronts. Let's see how to set it up using ComfyUI.
Installation
1. Install ComfyUI on your system. If it is already installed, just update it from the Manager by clicking the Update All option.
2. Download the Qwen Image FP8 model (qwen_image_fp8_e4m3fn.safetensors) or the FP16 variant from the Hugging Face repository and save it into your ComfyUI/models/diffusion_models folder. Choose FP8 if you have 12 GB of VRAM or less; with more VRAM, use the FP16 variant.
3. Download the FP8 text encoder (qwen_2.5_vl_7b_fp8_scaled.safetensors) or the FP16 text encoder, whichever suits your GPU, and put it into your ComfyUI/models/text_encoders folder.
4. Finally, download VAE (qwen_image_vae.safetensors) and save it to your ComfyUI/models/vae folder.
5. Restart ComfyUI for the changes to take effect.
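The download steps above can also be scripted with the huggingface_hub library. This is a minimal sketch, not part of the official setup: the repo id (Comfy-Org/Qwen-Image_ComfyUI) and the split_files/... paths are assumptions, so adjust them to match the repository you actually download from.

```python
# Sketch of steps 2-4 as a script. The repo id and file paths are
# assumptions -- adjust them to match the Hugging Face repo you use.
import shutil
from pathlib import Path

COMFYUI_ROOT = Path("ComfyUI")  # adjust to your ComfyUI install location

# repo file path -> target ComfyUI/models subfolder
MODEL_FILES = {
    "split_files/diffusion_models/qwen_image_fp8_e4m3fn.safetensors": "diffusion_models",
    "split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors": "text_encoders",
    "split_files/vae/qwen_image_vae.safetensors": "vae",
}

def download_all(repo_id: str = "Comfy-Org/Qwen-Image_ComfyUI") -> None:
    """Download each file to the HF cache, then copy it into the right folder."""
    # requires: pip install huggingface_hub
    from huggingface_hub import hf_hub_download
    for repo_path, subfolder in MODEL_FILES.items():
        target_dir = COMFYUI_ROOT / "models" / subfolder
        target_dir.mkdir(parents=True, exist_ok=True)
        cached = hf_hub_download(repo_id=repo_id, filename=repo_path)
        shutil.copy(cached, target_dir / Path(repo_path).name)
```

Calling `download_all()` then fetches all three files and drops each one into the matching models subfolder.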
Qwen Image GGUF
1. You can also download Qwen Image GGUF variants, quantized by different developers, that consume less VRAM. To use these models, install the ComfyUI-GGUF custom node by City96 from the Manager. If you do not know what GGUF models are, learn about them in our quantized model tutorial.
If it is already installed, just update the ComfyUI-GGUF custom node by City96 from the Manager: click Custom Nodes Manager, search for ComfyUI-GGUF, then click the Install button.
(a) Qwen Image GGUF by City96
(b) Qwen Image GGUF by Quantstack
Store the GGUF model in your ComfyUI/models/unet folder.
Quantization levels range from Q2 (faster, with lower precision and lower quality) to Q8 (slower generation, with higher precision and higher quality). Approximate file sizes: 2-bit (Q2_K, ~7 GB), 3-bit (Q3_K, ~9 GB), 4-bit (Q4, 12–13 GB), 5-bit (Q5, 14–15 GB), 6-bit (Q6, ~16.8 GB), 8-bit (Q8, ~21.8 GB), and 16-bit (BF16, ~40.9 GB).
2. Download the FP8 text encoder (qwen_2.5_vl_7b_fp8_scaled.safetensors) or the FP16 text encoder, whichever suits your GPU, and put it into your ComfyUI/models/text_encoders folder.
3. Finally, download VAE (qwen_image_vae.safetensors) and save it to your ComfyUI/models/vae folder.
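To make the quant-versus-VRAM trade-off above concrete, here is a small hypothetical helper (not part of the tutorial's workflow) that picks the largest GGUF variant whose approximate file size fits a given VRAM budget. Real memory use also depends on the text encoder, VAE, and resolution, so treat the result as a starting point.

```python
# Hypothetical helper: pick the largest Qwen Image GGUF quant that fits
# in a given VRAM budget, using the approximate sizes listed above.

QUANT_SIZES_GB = [  # (variant, approx file size in GB), smallest first
    ("Q2_K", 7.0),
    ("Q3_K", 9.0),
    ("Q4", 13.0),
    ("Q5", 15.0),
    ("Q6", 16.8),
    ("Q8", 21.8),
    ("BF16", 40.9),
]

def pick_quant(vram_gb: float, headroom_gb: float = 2.0) -> str:
    """Return the largest variant whose file fits in vram_gb minus headroom."""
    budget = vram_gb - headroom_gb
    best = QUANT_SIZES_GB[0][0]  # fall back to the smallest quant
    for name, size in QUANT_SIZES_GB:
        if size <= budget:
            best = name
    return best

print(pick_quant(16))  # prints "Q4" for a 16 GB card with 2 GB headroom
```

The 2 GB headroom default is an arbitrary safety margin; tighten or loosen it based on what else shares your GPU.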
Workflow
1. You can load or drag the following image into ComfyUI to get the workflow; it is available from our Hugging Face repository (Qwen_Image_basic_workflow.png).
You can also get the workflow from the ComfyUI dashboard. Navigate to the top-left corner: Workflow > Browse Templates > Image, then search for the Qwen Image workflow. If it does not appear, you are using an older version of ComfyUI; update it from the Manager tab.
2. Drag and drop it into ComfyUI.
3. Load the Qwen Image model in the Load Diffusion Model node.
4. Enter your prompt into the prompt box.
KSampler settings:
Steps: 20 (official recommendation: 50; for faster generation, use 10)
CFG: 2.5
Sampler: Euler
5. Click Run to start execution.
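Beyond clicking Run in the UI, ComfyUI also exposes an HTTP endpoint for queueing generations. The sketch below assumes a locally running server on the default port (8188) and a workflow exported in API format (enable dev mode, then use "Save (API Format)"); it is a minimal illustration, not an official client.

```python
# Sketch: queue a generation via ComfyUI's HTTP API (POST /prompt).
# Assumes a ComfyUI server on http://127.0.0.1:8188 and a workflow
# saved in API format.
import json
import urllib.request

def build_payload(workflow: dict) -> bytes:
    """Wrap an API-format workflow dict in the JSON body /prompt expects."""
    return json.dumps({"prompt": workflow}).encode("utf-8")

def queue_prompt(workflow: dict, host: str = "http://127.0.0.1:8188") -> dict:
    """Send the workflow to the server; returns its JSON response."""
    req = urllib.request.Request(
        f"{host}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # requires a running ComfyUI server
        return json.loads(resp.read())
```

To use it, load your exported workflow JSON (e.g. `json.load(open("workflow_api.json"))`) and pass the dict to `queue_prompt()`.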
We tested it by generating a professional studio photograph of a fashion model.
Prompt: Professional studio portrait of a female fashion model wearing a sleek black evening gown, dramatic softbox lighting, high-fashion editorial look, ultra-sharp details