Hunyuan Image 2.1: High-Quality, High-Resolution Image Generation

 


Text-to-image models have always struggled to balance quality, speed, and prompt accuracy. You might get a sharp image that drifts from the text, or an image that follows the text but lacks clarity or resolution. These gaps often leave creators spending extra time refining prompts or editing outputs to achieve their vision. HunyuanImage 2.1 is designed to address these issues directly.

 

Hunyuan Image 2.1 model showcase (Source: official page)

 

Tencent's team built the model as a two-stage pipeline. The first stage is a base text-to-image model that uses two text encoders: a multimodal large language model (MLLM) that improves image-text alignment, and a character-aware encoder that improves text rendering across languages. Its backbone is a 17-billion-parameter diffusion transformer, post-trained with reinforcement learning from human feedback (RLHF) to improve aesthetics and structural coherence.

The second stage is a refiner model that polishes the output, reducing artifacts and sharpening detail. A PromptEnhancer module rewrites prompts before inference, and MeanFlow distillation reduces the computation needed per image, which speeds up generation.

On the data side, training relies on structured captions that describe each image at several levels of detail, from short to extra-long. An OCR agent and IP RAG fill in the gaps where general-purpose captioners fall short, and a bidirectional verification process checks caption accuracy.

 

HunyuanImage 2.1 stands out in several ways. It produces high-quality 2K images with rich composition, natively supports both Chinese and English prompts, and uses a diffusion transformer architecture that combines single-stream and dual-stream blocks.

ByT5 provides glyph-aware text encoding, which makes text rendered inside images more accurate. The model supports flexible aspect ratios, including 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, and 2:3. Prompt rewriting through the PromptEnhancer adds descriptive detail to prompts, which improves the visual output.
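The supported aspect ratios still have to be turned into concrete pixel sizes when you set up the latent image. The helper below is a minimal sketch, assuming you want roughly the same pixel count as a 2048x2048 image and that both sides are rounded to multiples of 64. Both are assumptions for illustration, not official values, and the function name `dims_for_ratio` is hypothetical. The official project may list slightly different recommended resolutions, so treat the output as a starting point.

```python
import math

# Aspect ratios listed as supported by HunyuanImage 2.1.
ASPECT_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"]


def dims_for_ratio(ratio: str, target_side: int = 2048, multiple: int = 64) -> tuple[int, int]:
    """Return (width, height) with about target_side**2 pixels for a "W:H" ratio.

    Hypothetical helper: rounding to a multiple of 64 is an assumption,
    not the model's official resolution table.
    """
    w_part, h_part = (int(x) for x in ratio.split(":"))
    pixels = target_side * target_side
    width = math.sqrt(pixels * w_part / h_part)
    height = math.sqrt(pixels * h_part / w_part)

    def snap(value: float) -> int:
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    for r in ASPECT_RATIOS:
        print(r, dims_for_ratio(r))
```

Keeping the pixel count fixed means wide and tall images cost about the same to generate as the square 2K image; the rounding keeps each side divisible by common latent downsampling factors.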


 

Installation

1. Install and set up ComfyUI if you are a new user. Existing users should update it from the Manager by selecting the Update All option.


2. Download one of the Hunyuan Image 2.1 models (BF16, FP8, or BF16 distilled) and save it into your ComfyUI/models/diffusion_models folder. The official recommendation is at least 24 GB of VRAM.


Users with less VRAM can use the Hunyuan 2.1 GGUF or Hunyuan 2.1 Distilled GGUF models instead. Save these into your ComfyUI/models/unet folder.


3. Download the text encoders, ByT5 Small FP16 (byt5_small_glyphxl_fp16.safetensors) and Qwen 2.5 VL (qwen_2.5_vl_7b.safetensors), and save them into your ComfyUI/models/text_encoders folder. The model can use a prompt rewriting technique so that your prompt gets enhanced in the background.


4. Download the VAE (hunyuan_image_2.1_vae_fp16.safetensors) and save it into your ComfyUI/models/vae folder.


5. (Optional) Download a refiner model to enhance your images and reduce artifacts: Hunyuan Image Refiner FP16 (hunyuan_image_refiner_vae_fp16.safetensors) or Hunyuan Image Refiner BF16 (hunyuanimage2.1_refiner_bf16.safetensors). If you are using a GGUF variant, use the Hunyuan Image Refiner GGUF instead.
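Before loading the workflow, it can help to confirm that every file landed in the right folder. The script below is a minimal sketch of such a check, assuming the standard ComfyUI/models layout and the file names given in the steps above. The function names are hypothetical, the 24 GB cut-off in `pick_model_folder` simply mirrors the official VRAM recommendation, and the optional refiner is not checked. Because the main model's file name depends on which variant you downloaded, only its extension is checked.

```python
from pathlib import Path

# File names taken from the installation steps above.
REQUIRED = {
    "text_encoders": ["byt5_small_glyphxl_fp16.safetensors", "qwen_2.5_vl_7b.safetensors"],
    "vae": ["hunyuan_image_2.1_vae_fp16.safetensors"],
}


def pick_model_folder(vram_gb: float) -> str:
    """Hypothetical rule of thumb: 24 GB or more -> full model, otherwise GGUF."""
    return "diffusion_models" if vram_gb >= 24 else "unet"


def has_files(folder: Path, pattern: str) -> bool:
    """True if folder exists and contains at least one file matching pattern."""
    return folder.is_dir() and any(p.is_file() for p in folder.glob(pattern))


def check_models(comfy_root: str) -> list[str]:
    """Return a list of problems found under <comfy_root>/models (empty if all good)."""
    models = Path(comfy_root) / "models"
    problems = []
    for folder, names in REQUIRED.items():
        for name in names:
            if not (models / folder / name).is_file():
                problems.append(f"missing {folder}/{name}")
    # Accept either a safetensors model or a GGUF model for the main network.
    if not (has_files(models / "diffusion_models", "*.safetensors")
            or has_files(models / "unet", "*.gguf")):
        problems.append("no diffusion_models/*.safetensors or unet/*.gguf found")
    return problems


if __name__ == "__main__":
    # Adjust the path to wherever your ComfyUI folder lives.
    issues = check_models("ComfyUI")
    print("\n".join(issues) if issues else "All required files found.")
```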


 

Workflow

1. Download the workflow (Hunyuan_Image_2.1_workflow.png) from our Hugging Face repository.

2. Drag and drop it into ComfyUI.

 

Hunyuan Image 2.1 workflow

 

(a) Load the Hunyuan Image 2.1 model into the Load Diffusion Model node (use the Unet Loader node if you downloaded a GGUF variant).

(b) Load Qwen and ByT5 Small (the text encoders) into the Dual CLIP Loader node.

(c) Load the VAE model with the Load VAE node, which feeds the VAE Decode node.

(d) Enter your prompt in the prompt box. You do not need long, detailed prompts, because the prompt enhancer expands your prompt into a more descriptive one.
 
(e) Apply the officially recommended KSampler settings:

For normal use (high-quality generation):
CFG: 3.5
Steps: 20-50
Sampler: Euler
Shift: 5

For the distilled model (faster inference):
CFG: 1
Steps: 8
Sampler: Euler
Shift: 4
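To get a feel for what the CFG and Shift values do, here is a small sketch. It uses the standard classifier-free guidance formula, and, as an assumption, the SD3-style flow-matching timestep shift sigma' = s * sigma / (1 + (s - 1) * sigma) for the Shift value. ComfyUI's real schedule also depends on the selected scheduler, so this only illustrates the trend: a larger shift keeps more steps at high noise levels, and at CFG 1 the guided prediction equals the conditional one, which is consistent with the distilled model being run at CFG 1. All function names here are hypothetical.

```python
def cfg_combine(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def shift_sigma(sigma: float, shift: float) -> float:
    """SD3-style flow-matching shift (assumed to be what the Shift setting applies)."""
    return shift * sigma / (1 + (shift - 1) * sigma)


def shifted_schedule(steps: int, shift: float) -> list[float]:
    """steps + 1 evenly spaced sigmas from 1 down to 0, each passed through the shift."""
    return [shift_sigma(1 - i / steps, shift) for i in range(steps + 1)]


if __name__ == "__main__":
    # At CFG 1 the output is just the conditional prediction; at 3.5 it is pushed further.
    print(cfg_combine([0.0, 2.0], [1.0, 4.0], 1.0))
    print(cfg_combine([0.0, 2.0], [1.0, 4.0], 3.5))
    # Compare an unshifted 8-step schedule with Shift 4 (the distilled setting).
    print([round(s, 3) for s in shifted_schedule(8, 1.0)])
    print([round(s, 3) for s in shifted_schedule(8, 4.0)])
```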

  

Hunyuan Image 2.1 testing

Prompt: 
Low-angle shot of a fashionable woman with long wavy dark hair, wearing glasses and a stylish outfit in  sheer black mesh sleeves, fitted light top, maroon skirt, fishnet tights. She is posing with a slight tilt, giving a confident yet introspective gaze. The setting is a moody indoor room with natural backlight from tall window panes, messy desk with scattered items in the background, soft cinematic lighting, slight grain, and vintage tones. Light leak on the edge of the image, photojournalistic vibe, realistic shadows, DSLR photo look, aesthetic bedroom setup.

CFG: 3.5
Resolution: 2048x2048
Steps: 50
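If you prefer to script test runs like this one instead of editing nodes by hand, ComfyUI can accept a workflow saved in API format (via its Export (API) option) through the `/prompt` endpoint of a running server. The sketch below assumes that format, where each node has a `class_type` and an `inputs` dict. It applies the test settings to every KSampler node and, as an assumption, sets the size on any node that exposes `width` and `height` inputs. The function names and the tiny example workflow (including its node IDs) are hypothetical; a real export contains many more nodes.

```python
import json
import urllib.request
from typing import Optional


def apply_settings(workflow: dict, cfg: float, steps: int, sampler: str = "euler",
                   width: Optional[int] = None, height: Optional[int] = None) -> dict:
    """Set sampler values (and optionally the image size) in an API-format workflow."""
    for node in workflow.values():
        inputs = node.get("inputs", {})
        if node.get("class_type") == "KSampler":
            inputs.update(cfg=cfg, steps=steps, sampler_name=sampler)
        # Assumption: the node that sets the resolution exposes "width" and "height".
        if width is not None and height is not None and "width" in inputs and "height" in inputs:
            inputs.update(width=width, height=height)
    return workflow


def queue_prompt(workflow: dict, host: str = "http://127.0.0.1:8188") -> dict:
    """POST the workflow to a running ComfyUI server's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{host}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Tiny stand-in for an exported workflow; node IDs and values are made up.
    wf = {
        "3": {"class_type": "KSampler",
              "inputs": {"seed": 0, "steps": 20, "cfg": 8.0, "sampler_name": "dpmpp_2m"}},
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    }
    apply_settings(wf, cfg=3.5, steps=50, width=2048, height=2048)
    print(json.dumps(wf, indent=2))
    # queue_prompt(wf)  # only with a full exported workflow and ComfyUI running
```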