Ernie Image Base/Turbo: Better Text Generation


Most text-to-image models look impressive at first, but once you try using them for actual work, cracks start to show. They focus heavily on visual appeal and not enough on control, structure, and accuracy, which are essential for real-world use cases like marketing creatives, comics, or UI mockups. ERNIE-Image Base and ERNIE-Image Turbo, developed by Baidu's team and released under the Apache 2.0 license, target exactly these weaknesses. Both are powered by a single-stream Diffusion Transformer (DiT) and enhanced with a Prompt Enhancer.

The model has 8 billion parameters, which is relatively compact. Still, it achieves state-of-the-art performance among open-weight models. The Prompt Enhancer takes simple inputs and expands them into detailed, structured prompts, improving output quality without extra effort from users. So instead of relying on perfectly crafted prompts, the model does a lot of the heavy lifting for you.

Ernie Image model showcase

ERNIE Image Base does not just improve one aspect; it tackles multiple real-world challenges: a compact but powerful design, wide style coverage, structured image generation, easy deployment, accurate text rendering, and reliable instruction following.

Ernie Image Turbo is a distilled version of Ernie Image Base, meaning it's optimized for speed while preserving core capabilities. It's built on the same single-stream Diffusion Transformer (DiT) architecture but is designed to generate results in just 8 inference steps, compared with the 50 steps used for the Base variant in the workflow settings.

Ernie Image Turbo tackles the speed versus quality dilemma with a balanced approach: fast and efficient generation, accurate text rendering, accessible deployment, wide style coverage, structured image generation, and reliable instruction following.
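To put the step counts in perspective, here is a rough back-of-the-envelope sketch in Python. It uses the KSampler settings from the workflow section (50 steps at CFG 4.0 for Base, 8 steps at CFG 1.0 for Turbo) and assumes that a CFG scale other than 1.0 costs two forward passes per step (conditional and unconditional) while a scale of exactly 1.0 costs one. The function name is made up for illustration, and the estimate ignores the text encoders and the VAE, so treat the ratio as an upper-level guide rather than a benchmark.

```python
def model_evaluations(steps: int, cfg: float) -> int:
    """Estimate diffusion-model forward passes for one image.

    Assumption: a CFG scale other than 1.0 needs two passes per step
    (conditional + unconditional); a scale of exactly 1.0 needs one.
    """
    passes_per_step = 1 if cfg == 1.0 else 2
    return steps * passes_per_step


base = model_evaluations(steps=50, cfg=4.0)   # Base settings from the workflow section
turbo = model_evaluations(steps=8, cfg=1.0)   # Turbo settings from the workflow section
print(f"Base: {base} passes, Turbo: {turbo} passes, ratio: {base / turbo:.1f}x")
# Base: 100 passes, Turbo: 8 passes, ratio: 12.5x
```

Actual wall-clock speedup depends on your GPU, resolution, and offloading, so measure on your own setup before relying on this number.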

 

Installation

1. Make sure ComfyUI is installed. Existing users need to update ComfyUI from the Manager by selecting the Update All option.

2. Download either Ernie Image model (Turbo or Base) from its Hugging Face repository. At least 12GB of VRAM is needed for it to work well.

Ernie Image(Turbo/Base) model


(a) Ernie Base (ernie-image.safetensors): for high-quality generation without any quality degradation
(b) Ernie Turbo (ernie-image-turbo.safetensors): for fast image generation

Save the file into the ComfyUI/models/diffusion_models folder.

(c) Ernie Image GGUF by unsloth. If you use the GGUF variant, save it into the ComfyUI/models/unet folder instead.

3. Download the VAE (flux2-vae.safetensors) from the Hugging Face repository. This is the same VAE used in older workflows, so if you already have it, there is no need to download it again.

download vae 

Save it into the ComfyUI/models/vae folder.

4. Download the text encoders (ernie-image-prompt-enhancer.safetensors and ministral-3-3b.safetensors).

 Download text encoder

Save both of them into the ComfyUI/models/text_encoders folder.

5. Restart and refresh ComfyUI.
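Before restarting, it can help to confirm that every file landed in the right folder. The following is a minimal Python sketch that checks the file names and folders listed in the steps above. The function name `missing_files` and the `"ComfyUI"` root path are placeholders for illustration, and the GGUF variant is left out because its file name depends on the quantization you download.

```python
from pathlib import Path

# File names and folders as listed in the installation steps.
SHARED_FILES = {
    "vae": ["flux2-vae.safetensors"],
    "text_encoders": [
        "ernie-image-prompt-enhancer.safetensors",
        "ministral-3-3b.safetensors",
    ],
}
DIFFUSION_FILES = {
    "base": "ernie-image.safetensors",
    "turbo": "ernie-image-turbo.safetensors",
}


def missing_files(comfy_root, variant="turbo"):
    """Return the expected model paths that do not exist yet."""
    models = Path(comfy_root) / "models"
    expected = [models / "diffusion_models" / DIFFUSION_FILES[variant]]
    for folder, names in SHARED_FILES.items():
        expected.extend(models / folder / name for name in names)
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point it at your own install directory.
    for path in missing_files("ComfyUI", variant="turbo"):
        print("missing:", path)
```

If the script prints nothing, all four files for the chosen variant are in place.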

 

Workflow

1. Download the workflows for the Base and Turbo models from our Hugging Face repository. Alternatively, find them in the ComfyUI workflow templates section.

(a) Ernie_image.json

(b) Ernie_image_turbo.json

If using the GGUF model, replace the Load Diffusion Model node with the UNet loader node.

2. Drag and drop into ComfyUI.

3. Load the Ernie Image (Base/Turbo) model using the Load Diffusion Model node.

4. Load the text encoders and VAE into their respective nodes. Add your prompt to the prompt box.

Set the KSampler settings:

Base variant:
Steps: 50
CFG: 4.0

Turbo variant:
Steps: 8
CFG: 1.0

Supported resolutions: 1024x1024, 848x1264, 1264x848, 768x1376, 896x1200, 1376x768, 1200x896

5. Hit generate to start generation.
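If your target image size is not one of the supported resolutions, one option is to snap to the supported size with the nearest aspect ratio. The sketch below is an illustrative Python helper, not part of ComfyUI. It copies the KSampler settings and the resolution list from above, and it assumes each resolution is written as width x height. The names `closest_resolution`, `SAMPLER_SETTINGS`, and `SUPPORTED_RESOLUTIONS` are made up for this example.

```python
import math

# Values copied from the KSampler settings and resolution list above.
SAMPLER_SETTINGS = {
    "base": {"steps": 50, "cfg": 4.0},
    "turbo": {"steps": 8, "cfg": 1.0},
}
# Assumption: each pair is (width, height).
SUPPORTED_RESOLUTIONS = [
    (1024, 1024), (848, 1264), (1264, 848), (768, 1376),
    (896, 1200), (1376, 768), (1200, 896),
]


def closest_resolution(width, height):
    """Pick the supported (width, height) whose aspect ratio is nearest.

    Ratios are compared on a log scale so that landscape and portrait
    sizes are treated symmetrically.
    """
    target = math.log(width / height)
    return min(
        SUPPORTED_RESOLUTIONS,
        key=lambda wh: abs(math.log(wh[0] / wh[1]) - target),
    )


print(closest_resolution(1920, 1080))  # (1376, 768)
print(closest_resolution(1080, 1920))  # (768, 1376)
print(SAMPLER_SETTINGS["turbo"])       # {'steps': 8, 'cfg': 1.0}
```

Enter the returned width and height in the workflow's latent image node, then upscale or crop afterward if you need the exact original dimensions.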


 

ernie image base vs turbo

