Ernie Image Base & Turbo (BF16/FP8/GGUF)-Better Text Generation

Most text-to-image models look impressive at first, but once you try using them for actual work, cracks start to show. They focus heavily on visual appeal and not enough on control, structure, and accuracy, which are essential for real-world use cases like marketing creatives, comics, or UI mockups. ERNIE-Image Base and ERNIE-Image Turbo, developed by Baidu's team and released under the Apache 2.0 license, are built for that kind of work. Both are powered by a single-stream Diffusion Transformer (DiT) and enhanced with a smart Prompt Enhancer.

The model has 8 billion parameters, which is relatively compact. Still, it is reported to achieve state-of-the-art performance among open-weight models. The Prompt Enhancer takes simple inputs and expands them into detailed, structured prompts, improving output quality without extra effort from users. So instead of relying on perfectly crafted prompts, you let the model do a lot of the heavy lifting for you.

Ernie Image model showcase

ERNIE Image Base does not just improve one aspect. It tackles several real-world challenges at once: it is compact but powerful, covers a wide range of styles, generates structured images, is easy to deploy, renders text accurately, and follows instructions reliably.

Ernie Image Turbo, by contrast, is a distilled version of Ernie Image Base, meaning it is optimized for speed while preserving core capabilities. It is built on the same single-stream Diffusion Transformer (DiT) architecture but is designed to generate results in just 8 inference steps, compared with the 50 steps recommended for the Base variant.

Ernie Image Turbo tackles the speed vs. quality trade-off with a balanced feature set: fast and efficient generation, accurate text rendering, accessible deployment, wide style coverage, structured image generation, and reliable instruction following.

Installation

1. Install ComfyUI if you have not already. Existing users need to update ComfyUI from the Manager by selecting the Update All option.

2. Download either Ernie Image model (Turbo or Base) from its Hugging Face repository. You need at least 12GB of VRAM for it to work well.

(a) Ernie Base (ernie-image.safetensors) - for high-quality generation without any quality degradation
(b) Ernie Turbo (ernie-image-turbo.safetensors) - for fast image generation

Save the file into the ComfyUI/models/diffusion_models folder.

(c) Ernie Image Base GGUF and Ernie Image Turbo GGUF by unsloth. If you use a GGUF variant, save it into the ComfyUI/models/unet folder instead.

3. Download the VAE (flux2-vae.safetensors) from the Hugging Face repository. This is the same VAE used in older workflows, so if you already have it, you do not need to download it again.

Save it into the ComfyUI/models/vae folder.

4. Download the text encoders (ernie-image-prompt-enhancer.safetensors and ministral-3-3b.safetensors).

Save both of them into the ComfyUI/models/text_encoders folder.

5. Restart and refresh ComfyUI.
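Before restarting, it can help to confirm that every file landed in the right folder. The script below is a minimal sketch, not an official tool: it only checks the folder layout and file names mentioned in the steps above. The function name `check_install` is made up for this example, and because this post does not list exact GGUF file names, GGUF models are matched by their `.gguf` extension only.

```python
from pathlib import Path

# Folders and file names as listed in the installation steps above.
EXPECTED_FILES = {
    "models/vae": ["flux2-vae.safetensors"],
    "models/text_encoders": [
        "ernie-image-prompt-enhancer.safetensors",
        "ministral-3-3b.safetensors",
    ],
}

# Either one of these is enough (Base or Turbo).
DIFFUSION_MODELS = ["ernie-image.safetensors", "ernie-image-turbo.safetensors"]


def check_install(comfy_root):
    """Return a list of problems found; an empty list means nothing is missing."""
    root = Path(comfy_root)
    problems = []

    # VAE and text encoders must all be present.
    for folder, names in EXPECTED_FILES.items():
        for name in names:
            if not (root / folder / name).is_file():
                problems.append(f"missing {folder}/{name}")

    # At least one model is needed: safetensors in diffusion_models,
    # or any .gguf file in unet (assumption: GGUF matched by extension).
    has_safetensors = any(
        (root / "models/diffusion_models" / name).is_file()
        for name in DIFFUSION_MODELS
    )
    unet_dir = root / "models/unet"
    has_gguf = unet_dir.is_dir() and any(unet_dir.glob("*.gguf"))
    if not (has_safetensors or has_gguf):
        problems.append(
            "no Ernie Image model in models/diffusion_models or models/unet"
        )
    return problems
```

Call it with the path to your ComfyUI folder, for example `check_install("ComfyUI")`, and print whatever it returns. An empty list means all the files from steps 2 to 4 were found.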

Workflow

1. Download the workflows for the Base and Turbo models from our Hugging Face repository. Alternatively, you can find them in the ComfyUI workflow templates section.

(a) Ernie_image.json

(b) Ernie_image_turbo.json

If you are using a GGUF model, replace the Load Diffusion Model node with a GGUF Unet Loader node.

2. Drag and drop the workflow file into ComfyUI.

3. Load the Ernie Image (Base/Turbo) model using the Load Diffusion Model node.

4. Load the text encoders and the VAE into their respective nodes. Add your prompts into the prompt box.

Set the KSampler settings:

Base variant:
Steps: 50
CFG: 4.0

Turbo variant:
Steps: 8
CFG: 1.0

Supported resolutions: 1024x1024, 848x1264, 1264x848, 768x1376, 896x1200, 1376x768, 1200x896

5. Hit Run to start generation.
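If you want a different aspect ratio, it is safer to snap to one of the supported resolutions than to type in an arbitrary size. The snippet below is a small sketch of that idea. The step and CFG values and the resolution list are copied from the settings above; the helper name `closest_resolution` is invented for this example, and it assumes each resolution is written as width x height.

```python
# KSampler values from the settings above.
SAMPLER_SETTINGS = {
    "base": {"steps": 50, "cfg": 4.0},
    "turbo": {"steps": 8, "cfg": 1.0},
}

# Supported resolutions, assumed to be (width, height).
SUPPORTED_RESOLUTIONS = [
    (1024, 1024), (848, 1264), (1264, 848), (768, 1376),
    (896, 1200), (1376, 768), (1200, 896),
]


def closest_resolution(width, height):
    """Return the supported size whose aspect ratio is nearest to width/height."""
    target = width / height
    return min(
        SUPPORTED_RESOLUTIONS,
        key=lambda wh: abs(wh[0] / wh[1] - target),
    )
```

For example, `closest_resolution(1920, 1080)` picks the widest landscape option, 1376x768, and `SAMPLER_SETTINGS["turbo"]` gives the values to enter in the KSampler node for the Turbo variant.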

textual generation with red bull energy drink

We have noticed that the ERNIE image model tends to produce gibberish if your prompt is not clear. But when you are specific, especially about text, it usually gets it right. It handles anatomy really well and can deliver some impressively realistic results.

On the plus side, it is incredibly fast and accurate. It understands concepts well, handles text (as long as you provide it clearly), and works across styles like anime, comics, and realism. The lighting feels cinematic and volumetric, and thankfully, you do not get that plastic-looking skin.

On the downside, you cannot rely on it to generate random text; you need to supply the exact lines. That is not a big issue since we use an LLM to help craft prompts anyway. Occasionally, you will still spot minor inconsistencies, but this happens far less often than with models like Klein9b or Z Image Turbo.
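Since the model needs the exact lines of text, one simple habit is to keep the scene description and the on-image text separate and join them only when building the prompt. The helper below is just an illustration of that habit. The function name `build_text_prompt` and the sentence template are our own assumptions, not a prompt format documented for ERNIE Image, so adjust the wording to whatever works in your tests.

```python
def build_text_prompt(scene, text_lines):
    """Append each exact line of on-image text, in quotes, to a scene description."""
    if not text_lines:
        # Nothing to render as text, so the scene description is the whole prompt.
        return scene
    quoted = ", ".join(f'"{line}"' for line in text_lines)
    # Hypothetical template: states the literal strings the image should contain.
    return f"{scene} The image contains the following text, spelled exactly: {quoted}."
```

For instance, `build_text_prompt("A cold energy drink can on a wet table.", ["SUMMER EDITION", "250 ml"])` returns one prompt string that ends with both lines in quotes, so there is no ambiguity about which words should appear in the image.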
