HiDream O1 Image Dev & Base (FP8/BF16/GGUF): Better Text Generation

Modern image generation models have become incredibly powerful, but they still come with several limitations behind the scenes. Most systems rely on multiple separate components working together, such as external VAEs, independent text encoders, and task-specific pipelines. While this setup works, it often creates inefficiencies and inconsistencies during image generation.

HiDream-O1-Image aims to solve these challenges with a fully unified approach to image generation. Instead of combining disconnected systems, the model is built around a Pixel-level Unified Transformer (UiT) that directly processes raw pixels, text, and task-specific conditions within a single shared token space. This creates a more streamlined and native generation pipeline.

Hidream o1 showcase

Whether it’s text-to-image creation, image editing, subject-driven personalization, storyboard generation, or long text rendering, HiDream-O1-Image is designed to manage everything within one unified architecture.

textual representation

The research behind HiDream-O1-Image focuses on simplifying and strengthening the foundation of image generation models. Traditional diffusion systems typically depend on external Variational Autoencoders (VAEs) to compress images into latent spaces before generation. They also rely on separate text encoders to process prompts independently. You can find in-depth information in their research paper.

Id preservation

While effective, this fragmented design can limit coherence between text understanding and image synthesis.  HiDream-O1-Image removes these separations entirely. Its Pixel-level Unified Transformer directly encodes raw pixels alongside textual and conditional information into a shared representation space. This unified structure allows the model to better align visual understanding with language comprehension.
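To make the "shared token space" idea more concrete, here is a toy sketch in plain Python. It only illustrates the general pattern of cutting raw pixels into patches and placing them in the same sequence as text tokens. The patch size, the token ordering, and the `("text", ...)` / `("pixel", ...)` tags are assumptions made for illustration, not details taken from the HiDream-O1-Image paper, and a real model would project both kinds of tokens through learned embeddings.

```python
from typing import List, Tuple

Pixel = List[float]          # one pixel as [R, G, B]
Image = List[List[Pixel]]    # rows x columns x channels


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an image into non-overlapping patch x patch tiles, each flattened to one vector."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            vector: List[float] = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    vector.extend(image[y][x])  # raw channel values, no VAE latent
            tokens.append(vector)
    return tokens


def build_sequence(text_ids: List[int], image: Image, patch: int) -> List[Tuple[str, object]]:
    """Place text tokens and pixel patches in one sequence (hypothetical layout)."""
    sequence: List[Tuple[str, object]] = [("text", token_id) for token_id in text_ids]
    sequence += [("pixel", vector) for vector in patchify(image, patch)]
    return sequence


# A 4x4 black RGB image and three made-up text token ids.
image = [[[0.0, 0.0, 0.0] for _ in range(4)] for _ in range(4)]
sequence = build_sequence([101, 102, 103], image, patch=2)
print(len(sequence))  # 7: three text tokens plus four pixel patches
```

The point of the sketch is that nothing sits between the pixels and the sequence: there is no separate encoder producing latents first, which is the separation the unified design is described as removing.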

Installation


1. Install ComfyUI if you have not already. Existing users should update it from the Manager.

2. Download the HiDream O1 models from the official repack by ComfyUI:

There are multiple variants to choose from. Pick the one that suits your system resources:



hidream O1 image base and dev models


(a) hidream_o1_image_bf16.safetensors
(b) hidream_o1_image_dev_bf16.safetensors
(c) hidream_o1_image_dev_fp8_scaled.safetensors
(d) hidream_o1_image_dev_mxfp8.safetensors
(e) hidream_o1_image_fp8_scaled.safetensors
(f) hidream_o1_image_mxfp8.safetensors

BF16 gives the highest quality but consumes the most memory. FP8 (8-bit floating point) uses less VRAM at the cost of some quality degradation. MXFP8 is a block-scaled FP8 format with hardware-level support on NVIDIA's latest Blackwell GPUs (such as the RTX 5090), aimed at better quality and faster generation. The RTX 4090 is an Ada Lovelace card, so it may not get the same native acceleration.

Here, hidream_o1_image is the base variant and hidream_o1_image_dev is the distilled variant, which supports text-to-image and image-to-image. Save the file into the ComfyUI/models/checkpoints folder.
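If you want a rough feel for how the precision choice affects memory, the arithmetic can be sketched like this. It is a back-of-the-envelope estimate for the weights only (activations and the text encoder come on top). The parameter count of HiDream O1 is not stated here, so the example reports gigabytes per one billion parameters. The MXFP8 figure assumes the usual microscaling layout of one shared 8-bit scale per block of 32 values, and the small per-tensor scale overhead of fp8_scaled is ignored. Whether the repacked files follow exactly these layouts is an assumption.

```python
# Approximate storage cost per weight, in bytes (assumptions noted above).
BYTES_PER_WEIGHT = {
    "bf16": 2.0,                 # 16 bits per weight
    "fp8_scaled": 1.0,           # 8 bits per weight, per-tensor scale ignored
    "mxfp8": 1.0 + 1.0 / 32.0,   # 8 bits plus one scale byte per 32-value block
}


def weight_gigabytes(parameter_count: float, precision: str) -> float:
    """Estimate weight storage in GB (1 GB = 1e9 bytes) for a given precision."""
    return parameter_count * BYTES_PER_WEIGHT[precision] / 1e9


# Report the cost per one billion parameters for each precision.
for precision in BYTES_PER_WEIGHT:
    print(precision, weight_gigabytes(1_000_000_000, precision))
```

In other words, an FP8 file needs roughly half the space of its BF16 counterpart, and MXFP8 adds only about 3% on top of plain FP8.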

3. Download the text encoder (gemma4_e4b_it_fp8_scaled.safetensors) and save it into the ComfyUI/models/text_encoders folder.

 

download text encoder

This is a unified transformer model, so there is no separate VAE to download.

4. Restart and refresh ComfyUI.
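Before opening a workflow, it can help to confirm that the files landed in the right folders. The following is a small optional sketch, not part of ComfyUI itself: the function names are made up for illustration, and it assumes the folder layout described in the steps above, with ComfyUI installed in a directory you pass in.

```python
from pathlib import Path

# File names from the download steps, grouped by the folder they belong in.
EXPECTED_FILES = {
    "checkpoints": [
        "hidream_o1_image_bf16.safetensors",
        "hidream_o1_image_dev_bf16.safetensors",
        "hidream_o1_image_dev_fp8_scaled.safetensors",
        "hidream_o1_image_dev_mxfp8.safetensors",
        "hidream_o1_image_fp8_scaled.safetensors",
        "hidream_o1_image_mxfp8.safetensors",
    ],
    "text_encoders": ["gemma4_e4b_it_fp8_scaled.safetensors"],
}


def find_installed(comfy_root):
    """Return, per models subfolder, which of the expected files are present."""
    models = Path(comfy_root) / "models"
    return {
        folder: [name for name in names if (models / folder / name).is_file()]
        for folder, names in EXPECTED_FILES.items()
    }


def is_ready(comfy_root):
    """True when at least one HiDream O1 checkpoint and the text encoder are found."""
    found = find_installed(comfy_root)
    return bool(found["checkpoints"]) and bool(found["text_encoders"])


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own install directory.
    print(find_installed("ComfyUI"))
    print("ready:", is_ready("ComfyUI"))
```

You only need one of the six model files, which is why the check passes as soon as any single variant is present alongside the text encoder.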



Workflow

1. Download the workflows from our Hugging Face repository.

(a) HiDream O1 base (Hidream_O1_base.json)
(b) HiDream O1 Dev (Hidream_O1_dev.json)

2. Drag and drop the workflow into ComfyUI.

3. Load the HiDream O1 model into the Load Checkpoint node and the text encoder into its corresponding node.

4. Add your text prompt into the prompt box.

5. Set the KSampler settings:

HiDream O1 Image (base)

Steps: 50

CFG: 5.0

HiDream O1 Image Dev

Steps: 28

CFG: 5.0

6. Hit Run to start generation.
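If you prefer scripting over clicking, the sampler settings above can be applied programmatically. This is a minimal sketch under a few assumptions: the workflow has been exported from ComfyUI in API format (a dictionary of node id to `class_type` and `inputs`), the sampler node's class type is `KSampler`, and the server runs at the default local address `http://127.0.0.1:8188`. The helper names and the tiny example workflow are hypothetical. The workflows you download may use a different sampler node, in which case the class type needs adjusting.

```python
import json
import urllib.request

# Recommended settings from the steps above.
SAMPLER_SETTINGS = {
    "base": {"steps": 50, "cfg": 5.0},
    "dev": {"steps": 28, "cfg": 5.0},
}


def apply_sampler_settings(workflow, variant):
    """Return a copy of an API-format workflow with steps/CFG set on every KSampler node."""
    settings = SAMPLER_SETTINGS[variant]
    patched = json.loads(json.dumps(workflow))  # deep copy, leaves the input untouched
    for node in patched.values():
        if node.get("class_type") == "KSampler":
            node["inputs"].update(settings)
    return patched


def build_queue_request(workflow, server="http://127.0.0.1:8188"):
    """Build (but do not send) the POST request that queues a workflow on ComfyUI."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


# Tiny stand-in for an exported workflow: a single sampler node with other defaults.
example = {"3": {"class_type": "KSampler", "inputs": {"steps": 20, "cfg": 8.0, "seed": 0}}}
patched = apply_sampler_settings(example, "dev")
request = build_queue_request(patched)
# urllib.request.urlopen(request)  # uncomment to queue on a running ComfyUI server
```

Only `steps` and `cfg` are touched, so the seed, sampler name, and scheduler stay as they are in the workflow.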

 

Some results using the HiDream O1 Image base and Dev models:

Test 1 (Short text)

HiDream O1 Image Dev

HiDream O1 Image

Prompt- A realistic airport departure board inside a crowded international terminal with travelers walking around and luggage carts moving nearby. The digital board contains multiple rows of perfectly aligned text including destinations, times, and status messages. One highlighted row clearly reads: “Flight AI-302 — DELAYED”. The typography should remain sharp, aligned, and fully readable even with multiple lines of information. 

 

Test 2 (Long text)

HiDream O1 Image

HiDream O1 Image Dev

Prompt-  A highly detailed close-up of a modern smartphone displaying a messaging app conversation in dark mode. One visible long message bubble contains the exact text: “Hey, I might be late for the meeting because traffic near downtown is completely blocked right now. Please start without me if necessary, and I’ll join as soon as I can. Also don’t forget to bring the presentation files.” The text should appear naturally inside the UI with realistic spacing, emojis, timestamps, and authentic smartphone typography.

 

Test 3 (Art with realism)

HiDream O1 Image Dev

HiDream O1 Image

Prompt-  Ultra-photorealistic close-up portrait of a natural 20-year-old woman standing beside a window during golden hour, soft sunlight illuminating realistic skin pores, peach fuzz, subtle freckles, detailed irises, slightly messy hair strands, natural lips without excessive makeup, shallow depth of field, cinematic photography, DSLR realism, authentic imperfections, realistic shadows, high dynamic range.

 

Test 4 (Closeup shot)

HiDream O1 Image

HiDream O1 Image Dev

 Prompt- Ultra-macro close-up photograph of a single glowing firefly resting on a dew-covered leaf at night, extremely detailed translucent wings, realistic glowing abdomen emitting soft bioluminescent light, visible micro textures on the insect body, shallow depth of field, cinematic nature photography, dark forest background with subtle bokeh.