High-quality video generation has always felt like an exclusive club. Creators, researchers, and indie developers constantly struggle with slow inference, limited quality, and models that are simply too heavy for real-world use. Professional-level video generation has existed for a while, but it has never been truly accessible, because the models the community actually gets tend to be little more than prototypes. HunyuanVideo 1.5, released by the Tencent team, can generate smooth motion, crisp details, natural aesthetics, and long-sequence consistency.
A comparatively tiny 8.3B-parameter Diffusion Transformer forms the model's backbone, and it surprisingly outperforms many heavier open-source video models.
Its 3D causal VAE compresses spatial and temporal information (16x spatially and 4x temporally), making training and inference dramatically more efficient. The new Selective and Sliding Tile Attention (SSTA) prunes redundant spatiotemporal KV blocks and keeps only what matters, which yields up to 1.87x faster generation of 10-second 720p videos compared to FlashAttention3.
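To get a feel for what that compression means in practice, here is a rough back-of-the-envelope sketch in Python. The 16x/4x factors come from the model description; the latent channel count, the causal "(frames - 1) / 4 + 1" rule, and the frame count used below are illustrative assumptions, not values confirmed by the paper.

```python
# Rough sketch of how the 3D causal VAE shrinks a video before the
# Diffusion Transformer sees it. The 16x/4x factors are from the model
# description; latent_channels=16 and the "(frames - 1) // 4 + 1"
# causal-frame rule are assumptions for illustration only.

def latent_shape(width, height, frames,
                 spatial_factor=16, temporal_factor=4, latent_channels=16):
    lat_w = width // spatial_factor
    lat_h = height // spatial_factor
    lat_t = (frames - 1) // temporal_factor + 1  # assumed causal-VAE frame rule
    return (latent_channels, lat_t, lat_h, lat_w)

# A 10-second 720p clip (assume 241 frames here for the example)
print(latent_shape(1280, 720, 241))  # -> (16, 61, 45, 80)

# Pixel count vs latent element count: the diffusion model works on a
# tensor a couple of hundred times smaller than the raw RGB video.
pixels = 1280 * 720 * 241 * 3
latents = 16 * 61 * 45 * 80
print(f"compression ratio ~ {pixels / latents:.0f}x")  # ~190x
```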
A super-resolution network boosts outputs from the base resolution to clean, sharp 1080p, fixing distortions and improving textures. Training follows a multi-stage progressive pipeline powered by the Muon optimizer, aligning aesthetics, coherence, and human preferences more effectively. You can find more details in their research paper.
Installation
1. Install ComfyUI if you haven't already. Existing users should update ComfyUI from the Manager by selecting Update All.
2. Download Hunyuan Video 1.5 model weights released by the community (choose whichever variant suits your system resources and use case):
(a) HunyuanVideo1.5 FP16 (all variants in one place), optimized by ComfyUI.
(b) HunyuanVideo 1.5 BF16 (all variants in one place), released officially by the Tencent team.
Save these into the ComfyUI/models/diffusion_models folder.
(c) HunyuanVideo1.5 Image to Video 720p GGUF by jayn7
(d) HunyuanVideo1.5 Image to Video 720p distilled GGUF by jayn7
(e) HunyuanVideo1.5 Text to Video 720p GGUF by jayn7
(f) HunyuanVideo1.5 Image to Video 480p GGUF by jayn7
(g) HunyuanVideo1.5 Image to Video 480p distilled GGUF by jayn7
The GGUF models are available from Q4 (faster inference, lower quality) to Q8 (higher quality, slower inference).
For GGUF, save the file into the ComfyUI/models/unet folder. Make sure you have the ComfyUI-GGUF custom node by City96. If you haven't installed it yet, do so from the Manager by selecting the Custom Nodes Manager option; update it if you are already using it.
If you do not know what the FP8/BF16/GGUF model variants are, follow our quantization tutorial for a more in-depth overview.
3. Download the text encoders (byt5_small_glyphxl_fp16.safetensors, plus either qwen_2.5_vl_7b_fp8_scaled.safetensors or qwen_2.5_vl_7b.safetensors, whichever suits your system resources). Save them into the ComfyUI/models/text_encoders folder.
4. Download the VAE (hunyuanvideo15_vae_fp16.safetensors) and save it into the ComfyUI/models/vae folder.
5. Download the CLIP vision model (sigclip_vision_patch14_384.safetensors) and save it into the ComfyUI/models/clip_vision folder.
6. Download the LightX2V LoRA (hunyuanvideo1.5_t2v_480p_lightx2v_4step_lora_rank_32_bf16.safetensors) and save it into the ComfyUI/models/loras folder. If you use this LoRA, pair it with the base models (BF16/FP16); do not use the CFG-distilled models, as they can produce low-quality output.
7. Download the latent upscale models and save them into the ComfyUI/models/latent_upscale_models folder. These models upscale lower-resolution outputs to either 720p or 1080p.
8. Restart and refresh ComfyUI for the changes to take effect. (A quick file-placement check is sketched below.)
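If you want to double-check that everything landed in the right folders before restarting, a small script like this can save a failed first run. It is a hypothetical helper: it assumes a default ComfyUI directory layout and the filenames listed above, and the diffusion-model filename is a placeholder you should replace with the variant you actually downloaded.

```python
# Sanity-check that downloaded files sit where ComfyUI expects them.
# Paths assume a default install; the diffusion-model filename below is a
# placeholder -- use whichever variant you downloaded (or models/unet for GGUF).
from pathlib import Path

COMFYUI = Path("ComfyUI")  # adjust to your ComfyUI root

expected = {
    "models/diffusion_models": ["<your_hunyuanvideo1.5_checkpoint>.safetensors"],
    "models/text_encoders": [
        "byt5_small_glyphxl_fp16.safetensors",
        "qwen_2.5_vl_7b_fp8_scaled.safetensors",  # or qwen_2.5_vl_7b.safetensors
    ],
    "models/vae": ["hunyuanvideo15_vae_fp16.safetensors"],
    "models/clip_vision": ["sigclip_vision_patch14_384.safetensors"],
    "models/loras": ["hunyuanvideo1.5_t2v_480p_lightx2v_4step_lora_rank_32_bf16.safetensors"],
}

for folder, files in expected.items():
    for name in files:
        path = COMFYUI / folder / name
        print(("OK   " if path.exists() else "MISS ") + str(path))
```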
Workflow
1. Download the workflows from our Hugging Face repository.
(a) Hunyuan_Video_1.5_720p_t2v.json (FP16 720p Text to Video workflow)
(b) Hunyuan_video_1.5_720p_i2v.json (FP16 720p Image to Video workflow)
2. Drag and drop a workflow into ComfyUI.
(a) Load the HunyuanVideo 1.5 model in the Load Diffusion Model node. If using GGUF, replace this node with the Unet Loader (GGUF) node.
(b) Load the text encoders and VAE into their respective nodes.
(c) Add your positive and negative prompts into the prompt boxes.
(d) Set the KSampler settings for your specific model and video resolution as given in the table below (the same values are sketched as a Python lookup after this list).
| Model | CFG Scale | Embedded CFG Scale | Flow Shift | Inference Steps |
|---|---|---|---|---|
| 480p T2V | 6 | None | 5 | 50 |
| 480p I2V | 6 | None | 5 | 50 |
| 720p T2V | 6 | None | 9 | 50 |
| 720p I2V | 6 | None | 7 | 50 |
| 480p T2V CFG Distilled | 1 | None | 5 | 50 |
| 480p I2V CFG Distilled | 1 | None | 5 | 50 |
| 720p T2V CFG Distilled | 1 | None | 9 | 50 |
| 720p I2V CFG Distilled | 1 | None | 7 | 50 |
| 720p T2V CFG Distilled Sparse | 1 | None | 9 | 50 |
| 720p I2V CFG Distilled Sparse | 1 | None | 7 | 50 |
| 480→720 SR Step Distilled | 1 | None | 2 | 6 |
| 720→1080 SR Step Distilled | 1 | None | 2 | 8 |
(e) Hit run to start the execution.
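If you drive ComfyUI through its API or script your generations, it can help to keep the table above as data. The sketch below simply re-expresses the table as a Python dictionary; the key names and structure are my own, not an official ComfyUI or Hunyuan schema (the "Embedded CFG Scale" column is None for every variant, so it is omitted).

```python
# KSampler settings from the table above, as plain data so a script can
# look them up by model variant. Only the numeric values come from the
# table; the key names are arbitrary.

KSAMPLER_SETTINGS = {
    "480p_t2v":                      {"cfg": 6, "flow_shift": 5, "steps": 50},
    "480p_i2v":                      {"cfg": 6, "flow_shift": 5, "steps": 50},
    "720p_t2v":                      {"cfg": 6, "flow_shift": 9, "steps": 50},
    "720p_i2v":                      {"cfg": 6, "flow_shift": 7, "steps": 50},
    "480p_t2v_cfg_distilled":        {"cfg": 1, "flow_shift": 5, "steps": 50},
    "480p_i2v_cfg_distilled":        {"cfg": 1, "flow_shift": 5, "steps": 50},
    "720p_t2v_cfg_distilled":        {"cfg": 1, "flow_shift": 9, "steps": 50},
    "720p_i2v_cfg_distilled":        {"cfg": 1, "flow_shift": 7, "steps": 50},
    "720p_t2v_cfg_distilled_sparse": {"cfg": 1, "flow_shift": 9, "steps": 50},
    "720p_i2v_cfg_distilled_sparse": {"cfg": 1, "flow_shift": 7, "steps": 50},
    "sr_480_to_720_step_distilled":  {"cfg": 1, "flow_shift": 2, "steps": 6},
    "sr_720_to_1080_step_distilled": {"cfg": 1, "flow_shift": 2, "steps": 8},
}

def settings_for(variant: str) -> dict:
    """Return the CFG scale, flow shift, and step count for a model variant."""
    return KSAMPLER_SETTINGS[variant]

print(settings_for("720p_i2v"))  # {'cfg': 6, 'flow_shift': 7, 'steps': 50}
```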
*Hunyuan Video 1.5 output*