HunyuanCustom: Subject-Driven Consistent Video Generation

Setting up the HunyuanCustom Model

Controlling your subject in video generation is now much easier than before. HunyuanCustom, the latest multi-modal video generation model from the Tencent team, takes controllable video generation to the next level.

It uses a subject or object image as the reference and generates subject-focused video. You can drive generation with a single subject or with multiple subjects. The model can also generate videos from a reference audio track combined with text conditioning, or perform AI-based object editing on existing videos.

hunyuan custom architecture showcase
Reference- HunyuanCustom Official Page

Whether you want to create your own AI influencer, produce e-commerce product showcases, or make promotional content, this is a compelling approach. It is especially effective for story generation where you need a high level of subject consistency. You can learn more in their research paper.

The model is built on the base HunyuanVideo architecture, so you will need to complete the HunyuanVideo setup if you haven't done it before. Let's see how to do this.


Installation

1. New users need to install ComfyUI. Existing users should update it to avoid any future errors. You can also check the beginner's guide to ComfyUI if you are not familiar with it.



update the custom nodes from manager

2. Install and set up the HunyuanVideoWrapper custom nodes by Kijai. If you already have this custom node, you only need to update it from the Manager by searching for it under the "Custom Nodes" option.
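If you prefer a manual install over the Manager, here is a minimal sketch. It assumes your ComfyUI folder sits in the current directory, that the wrapper lives at github.com/kijai/ComfyUI-HunyuanVideoWrapper, and that it ships a requirements.txt like most custom nodes do.

```python
# Minimal manual-install sketch for the HunyuanVideoWrapper custom nodes.
# Assumes git and pip are on your PATH and ComfyUI is at ./ComfyUI.
import subprocess
from pathlib import Path

custom_nodes = Path("ComfyUI/custom_nodes")
repo_url = "https://github.com/kijai/ComfyUI-HunyuanVideoWrapper"  # Kijai's wrapper repo
target = custom_nodes / "ComfyUI-HunyuanVideoWrapper"

if not target.exists():
    # Clone the wrapper into ComfyUI's custom_nodes folder
    subprocess.run(["git", "clone", repo_url, str(target)], check=True)
else:
    # Already installed: pull the latest changes (same effect as updating via Manager)
    subprocess.run(["git", "-C", str(target), "pull"], check=True)

# Install the wrapper's Python dependencies into your ComfyUI environment
subprocess.run(["pip", "install", "-r", str(target / "requirements.txt")], check=True)
```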

3. The official HunyuanCustom FP16 model requires at least 80GB of VRAM, and the FP8 variant requires at least 24GB of VRAM.


download hunyuan custom model

However, users with less VRAM can use the HunyuanCustom models quantized by Kijai. After downloading a model, save it into the "ComfyUI/models/diffusion_models" folder. The BF16 variant is for users with more than 12GB of VRAM; those with less should use the FP8 variant.
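If you would rather script the download than fetch the file in a browser, here is a small sketch using huggingface_hub. The repo id and filename below are assumptions for illustration only; copy the exact values from Kijai's Hugging Face page for the variant (BF16 or FP8) that matches your VRAM.

```python
# Hedged sketch: download a quantized HunyuanCustom checkpoint into ComfyUI's model folder.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Kijai/HunyuanVideo_comfy",                            # assumed repo id, verify on Hugging Face
    filename="hunyuan_video_custom_720p_fp8_scaled.safetensors",   # placeholder filename, verify before use
    local_dir="ComfyUI/models/diffusion_models",                   # folder ComfyUI expects for diffusion models
)
print("Saved to:", model_path)
```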


4. Now download the VAE, text encoder, and CLIP vision models from the Hugging Face repository. If you already downloaded these for the HunyuanVideo model earlier, this is not required.

- VAE - save this into the "ComfyUI/models/vae" folder.

- Text Encoder - save this into the "ComfyUI/models/LLM/llava-llama-3-8b-text-encoder-tokenizer" folder.

- CLIP Vision - save this into the "ComfyUI/models/clip/clip-vit-large-patch14" folder.

If you do not have these folders, just create them and then save each model into its respective folder (a quick sketch for creating them follows).
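A small sketch to create all of the expected folders at once, run from the directory that contains your ComfyUI folder:

```python
# Create the expected model folders if they are missing.
from pathlib import Path

folders = [
    "ComfyUI/models/vae",
    "ComfyUI/models/LLM/llava-llama-3-8b-text-encoder-tokenizer",
    "ComfyUI/models/clip/clip-vit-large-patch14",
]
for folder in folders:
    # exist_ok=True makes this safe to run even if the folder already exists
    Path(folder).mkdir(parents=True, exist_ok=True)
    print("ready:", folder)
```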

5. Restart ComfyUI and refresh it.



Workflow 

1. Get the workflow from your "ComfyUI/custom_nodes/ComfyUI-HunyuanVideoWrapper/examples" folder.
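If you are unsure which file to pick, a quick sketch that lists the shipped example workflows is below. The folder path is the one mentioned above; the exact HunyuanCustom workflow filename may vary between wrapper versions.

```python
# List the example workflow JSON files shipped with the wrapper.
# Run from the directory that contains your ComfyUI folder.
from pathlib import Path

examples = Path("ComfyUI/custom_nodes/ComfyUI-HunyuanVideoWrapper/examples")
for workflow in sorted(examples.rglob("*.json")):
    print(workflow.relative_to(examples))
```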


2. Drag and drop it into ComfyUI.

load model

(a) Load HunyuanCustom Model. 

load vae model

(b) Load VAE model.

load text encoder and vae models

(c) Select and load the text encoder and CLIP models.

resize image node

3. Load your image. If your image is not at an ideal resolution, use one of these recommended values (provided on their official page):

(a) 720 by 1080 with 129 frames.

(b) 512 by 896 with 129 frames.

Note: Keep in mind that the higher your resolution, the higher your VRAM utilization will be. If you want to resize your image outside ComfyUI first, see the sketch below.
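Here is a minimal Pillow sketch for resizing a reference image to one of the recommended resolutions; the input filename is a placeholder, and the Resize Image node in the workflow can handle this step as well.

```python
# Hedged sketch: resize a reference image to a recommended resolution with Pillow.
from PIL import Image

target_size = (512, 896)  # (width, height) - the lower-VRAM option from the list above
img = Image.open("reference.png").convert("RGB")   # placeholder input path
resized = img.resize(target_size, Image.LANCZOS)   # high-quality downscaling filter
resized.save("reference_resized.png")
print("resized to", resized.size)
```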

add relevant text prompt

4. Add your prompt. We used the following to help the model understand the scene better:

Prompt: a girl dancing in college farewell on the stage.


video generated using hunyuan custom

5. Click the Run option. The result you see is not cherry-picked; we are simply presenting what we got on our first attempt.

The model adheres closely to your subject image and generates whatever you provide as reference prompts and images. You may not get a perfect generation on the first attempt, but the results are still better than many other video generation models, which tend to hallucinate more with morphing video frames.