LongCat - Infinite Long Video Generation


Despite major progress in generative video models, a few issues still hold them back. Long videos often fall apart, with color drift, flickering, and quality loss over time. Different tasks require different models: one for text-to-video, another for image-to-video, and another for continuation. High-resolution video generation is slow, which makes real workflows inefficient. LongCat is a single, unified model that handles all major video generation tasks while producing longer, higher-quality videos faster. This removes one of the biggest limitations in current video-generation systems.


LongCat Working Principle



LongCat-Video stands on several strong research contributions:

- Unified Video Generation Architecture: One model natively supports text-to-video, image-to-video, and video continuation.

- Pretraining on Video Continuation: This is what makes long, stable video generation possible.

- Coarse-to-Fine Generation Strategy: It refines frames across both time and space, making inference far more efficient.

- Block Sparse Attention: A major performance boost when generating high-resolution videos.

- Multi-Reward RLHF (GRPO): Multiple reward signals improve coherence, realism, and alignment across tasks.



LongCat model Showcase



From a research standpoint, it is a balanced combination of architectural innovation and advanced reinforcement learning. For more depth, see their research paper.


 

Installation


1. Set up ComfyUI if you have not already. If it is already installed, update it from the Manager by selecting Update All.

2. Make sure you have Kijai's WanVideoWrapper custom node (ComfyUI-WanVideoWrapper) installed. If you already have it, update the custom node from the Manager.



3. Download the LongCat models from Kijai's Hugging Face repository:
(a) LongCat fp8 (LongCat_TI2V_comfy_fp8_e4m3fn_scaled_KJ.safetensors), for 12 to 16 GB of VRAM.

(b) LongCat BF16 (LongCat_TI2V_comfy_bf16.safetensors), for 24 GB of VRAM or more, with better output.

Download whichever one suits your system resources and save it inside the ComfyUI/models/diffusion_models folder.
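The choice between the two files can be sketched as a tiny helper. This is only an illustration of the VRAM guidance above: the function name is made up, and sending cards between 16 and 24 GB to the fp8 file is an assumption, not something the model authors state.

```python
def pick_longcat_checkpoint(vram_gb: float) -> str:
    """Return the LongCat checkpoint filename suggested for a given VRAM size."""
    # Guidance from this guide: BF16 for 24 GB or more, fp8 for 12 to 16 GB.
    # Assumption: anything below 24 GB falls back to the lighter fp8 file.
    if vram_gb >= 24:
        return "LongCat_TI2V_comfy_bf16.safetensors"
    return "LongCat_TI2V_comfy_fp8_e4m3fn_scaled_KJ.safetensors"


if __name__ == "__main__":
    print(pick_longcat_checkpoint(16))
```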

4. Download a LoRA model (either of them):

(a) LongCat refinement LoRA (LongCat_refinement_lora_rank128_bf16.safetensors). Save it inside the ComfyUI/models/loras folder. This needs only 12 steps.

(b) LongCat distill LoRA (LongCat_distill_lora_alpha64_bf16.safetensors). Save it inside the ComfyUI/models/loras folder.

5. Restart and refresh ComfyUI.
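Before opening the workflow, it can help to confirm the files landed in the right folders. Below is a minimal sketch using only the Python standard library. The helper names are hypothetical, and it assumes the default ComfyUI folder layout described in the steps above; you only need one file per folder.

```python
from pathlib import Path

# Filenames and folders as listed in the installation steps above.
EXPECTED = {
    "models/diffusion_models": [
        "LongCat_TI2V_comfy_fp8_e4m3fn_scaled_KJ.safetensors",
        "LongCat_TI2V_comfy_bf16.safetensors",
    ],
    "models/loras": [
        "LongCat_refinement_lora_rank128_bf16.safetensors",
        "LongCat_distill_lora_alpha64_bf16.safetensors",
    ],
}


def find_installed(comfy_root):
    """Map each expected folder to the LongCat files actually present in it."""
    root = Path(comfy_root)
    return {
        folder: [name for name in names if (root / folder / name).is_file()]
        for folder, names in EXPECTED.items()
    }


def missing_folders(comfy_root):
    """List the folders that contain none of the expected files."""
    return [folder for folder, names in find_installed(comfy_root).items() if not names]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own install directory.
    for folder in missing_folders("ComfyUI"):
        print(f"No LongCat file found in {folder}")
```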


Workflow

1. Find the workflow (LongCat_TI2V_example_01.json) inside your ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/example_workflows folder.

 



2. Drag and drop it into ComfyUI. If any nodes show up red as missing, install them from the Manager. The workflow is based on the Wan 2.2 5B TI2V framework, so all the basic models (Wan 5B TI2V model, Wan 2.1 VAE, umt5-xxl, etc.) are the same. Load each of them into its respective node.

3. Set up and execute your workflow:
(a) Load your image into the Load Image node.

(b) Load the LoRA model into the WanVideo Lora Select node.

(c) Load the Wan 2.1 VAE into the WanVideo VAE Loader node.

(d) Load the LongCat model into the WanVideo Model Loader node. Select your attention mode (FlashAttention, SDPA, SageAttention, etc.).

(e) Add your detailed positive and negative prompts into the prompt box.

(f) Hit Run to execute the workflow. If you are low on VRAM, you can use the WanVideoBlockSwap node to fit generation into memory through block swapping.

CFG: 1
Shift: 12
Scheduler (inside the WanVideo Scheduler node): LongCat distill Euler


4. (a) Use the infinite workflow to generate longer videos. You can extend it further for even longer videos, but inference will also take longer.

 

(b) The workflow contains 5 identical groups of nodes.

 

To extend it, go to the last group, then disconnect the ImageBatchExtendWithOverlap node from the GetImageSizeAndCount node.

(c) Select and copy any one of the groups (they are identical) and paste it next to the last group. Your new extension group is ready.

(d) Add a new Reroute node. Connect the ImageBatchExtendWithOverlap node of the older group to the new Reroute node, and then connect the Reroute node to the ImageBatchExtendWithOverlap node of the new group.

 

(e) Finally, connect the ImageBatchExtendWithOverlap node of the new last group to the GetImageSizeAndCount node. Repeat the same steps if you want to generate even longer videos.
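To get a feel for how much each extra group adds, here is a rough sketch of chaining segments with an overlap. It is not the ImageBatchExtendWithOverlap node's actual implementation (the real node may blend the overlapping frames rather than drop them). The function names and the frame counts are illustrative assumptions only.

```python
def extended_frame_count(frames_per_group: int, overlap: int, groups: int) -> int:
    """Total frames when each group after the first shares `overlap` frames with the previous one."""
    if groups < 1 or not 0 <= overlap < frames_per_group:
        raise ValueError("need groups >= 1 and 0 <= overlap < frames_per_group")
    # The first group contributes all its frames; each later group adds only its new frames.
    return frames_per_group + (groups - 1) * (frames_per_group - overlap)


def join_with_overlap(segments, overlap: int):
    """Concatenate frame lists, dropping the first `overlap` frames of every segment after the first."""
    if not segments:
        return []
    joined = list(segments[0])
    for segment in segments[1:]:
        joined.extend(segment[overlap:])
    return joined


if __name__ == "__main__":
    # Hypothetical numbers: 5 groups of 10 frames with a 2-frame overlap.
    print(extended_frame_count(10, 2, 5))
    # Adding a sixth group extends the total by 8 more frames.
    print(extended_frame_count(10, 2, 6))
```

Each copied group lengthens the video by the group's frame count minus the overlap, which is why inference time grows roughly in step with the number of groups.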