Wan2.1 AniSora: Generate Anime Videos with Fewer Sampling Steps


Most AI video generation models, such as Sora, Kling, and CogVideoX, do well with real-world videos, but they struggle with animation. Animation has its own rules: exaggerated motion, unique art styles, and unrealistic physics. Judging the quality of animated videos is also hard, since there are few solid benchmarks for it.

AniSora, released by Bilibili, is an open-source image-to-video anime generation model licensed under Apache 2.0. It supports one-click video generation in various anime styles, including series episodes, manga, VTuber content, anime PVs, and more. The latest version, AniSora V2, is more stable, faster, and produces better video quality.


Index-AniSora framework (reference: the official Index-AniSora page)

The model is built on a strong data pipeline with over 10 million high-quality animation samples. It uses a spatiotemporal mask module for better motion and frame consistency. The team tested the model on 948 animation video clips, grouped by action type.
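
To make the mask idea concrete, here is a conceptual Python sketch, not AniSora's actual code: a binary spatiotemporal mask marks which frames or regions are given versus generated. All dimensions and the channel-concatenation note are illustrative assumptions.

```python
# Conceptual sketch of a spatiotemporal mask (NOT AniSora's real implementation).
# A binary mask over (frames, height, width) marks which pixels are conditioned
# (kept from the input) and which the model must generate. Dimensions are made up.
import torch

frames, height, width = 16, 60, 104          # hypothetical latent-grid size
mask = torch.zeros(frames, height, width)

mask[0] = 1.0                                 # image-to-video: pin the first frame
# mask[-1] = 1.0                              # interpolation: also pin the last frame
# mask[:, 20:40, 30:70] = 1.0                 # regional guidance: pin one area

# In mask-conditioned diffusion models, such a mask is commonly concatenated
# with the video latents along the channel axis so the network knows what to keep.
```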

Prompts were generated by Qwen-VL2 and manually corrected. Human evaluations showed that AniSora delivers consistent characters and smooth motion. You can find more in-depth information in their research paper.


Prompt: The figures in the picture are sitting in a forward-moving car, waving to the rear, their hair swaying from side to side in the wind.




Prompt: The scene depicts an exploding rock, erupting in blinding light as shattered fragments blast outward in all directions.


Prompt: The scene shows two figures in red wedding clothes holding a red rope as they walk off into the distance.


Prompt: The old man's gaze locks onto the gemstone, his right hand subtly adjusting the magnifying glass, his lips moving as if the stone holds the key to unraveling some ancient knowledge or secret.

Here is a simple step-by-step guide to downloading and setting up the Wan2.1 AniSora image-to-video workflow in ComfyUI, using the optimized repacked safetensors or GGUF formats.


Installation

1. Install ComfyUI if you are a new user. Existing users should update ComfyUI from the Manager by selecting "Update All".

2. Download and set up the basic Wan2.1 image-to-video workflow (or the GGUF Wan2.1 image-to-video workflow).

3. Download and set up Index-AniSora from the official Hugging Face repo. The files are very large, so this step can be time-consuming.

You can also download the optimized version, repacked to safetensors by Kijai.


(a) Download the Wan2.1 I2V AniSora FP8 variant (Wan2_1-Anisora-I2V-480P-14B_fp8_e4m3fn.safetensors) if you have 12GB of VRAM or less.






(b) Download the Wan2.1 I2V AniSora FP16 variant (Wan2_1-Anisora-I2V-480P-14B_fp16.safetensors) if you have more than 12GB of VRAM.

Save it inside your ComfyUI/models/diffusion_models folder. The remaining models (CLIP and the text encoders) are already used by the basic Wan I2V workflows, so downloading them again is not required.
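
If you prefer scripting the download, here is a minimal Python sketch using the huggingface_hub library. The repo id Kijai/WanVideo_comfy is an assumption based on the repack mentioned above; verify the repo id and file names on the Hugging Face page before running.

```python
# Minimal sketch: fetch the repacked AniSora checkpoint with huggingface_hub.
# ASSUMPTION: the repo id "Kijai/WanVideo_comfy" and the file names match the
# Hugging Face page -- verify before running. Requires: pip install torch huggingface_hub
import torch
from huggingface_hub import hf_hub_download

# Rough heuristic from the guide: fp8 for 12GB VRAM or less, fp16 otherwise.
vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
filename = (
    "Wan2_1-Anisora-I2V-480P-14B_fp8_e4m3fn.safetensors"
    if vram_gb <= 12
    else "Wan2_1-Anisora-I2V-480P-14B_fp16.safetensors"
)

path = hf_hub_download(
    repo_id="Kijai/WanVideo_comfy",  # assumed repo id -- check before use
    filename=filename,
    local_dir="ComfyUI/models/diffusion_models",
)
print("Saved to", path)
```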



(c) Alternatively, lower-VRAM users can use the AniSora GGUF variant. Download any Wan2.1-Anisora-14B-I2V-480P-GGUF quant from the Hugging Face repository and save it inside your ComfyUI/models/unet folder. You can learn more about GGUF models in the quantized model tutorial. As above, the CLIP and text encoder models are already used by the basic Wan I2V workflows, so downloading them again is not required.
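
The same scripted approach works for the GGUF variant. The repo id and quantization file name below are placeholders, not real identifiers; copy the actual ones from the Hugging Face page:

```python
# Minimal sketch: fetch a GGUF quant into ComfyUI/models/unet.
# The repo id and filename are PLACEHOLDERS -- substitute the real ones
# from the Wan2.1-Anisora-14B-I2V-480P-GGUF page on Hugging Face.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="<uploader>/Wan2.1-Anisora-14B-I2V-480P-GGUF",   # placeholder
    filename="Wan2.1-Anisora-14B-I2V-480P-Q4_K_M.gguf",      # placeholder quant
    local_dir="ComfyUI/models/unet",
)
```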


4. Restart ComfyUI.


Workflow


1. Load or create the native Wan I2V animation workflow from our Hugging Face repository.
 
2. Choose the Wan AniSora I2V fp16/fp8/GGUF diffusion model, depending on which variant you downloaded.

3. If you are using the GGUF model variant, replace the Load Diffusion Model node with the GGUF UNet loader node.

4. Input your image and prompt, then run the workflow to get the video output. If you would rather queue jobs from a script, see the sketch below.
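
For scripted runs, ComfyUI accepts jobs over its local HTTP API. This sketch assumes a default server on 127.0.0.1:8188 and a workflow exported via "Save (API Format)"; the file name anisora_i2v_api.json is hypothetical.

```python
# Minimal sketch: queue an API-format workflow against a local ComfyUI server.
# ASSUMPTIONS: ComfyUI is running on 127.0.0.1:8188 (the default), and
# "anisora_i2v_api.json" (hypothetical name) was exported with "Save (API Format)".
# Uses only the Python standard library.
import json
import urllib.request

with open("anisora_i2v_api.json") as f:
    workflow = json.load(f)

# You can edit inputs here (e.g. the positive prompt or the input image name)
# by locating the matching node ids in the exported JSON before queuing.

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # the response includes a prompt_id you can poll
```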