Most AI video generation models, such as Sora, Kling, and CogVideoX, do well with real-world videos but struggle with animation, because animation follows its own rules: exaggerated motion, distinctive art styles, and unrealistic physics. Judging the quality of animated videos is also hard, and there are few solid benchmarks for it.
AniSora, released by Bilibili, is an open-source image-to-video anime video generation model licensed under Apache 2.0. It supports one-click video generation in various anime styles, including series, manga, VTuber content, anime PVs, and more. The latest version, AniSora V2, is more stable, faster, and produces better video quality.
Reference: Official Indexsora Page
The model is built on a data pipeline with more than 10 million high-quality animation samples. It uses a spatiotemporal mask module for better motion and frame consistency. The authors evaluated the model on 948 animation video clips, grouped by action.
Prompts were generated by Qwen-VL2 and manually corrected. Human evaluations showed that AniSora delivers consistent characters and smooth motion. You can find more in-depth information in their research paper.
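To make the spatiotemporal mask idea more concrete, here is a minimal sketch in plain Python. It is not taken from the AniSora code. It assumes the mask is a binary grid over frames and pixels, where 1 marks content supplied as guidance and 0 marks content the model must generate. The names `build_mask` and `guided_fraction` are hypothetical and used only for illustration.

```python
from typing import Iterable, List, Optional, Tuple

# Illustrative type: mask[t][y][x] is 1 (given as guidance) or 0 (to be generated).
Mask = List[List[List[int]]]


def build_mask(
    num_frames: int,
    height: int,
    width: int,
    known_frames: Iterable[int],
    region: Optional[Tuple[int, int, int, int]] = None,
) -> Mask:
    """Build a binary spatiotemporal mask (hypothetical helper, not AniSora's API).

    known_frames: indices of frames that are supplied as guidance.
    region: (top, left, bottom, right), end-exclusive; None means the whole frame.
    """
    known = set(known_frames)
    for t in known:
        if not 0 <= t < num_frames:
            raise ValueError(f"frame index {t} is outside 0..{num_frames - 1}")
    top, left, bottom, right = region if region is not None else (0, 0, height, width)
    return [
        [
            [
                1 if (t in known and top <= y < bottom and left <= x < right) else 0
                for x in range(width)
            ]
            for y in range(height)
        ]
        for t in range(num_frames)
    ]


def guided_fraction(mask: Mask) -> float:
    """Fraction of all pixels across all frames that are marked as guidance."""
    total = sum(len(row) for frame in mask for row in frame)
    ones = sum(value for frame in mask for row in frame for value in row)
    return ones / total


# Image-to-video: only the first frame is given.
i2v = build_mask(4, 2, 3, known_frames=[0])
# Frame interpolation: the first and last frames are given.
interp = build_mask(4, 2, 3, known_frames=[0, 3])
# Localized guidance: only part of the first frame is given.
local = build_mask(4, 2, 3, known_frames=[0], region=(0, 0, 1, 2))
```

Under this assumption, a single mask format can describe several conditioning setups. A real model would apply such a mask in latent space, not on raw pixels.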
Prompt: The scene depicts an exploding rock, erupting in blinding light as shattered fragments blast outward in all directions.
Prompt: The scene shows two figures in red wedding clothes holding a red rope as they walk off into the distance.
Here is a simple step-by-step guide to downloading and setting up the Wan2.1 AniSora image-to-video workflow in ComfyUI, using the optimized repacked safetensors and GGUF formats.