If you have worked with pose-driven character animation or image pose transfer, you already know the biggest headache: everything depends on perfectly aligned reference-pose pairs. The reference image must match the pose skeleton, the layout must fit, and even small misalignments break generation quality.
![One-to-All Animation model working]()
In real projects, references come in all shapes and sizes: different camera angles, zoom levels, incomplete bodies, random crops. Current methods get confused, misinterpret the identity, and sometimes completely distort the character.
And that is before we even get to long, coherent video generation: most models struggle to maintain identity and consistency beyond a few seconds.
![One-to-All Animation model architecture (from the research paper)]()
The One-to-All Animation model, from Jiangnan University, USTC, CAS, BUPT, and Zhejiang University, tries to fix these problems. The researchers noticed a recurring flaw in previous approaches: all of them depend on spatial alignment between reference images and target poses. More detailed insights can be found in their research paper.
When alignment fails, identity fidelity drops and pose overfitting kicks in. They approached the problem with three fresh ideas: Self-Supervised Out-Painting, a Reference Extractor, and Hybrid Reference Fusion, which together let the model generalize across layouts, poses, and video lengths better than anything before it.
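To build an intuition for the out-painting idea, here is a rough, purely illustrative Python sketch (not the authors' code or architecture): during training, the reference can be randomly cropped and re-placed on the canvas, so the model has to recover identity from partial, misaligned references instead of relying on pixel-aligned pairs.

```python
# Illustrative only: mimics the kind of random crop/offset augmentation that a
# self-supervised out-painting objective relies on. Not the paper's actual code.
import random
from PIL import Image

def random_partial_reference(ref: Image.Image, min_scale: float = 0.4) -> Image.Image:
    """Return a randomly cropped, randomly placed version of the reference,
    padded back to the original canvas size with empty (black) pixels."""
    w, h = ref.size
    scale = random.uniform(min_scale, 1.0)
    cw, ch = int(w * scale), int(h * scale)
    # Random crop from the source reference
    x0 = random.randint(0, w - cw)
    y0 = random.randint(0, h - ch)
    crop = ref.crop((x0, y0, x0 + cw, y0 + ch))
    # Paste the crop at a random position on an empty canvas, so the model
    # must "out-paint" the rest of the layout while keeping the identity.
    canvas = Image.new("RGB", (w, h))
    px = random.randint(0, w - cw)
    py = random.randint(0, h - ch)
    canvas.paste(crop, (px, py))
    return canvas
```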
Installation
1. First, install ComfyUI if you have not already. If it is already installed, update it from the Manager using the Update All option.
2. Make sure you have Kijai's ComfyUI-WanVideoWrapper custom nodes installed. You also need Kijai's ComfyUI-WanAnimatePreprocess custom nodes. If you already have them, just update the custom nodes from the Manager.
3. Download the One-to-All Animation model from Kijai's Hugging Face repository. Choose the version that suits your system resources:
(a) One-to-All Animation FP8 (Wan21-OneToAllAnimation_fp8_e4m3fn_scaled_KJ.safetensors), for 12 to 16 GB VRAM and faster inference.
(b) One-to-All Animation FP16 (Wan21-OneToAllAnimation_fp16.safetensors), for 24 GB VRAM or more and better output quality.
Save the model inside your ComfyUI/models/diffusion_models folder.
4. Download the YOLOv10m model (yolov10m.onnx).
Then download the DW whole-body pose model (vitpose-l-wholebody.onnx).
Save both of them into your ComfyUI/models/detection folder. If the folder does not exist, create it.
5. Restart and Refresh ComfyUI.
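If you prefer to script the downloads, here is a minimal sketch using huggingface_hub. The file names come from the steps above; the repo IDs are assumptions and placeholders (check Kijai's Hugging Face page and the custom-node READMEs for the exact repositories), and your ComfyUI path may differ.

```python
# Minimal download sketch. File names are from the steps above; the repo IDs
# are assumptions/placeholders -- verify them on Kijai's Hugging Face page first.
from pathlib import Path
from huggingface_hub import hf_hub_download

COMFY = Path("ComfyUI")  # adjust to your ComfyUI install path

files = [
    # (assumed repo id, filename, target folder)
    ("Kijai/WanVideo_comfy", "Wan21-OneToAllAnimation_fp8_e4m3fn_scaled_KJ.safetensors",
     COMFY / "models" / "diffusion_models"),
    # Detection models: the repo IDs below are placeholders -- fill in the repos
    # linked from the ComfyUI-WanAnimatePreprocess README.
    ("<yolo-repo-id>", "yolov10m.onnx", COMFY / "models" / "detection"),
    ("<pose-repo-id>", "vitpose-l-wholebody.onnx", COMFY / "models" / "detection"),
]

for repo_id, filename, target in files:
    target.mkdir(parents=True, exist_ok=True)  # creates models/detection etc. if missing
    hf_hub_download(repo_id=repo_id, filename=filename, local_dir=str(target))
    print(f"Downloaded {filename} -> {target}")
```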
Workflow
1. You can find the workflow (Wan21_OneToAllAnimation_example_01.json) inside your ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/example_workflows folder.
2. Drag and drop it into ComfyUI.
3. If you get missing (red) error nodes, just install them from the Manager using the Install missing nodes option.
4. Run the workflow by setting up the nodes:
(a) Load your image into the Load Image Reference node, then load your reference video into the Load Video node. Use 480p on lower-VRAM cards; with more VRAM you can go up to 720p.
(b) Load the One-to-All Animation model (FP16 or FP8 version) into the WanVideo Model Loader node.
Then, in the ONNX Detection Model Loader node, load both models (YOLOv10m and the whole-body pose model). These handle body-pose extraction from the video.
(c) Load the Wan 2.1 model into the model loader node.
(d) Load the Wan 2.1 VAE and text encoders into their respective nodes.
(e) Add your detailed positive and negative prompts in the WanVideo Text Encode node, and make sure you describe exactly what you want. In our experience, low-quality, short prompts often produce weird generations. You can use any LLM (QwenVL, GPT, or Gemini based) as a prompt enhancer to turn short prompts into more detailed ones (see the sketch after this list).
(f) Hit run to execute the workflow.
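For the prompt-enhancer step in (e), a minimal sketch is shown below. It assumes an OpenAI-compatible API and uses a placeholder model name; any local or hosted LLM (QwenVL, GPT, Gemini) can play the same role.

```python
# Minimal prompt-enhancer sketch, assuming an OpenAI-compatible API.
# The model name and system instruction are placeholders; swap in whatever
# LLM (QwenVL / GPT / Gemini) you actually use.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def enhance_prompt(short_prompt: str) -> str:
    """Expand a short prompt into a detailed description for WanVideo Text Encode."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": (
                "Rewrite the user's short video prompt into one detailed paragraph: "
                "describe the subject, clothing, motion, camera, lighting and background. "
                "Return only the rewritten prompt.")},
            {"role": "user", "content": short_prompt},
        ],
    )
    return response.choices[0].message.content.strip()

print(enhance_prompt("a woman dancing on a beach at sunset"))
```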
Set these values in the WanVideo Scheduler node:
- Scheduler: Euler
- Steps: 6
- Shift: 7.0
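If you would rather queue the workflow from a script than from the UI, ComfyUI exposes a small HTTP API. The sketch below assumes the workflow was exported in API format (the filename, node IDs and input names are assumptions; inspect your own export before editing values) and that ComfyUI is running locally on the default port 8188.

```python
# Queue an API-format workflow export against a locally running ComfyUI.
# Node IDs and input names depend on your export; inspect the JSON to find
# the WanVideo Scheduler node before editing values programmatically.
import json
import urllib.request

with open("Wan21_OneToAllAnimation_example_01_api.json") as f:  # assumed API-format export name
    workflow = json.load(f)

# Example: if node "42" were the scheduler node, you could set its inputs here.
# workflow["42"]["inputs"].update({"scheduler": "euler", "steps": 6, "shift": 7.0})

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # returns a prompt_id on success
```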
The workflow also has extension sections (currently three identical ones) so that you can make even longer videos.
Just copy any section and paste it anywhere, then connect the Extend node of the last section to the Extend node of the new section. Repeat this as many times as you like to generate videos as long as you want.
The One-to-All Animation model delivers noticeably smoother results, especially in facial consistency, an area where Wan Animate often struggles. However, it still has a significant limitation: background motion.
If you generate videos that include dynamic background effects such as panning or zooming, the model tends to keep the background static. This creates an unnatural, frozen environment even when the original scene has motion.
Although the authors claim in their research paper that One-to-All outperforms Wan Animate, this is only partially true. In some aspects, particularly background dynamics, the model does not match Wan Animate's performance. The results suggest the model may have been released prematurely, without enough training data to fully handle complex background movements.