Wan 2.2 Animate: Consistent Video to Video Pose Transfer


Using Wan 2.2 Animate, you can easily transfer facial expressions and body gestures to your output video. We already covered an alternative way to transfer pose in our Wan 2.2 VACE Fun Control tutorial, but that approach needs plenty of VRAM for its high-noise and low-noise models. Here, we will see how to do pose transfer with the Wan 2.2 Animate model.


Installation


1. Make sure you have updated your ComfyUI to the latest version. If not, update it from the Manager by clicking Update All.

2. Download the Wan 2.2 Animate model (FP16/FP8/GGUF), the relight LoRA, the LightX2V I2V LoRA, the text encoders, the CLIP model, and the VAE. Choose the variants that suit your system resources. All the details can be found in our Wan 2.2 installation tutorial.

3. Then, download the Wan 2.2 Animate workflow from our Hugging Face repository and load it into ComfyUI. Alternatively, you can get the workflow in ComfyUI from the Workflow > Templates section.

Keep in mind that we are showcasing the native Wan 2.2 Animate workflow officially released by ComfyUI, not Kijai's Wan 2.2 Animate workflow.


Running the workflow


1. Load the image and reference video to transfer the pose.

2. Load the models (Wan 2.2 Animate, LoRAs, text encoders, VAE, CLIP, etc.) into their respective nodes.

3. Add your prompts into the prompt boxes.

4. Place red and green points to help the model separate the character from the background.

5. Hit run to execute the workflow.



Workflow Node Explanation

1. After downloading, drag and drop the workflow into ComfyUI, then restart and refresh ComfyUI.

If you get a missing-nodes error message, install the missing nodes from the Manager by selecting the Install Missing Custom Nodes option. Then select all of them in the list to install them at once, and restart and refresh ComfyUI for the changes to take effect.

missing models error


If you get a missing-models error message, you can download the models by clicking the links to their respective repositories.

2. Now, after opening the workflow, follow the step-by-step process:

Video Resolution


(a) Video Resolution- Set the video resolution as width and height. Higher values give better, more consistent output but also require more VRAM and inference time, so choose wisely. You can start with the default value of 640 pixels (use a multiple of 16). If you get an OOM (out of memory) error, decrease the value; users with more VRAM can increase it.

For example, say we use 1120 pixels (16x70) and get an OOM error. We then decrease the value and try 960 pixels (16x60), and so on.
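To take the guesswork out of this, you can precompute the candidates. A minimal Python sketch (the 16-pixel grid and the 640 default come from this tutorial; the helper names are our own):

    # Snap a target size down to the nearest multiple of 16, then
    # step down through smaller multiples after each OOM error.
    def snap16(value: int) -> int:
        return (value // 16) * 16

    def backoff_candidates(start: int, floor: int = 640):
        size = snap16(start)
        while size >= floor:
            yield size
            size -= 16

    # Example: starting at 1120 (16x70) and stepping down toward 640.
    print(list(backoff_candidates(1120))[:3])  # [1120, 1104, 1088]

In practice you can step down in bigger jumps (1120 to 960, as in the example above) instead of trying every multiple.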

Load Models


(b) Load Models- Select the downloaded models (Wan 2.2 Animate, LoRAs, text encoders, VAE, CLIP, etc.) one by one. By default, all the respective models come preselected for you. If you get a missing-models error message, follow the installation section explained above.

Add your detailed positive and negative prompts


(c) Prompting- Add your detailed positive and negative prompts.

(d) Input image- Upload your reference image into the Load Image node.

Load your image/video


(e) Input Video- Load your original video as the reference whose motion will be replicated. Use a maximum of 5 seconds for faster generation. The node can also auto-manage the audio if you upload a video with audio embedded, so you do not need to take care of this: the same audio will be embedded in your generated video.

(f) Upscale image- This node lets you set the image resolution, which keeps oversized inputs manageable. If you do not set it, it will scale your image automatically.

(g) DW Pose Estimator- It uses the DW Pose model to estimate and track the body and hand movement and facial expressions in the driving video. You can leave these settings at their defaults.

(h) Character Mask & Background Video Preprocessing- Disable the Sampling+ video output group (right-click and select the Set Group to Never option) so that the preprocessing can run before video generation, which saves time. Once you have confirmed the preprocessing, you can re-enable the group for video generation.

Enable the Mask Preview and Preview Image nodes by selecting them and choosing the unbypass option. This gives you a clear, frame-by-frame picture of what is happening in the video generation and how the model is detecting the character's movement.

Character Mask & Background Video Preprocessing


Now, the Points Editor node helps you mask the character and do the video preprocessing. Use it to control the character selection. You work with two colored point types: red (negative) points to select the background and green (positive) points to select the character. Use Shift + left click to add green points and Shift + right click to add red points. Precise selection isn't required.

Use multiple points (5-10 of each color), as this helps the model clearly separate the character's movement from the background. To return to the default state, click the New Canvas option and all the points will be removed. Alternatively, you can right-click the Points Editor node and use the Fix node (recreate) option if you want to reset it.

At last, enable the Sampling+ video output group (right-click and select the Set Group to Always option) for video generation.

(i) Wan Animate To Video- Inside the Sampling+ video output group, you will get the WanAnimateToVideo node. Its the main part that takes prompts, vae, clip, input image with videos etc. Set the background video, character mask option. USe the length value to set video frames length (default-77). If your inputted video have less frames than 77, the rest will be the still image.
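As a rule of thumb, Wan-family models run at 16 fps and use frame counts of the form 4n + 1 (the default 77 is 4x19 + 1, about 4.8 seconds). A small Python sketch, assuming those two conventions, to pick a length that covers a clip:

    # Pick a length (in frames) covering a clip of the given duration,
    # assuming 16 fps and the Wan-style "4n + 1" frame convention.
    FPS = 16

    def length_for_seconds(seconds: float) -> int:
        frames = int(seconds * FPS)
        n = (frames + 2) // 4  # smallest n with 4n + 1 >= frames
        return 4 * n + 1

    print(length_for_seconds(4.8))  # 77, roughly the default
    print(length_for_seconds(5.0))  # 81, a full 5-second clip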

Enable/disable nodes for mix/pose mode


There are two modes:
- Mix mode (character replace)- replaces the character in the input video with the character from the input image while keeping the same background.
- Pose mode (pose transfer)- animates the character together with the background from the input image.

Connect or disconnect the background video and character mask inputs to switch between the two modes: if they are connected, you are in Mix mode; if they are disconnected, you are in Pose mode.
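In ComfyUI's API-format JSON, a connected input appears as a [node_id, output_index] pair and a disconnected input is simply absent. A hedged sketch of the two modes, shown as Python dicts (the node IDs, output indexes, and the exact snake_case input names background_video and character_mask are illustrative; check your exported workflow for the real ones):

    # Mix mode: both inputs connected, so the character from the image
    # is placed into the original video's background.
    mix_mode = {
        "55": {
            "class_type": "WanAnimateToVideo",
            "inputs": {
                "background_video": ["40", 0],  # frames from the input video
                "character_mask": ["41", 0],    # mask from the Points Editor
                # ... prompts, vae, clip, image, length, etc.
            },
        },
    }

    # Pose mode: the same node with those two inputs left out, so only
    # the pose transfers and the image keeps its own background.
    pose_mode = {
        "55": {
            "class_type": "WanAnimateToVideo",
            "inputs": {
                # no background_video / character_mask entries here
                # ... prompts, vae, clip, image, length, etc.
            },
        },
    }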

(j) KSampler settings- Configure your KSampler settings:
Steps- 6 (default); use 1-2 for faster generation/testing at lower quality, or 7-10 for better video at slower inference
CFG- 1.0
Sampler- Euler
Scheduler- Simple
Denoise- 1.0
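In API-format JSON, these map onto the standard KSampler inputs. A sketch with the values above, shown as a Python dict (the node IDs and output indexes are placeholders; the latent is assumed to come from the WanAnimateToVideo node):

    ksampler = {
        "3": {
            "class_type": "KSampler",
            "inputs": {
                "model": ["1", 0],          # Wan 2.2 Animate model + LoRAs
                "positive": ["6", 0],       # encoded positive prompt
                "negative": ["7", 0],       # encoded negative prompt
                "latent_image": ["55", 0],  # latent from WanAnimateToVideo
                "seed": 0,
                "steps": 6,                 # 1-2 for tests, 7-10 for quality
                "cfg": 1.0,
                "sampler_name": "euler",
                "scheduler": "simple",
                "denoise": 1.0,
            },
        },
    }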


Video extend example + Video Output


(k) Video extend example + Video Output- This group section helps you extend the video generation beyond 4 seconds. Enable/disable it by right-clicking on the group and selecting the Set Group to Never/Always option.

Make sure your input video is longer than 8 seconds to get a clean extension; otherwise the remaining frames will be static.

Tip- If you want to push toward longer, effectively unlimited generation, you just need a few changes. Add a new WanAnimateToVideo node (copy the existing one), then connect the video frame offset output of the existing WanAnimateToVideo node (in the Sampling+ video output group) to the video frame offset input of the new WanAnimateToVideo node. Then connect the VAE Decode output to the continue motion input of the new WanAnimateToVideo node.
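A hedged sketch of that wiring in API-format JSON, shown as a Python dict (the node IDs and output indexes are placeholders, and video_frame_offset / continue_motion are our snake_case reading of the node's labels; verify them against your own workflow):

    extended = {
        # First pass: the original WanAnimateToVideo node (inputs omitted).
        "55": {"class_type": "WanAnimateToVideo", "inputs": {
            # ... prompts, vae, clip, image, video, length, etc.
        }},
        # Decode the first pass so its frames can seed the second one.
        "8": {"class_type": "VAEDecode", "inputs": {
            "samples": ["3", 0],  # latent from the first KSampler
            "vae": ["2", 0],
        }},
        # Second pass: a copy that continues where the first one stopped.
        "60": {"class_type": "WanAnimateToVideo", "inputs": {
            "video_frame_offset": ["55", 1],  # offset output of node 55
            "continue_motion": ["8", 0],      # decoded first-pass frames
            # ... same prompts, models, and video as the first node
        }},
    }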