Wan 2.2 Bernini: Ref-Video Editing and Style Transfer

Video generation has evolved quickly, but most existing systems still share one major limitation: they are usually built for a single task. One model might generate videos from text. Another might edit existing videos. A different system might work with reference images. The result is a fragmented workflow where each task requires a separate architecture, separate training, and separate optimization.

Bernini (released by ByteDance & Bernini Team) supports multiple video workflows inside a unified framework, including text-to-video (T2V), subject-to-video (R2V), video editing (V2V), and reference-guided video editing (RV2V). The community has merged the model into Wan2.2, so you can carry out whichever of these tasks you need in a systematic manner.

Rather than forcing a single model to do everything, Bernini divides responsibilities: one component thinks and another creates. The framework assigns semantic planning to a planner based on a multimodal large language model (MLLM). Instead of generating pixels directly, this planner predicts the target semantic representation inside the Vision Transformer (ViT) embedding space.

Bernini working architecture

Once the semantic blueprint is prepared, a Diffusion Transformer (DiT)-based renderer takes over and converts those instructions into realistic video outputs. For editing tasks, Bernini introduces additional source Variational Autoencoder (VAE) features to preserve important visual details while making modifications. More details can be found in their research paper.

The framework also introduces two notable improvements:
(a) Segment-Aware 3D Rotary Positional Embedding (SA 3D RoPE), to better process multiple visual inputs and maintain spatial-temporal understanding.
(b) Chain-of-thought reasoning inside the planner, which helps the model transfer deeper understanding into the generation process.

Installation

1. First, install and set up ComfyUI to run this model. Existing users need to update ComfyUI to its latest version from the Manager tab.

2. You need the basic Wan 2.2 I2V installation already set up, because the Wan 2.2 Bernini workflow depends on it. Have the text encoders and VAE models ready.

3. Download the Wan 2.2 Bernini pair of models (High and Low). Several variants are available (fp16, fp8 scaled, fp8 mixed, GGUF). Choose one according to your system resources:

(a) Wan 2.2 Bernini (High & Low) FP16, repacked by Kijai - for high-VRAM users, minimum 24GB

(b) Wan 2.2 Bernini (High & Low) FP8 scaled / FP8 mixed, optimized by Kijai - for low-VRAM users, minimum 16GB

(c) Wan 2.2 Bernini (High & Low) FP16/FP8 mixed, repacked by the Comfy team - for high- or low-VRAM users, at least 16-24GB. These are the same models listed above by Kijai, collected in one place. It also has Wan 2.1 support; for that, you need the basic Wan 2.1 workflow already set up.

Save these (High & Low) models into the ComfyUI/models/diffusion_models folder.

(d) Wan 2.2 Bernini GGUF (High & Low), for a minimum of 12GB VRAM. Choose any of the Q4/Q5/Q8 High-Low pairs of models.

If using this variant, save the models into the ComfyUI/models/unet folder.

4. Restart and refresh ComfyUI.
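The variant choice in step 3 can be written down as a small lookup. The following is only a rough sketch in Python, not part of the official setup: the VRAM thresholds and folder names come from the list above, while the short labels ("fp16", "fp8", "gguf"), the function names, and the file extensions checked are assumptions made for illustration.

```python
from pathlib import Path

# Minimum VRAM (GB) and target folder per variant, as listed in step 3.
# The labels are illustrative names, not official file names.
VARIANTS = [
    (24, "fp16", "models/diffusion_models"),
    (16, "fp8", "models/diffusion_models"),
    (12, "gguf", "models/unet"),
]

def pick_variant(vram_gb):
    """Return (label, folder) for the heaviest variant that fits, or None."""
    for min_vram, label, folder in VARIANTS:
        if vram_gb >= min_vram:
            return label, folder
    return None

def list_models(comfy_root, folder):
    """List model files in a ComfyUI model folder (empty list if it is missing).

    The .safetensors/.gguf extensions are an assumption about how the
    downloaded files are named.
    """
    target = Path(comfy_root) / folder
    if not target.is_dir():
        return []
    return sorted(p.name for p in target.iterdir()
                  if p.suffix in {".safetensors", ".gguf"})

if __name__ == "__main__":
    choice = pick_variant(16)
    print(choice)
    if choice:
        # "ComfyUI" is assumed to be the install directory.
        print(list_models("ComfyUI", choice[1]))
```

Running it from the directory that contains your ComfyUI folder should show whether the High and Low files ended up in the folder the chosen variant expects.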

Workflow

1. Download the workflow (Wan2.2_Bernini.json) from our Hugging Face repository.
If using the GGUF variant, replace the diffusion model loader node with the unet loader node.

2. Drag and drop the workflow into ComfyUI canvas.

3. Load all the models (Wan 2.2 Bernini High & Low, text encoders, VAE, etc.) into their relevant nodes.

4. Upload your reference images/videos (supports 2-5 max) for style transfer, video editing, subject removal, etc.

5. Enter your prompts in the prompt box.

6. Set the KSampler configuration:
sampler - res_multistep
resolution - 720p (for high VRAM), 480p (for low VRAM)

7. Hit run to start the generation.
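Before swapping loaders for the GGUF variant, it can help to see which loader nodes the workflow file actually contains. The sketch below assumes the workflow was saved from the ComfyUI interface, where the JSON has a top-level "nodes" list and each node carries an "id" and a "type"; the function names are hypothetical, and the node types in the sample are placeholders rather than the contents of Wan2.2_Bernini.json.

```python
import json

def load_workflow(path):
    """Read a workflow JSON file exported from ComfyUI."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def loader_nodes(workflow, keyword="loader"):
    """Return (id, type) for every node whose type name contains `keyword`."""
    return [
        (node.get("id"), node.get("type"))
        for node in workflow.get("nodes", [])
        if keyword.lower() in str(node.get("type", "")).lower()
    ]

if __name__ == "__main__":
    # Placeholder data standing in for a real workflow file.
    sample = {
        "nodes": [
            {"id": 1, "type": "UNETLoader"},
            {"id": 2, "type": "KSampler"},
            {"id": 3, "type": "VAELoader"},
        ]
    }
    print(loader_nodes(sample))
```

With the real file, load_workflow("Wan2.2_Bernini.json") can be passed to loader_nodes to list the nodes that need a model assigned in step 3.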

Posted by Administrator
