Diffusion models have pushed video restoration forward, but the moment you try to use them for real-world 4K video upscaling, everything starts to fall apart. Latency is too high, computation cost skyrockets, and the models simply don't generalize well at ultra-high resolutions. The problem is clear: diffusion-based VSR feels powerful on paper but becomes impractical the moment you need real-time performance or want to handle actual streaming workloads. The promise FlashVSR makes is simple and bold: what if diffusion-based video super-resolution could finally run in real time?
*FlashVSR working explanation (Ref: Official Page)*
What if scaling to 4K or ultra-wide resolutions didn't choke the hardware? And what if you could get state-of-the-art quality without waiting minutes per frame? FlashVSR claims exactly that: near real-time 4K upscaling at ~17 FPS on a single A100 GPU.
*FlashVSR pipeline*
The team has done quite a bit of research. They realized traditional diffusion pipelines don't fit the constraints of streaming VSR, so they introduced a multi-stage distillation process tailored specifically for one-step diffusion inference.
They studied the resolution gap problem and found that locality-constrained sparse attention could cut redundant computation without breaking quality. They also noted that reconstruction layers were bloated in most existing models, so they designed a much smaller conditional decoder to accelerate output.
You can find more in-depth results in their research paper. Beyond that, they built a massive dataset, VSR-120K, with 120k videos and 180k images to ensure scalability and strong generalization.
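To make the sparse-attention idea concrete, here is a minimal PyTorch sketch of locality-constrained attention, where each query attends only to keys inside its own local window. This illustrates the principle, not FlashVSR's actual implementation; the window size and the non-overlapping blocking scheme are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def local_window_attention(q, k, v, window: int = 256):
    """Toy locality-constrained attention: each query attends only to
    keys/values inside its own non-overlapping window, so the cost grows
    linearly with sequence length instead of quadratically.
    Shapes: (batch, heads, seq_len, head_dim)."""
    b, h, n, d = q.shape
    assert n % window == 0, "pad the sequence to a multiple of the window"
    # Fold the sequence into windows: (b, h, n_windows, window, d)
    qw = q.view(b, h, n // window, window, d)
    kw = k.view(b, h, n // window, window, d)
    vw = v.view(b, h, n // window, window, d)
    # Full attention *within* each window only
    out = F.scaled_dot_product_attention(qw, kw, vw)
    return out.reshape(b, h, n, d)

# 4K-scale token counts stay tractable because cost is O(n * window),
# not O(n^2).
q = k = v = torch.randn(1, 8, 4096, 64)
print(local_window_attention(q, k, v).shape)  # torch.Size([1, 8, 4096, 64])
```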
Installation
1. Install ComfyUI if you haven't already. If you're already using it, update it from the Manager by clicking Update All.
2. Set up Kijai's WanVideoWrapper custom node if you haven't already. Update it from the Manager if it's already installed.
3. Download the FlashVSR models from Kijai's Hugging Face repository (a scripted download is sketched after this list):
(a) Main Diffusion Model: Wan2_1-T2V-1_3B_FlashVSR_fp32.safetensors
(b) Low-Quality Projection: Wan2_1_FlashVSR_LQ_proj_model_bf16.safetensors
Save both inside the ComfyUI/models/diffusion_models folder.
4. Download the Tiny Decoder VAE (Wan2_1_FlashVSR_TCDecoder_fp32.safetensors) and save it inside the ComfyUI/models/vae folder.
5. Restart and Refresh ComfyUI.
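If you prefer scripting the downloads, here is a small helper using huggingface_hub. The repo id Kijai/WanVideo_comfy and the local ComfyUI path are assumptions; adjust both to match your setup.

```python
# Hedged helper: fetch the three FlashVSR files listed above.
# ASSUMPTION: the files live in the "Kijai/WanVideo_comfy" HF repo and
# ComfyUI is in the current directory -- adjust both for your install.
import pathlib
import shutil

from huggingface_hub import hf_hub_download

COMFY = pathlib.Path("ComfyUI")
FILES = {
    "Wan2_1-T2V-1_3B_FlashVSR_fp32.safetensors": "models/diffusion_models",
    "Wan2_1_FlashVSR_LQ_proj_model_bf16.safetensors": "models/diffusion_models",
    "Wan2_1_FlashVSR_TCDecoder_fp32.safetensors": "models/vae",
}

for name, subdir in FILES.items():
    cached = hf_hub_download(repo_id="Kijai/WanVideo_comfy", filename=name)
    dest = COMFY / subdir / name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached, dest)  # copy out of the HF cache into ComfyUI
    print("saved", dest)
```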
Workflow
1. You will find the workflow (wanvideo_1_3B_FlashVSR_upscale_example.json) in the ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/example_workflows folder.
*FlashVSR Workflow*
2. Drag and drop it into ComfyUI.
(a) Load your short video clip into the Load Video node.
(b) Load the main video model (wan2.1_T2V_1.3B_FlashVSR) into the WanVideoModelLoader node. This loads the main WanVideo model (T2V 1.3B FlashVSR).
(c) Load the extra model (wan2.1_FlashVSR_LQ_proj_bf16) into the WanVideo Extra Model Select node; it helps improve low-quality frames. For faster inference, choose an attention mode: SDPA, Flash Attention, Sage Attention, etc. (a minimal example of what this choice means at the PyTorch level follows below). Currently only up to 2K upscaling is supported; going beyond this will cause an error.
Note that the official Locality-Constrained Sparse Attention (LCSA) and streaming features haven't been implemented here yet. The official repository claims upscaling up to 4K; these features may be merged into Kijai's implementation later.
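For context, "SDPA" refers to PyTorch's built-in scaled_dot_product_attention, and the attention-mode switch decides which fused kernel handles it. A tiny illustration of that choice at the PyTorch level (not the wrapper's internal code); it assumes PyTorch 2.3+ and a CUDA GPU:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Request the FlashAttention kernel explicitly; PyTorch raises an error if
# the shapes/dtypes/hardware can't support it. Without the context manager,
# PyTorch picks the fastest available backend automatically.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 1024, 64])
```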
(d) Load the text encoder (wan2.1 umt5_xxl_enc_bf16) into the LoadWanVideoT5TextEncoder node. This is the block that understands your prompt and negative prompt. Add your prompts in the WanVideoTextEncode node, which converts them to embeddings.
(e) Prepare empty embeds for FlashVSR with the ImageResizeKJv2 node. All input frames are resized to 1024 x 1024, which ensures the FlashVSR model gets frames at the correct size.
(f) Load the Wan2.1 FlashVSR TCDecoder model downloaded earlier into the WanVideoFlashVSR decoder loader node.
(g) Set the core video sampler parameters in the WanVideoSampler node. This is the main controlling unit:
Steps: 1
CFG: 1
Shift: 5
Scheduler: Euler
Here, the model reads your prompt, the frame embeds, and the FlashVSR inputs, then processes the frames step by step. This generates improved/upscaled latent frames.
(h) Hit Run to execute the workflow.
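If you'd rather drive the same workflow headlessly, here is a sketch using ComfyUI's HTTP API. It assumes you re-exported the workflow via Save (API Format) (the bundled JSON is in UI format), that the server runs at the default 127.0.0.1:8188, and that the node id and input names for the sampler are placeholders you must look up in your own export.

```python
# Hedged sketch: queue the FlashVSR workflow through ComfyUI's /prompt API.
import json
import urllib.request

# ASSUMPTION: "wanvideo_1_3B_FlashVSR_upscale_api.json" is your own
# API-format export of the example workflow.
with open("wanvideo_1_3B_FlashVSR_upscale_api.json") as f:
    prompt = json.load(f)

# Apply the sampler settings from step (g). "17" is a placeholder node id,
# and the input key names may differ -- check your exported JSON.
prompt["17"]["inputs"].update(
    {"steps": 1, "cfg": 1.0, "shift": 5.0, "scheduler": "euler"}
)

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": prompt}).encode(),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())  # returns the prompt id
```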
Tips for Improving FlashVSR Upscaling Output
- FlashVSR works best when your input video is not too noisy, not overly compressed, and has stable motion. If possible, denoise or sharpen lightly before upscaling (see the preprocessing sketch after these tips).
- Good resizing helps FlashVSR produce better detail. Use Lanczos, and resize to at least 720p or 1024 px on the longest side. This matches the workflow above and helps quality.
- FlashVSR responds well to descriptive prompts. Describe sharp details, realistic textures, and camera styles.
- It can't repair heavy motion blur. If a shot is too blurry, cut that part out before processing.
- 1024 x 1024 or 720p is ideal. Going too high causes artificial textures, over-sharpening, and extra hallucinated details, so stay within FlashVSR's designed limits.
- It enhances detail; it does not need heavy styling. Use simpler prompts for natural results.
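Putting the first two tips together, here is a hedged ffmpeg-based preprocessing sketch: a light denoise followed by a Lanczos resize so the longest side is 1024 px. It assumes ffmpeg is on your PATH; the hqdn3d filter strengths are illustrative, so tune them for your footage.

```python
# Hedged preprocessing sketch: mild denoise + Lanczos resize to a
# 1024 px longest side before feeding the clip to FlashVSR.
import subprocess

def preprocess(src: str, dst: str, target: int = 1024) -> None:
    vf = (
        "hqdn3d=2:1:2:3,"  # mild spatial/temporal denoise (tune to taste)
        # Resize so the *longest* side hits `target`; -2 keeps the other
        # dimension proportional and even.
        f"scale='if(gt(iw,ih),{target},-2)':'if(gt(iw,ih),-2,{target})'"
        ":flags=lanczos"
    )
    subprocess.run(["ffmpeg", "-y", "-i", src, "-vf", vf, dst], check=True)

preprocess("input.mp4", "input_clean.mp4")
```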