SkyReels (V1/V2) for Infinite Video Length and Human-Centric Style

install skyreels opensource model locally using comfyui

Another diffusion-based video generation model has entered the open-source market: SkyReels V1, a human-centric video model, and SkyReels V2, an infinite-length video model, fine-tuned on HunyuanVideo and WanVideo respectively. SkyReels is pitched as a leading open-source option, with advanced facial animation and cinematic lighting and aesthetics.

However, the model requires a significant amount of VRAM (at least 79GB), making it difficult to run on standard hardware. The best solution is to use a quantized version of the model. In this guide, we will show you how to run SkyReels this way without encountering out-of-memory errors.


Installation process

update ComfyUI from the manager

Install ComfyUI if you have not done so yet. Existing users should update it from the Manager section by selecting the "Update ComfyUI" option.


SkyReels V2 (for infinite video length generation)

1. The project has been merged into the WanVideo Wrapper, so you need to install Kijai's WanVideo wrapper if you haven't done so yet. If you already have this custom node, you only need to update it, either from the Manager or by running the "git pull" command in a command prompt opened inside the custom node's folder.


download skyreelsV2 video generation model

2. Download the SkyReels V2 model from Kijai's Hugging Face repository and save it inside the "ComfyUI/models/diffusion_models" folder. Also download the same VAE and text encoders mentioned for Kijai's WanVideo; if you already have them, you can skip this.

There are multiple variants to choose from: text-to-video or image-to-video, at 540p or 720p resolution, in fp8 or fp16 precision. If you have low VRAM, use the fp8 variant or the 1.3B-parameter model.
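Step 1 above mentions updating the wrapper with "git pull". A minimal Python sketch of that update is shown here. It assumes the wrapper was cloned into ComfyUI's custom_nodes folder under the name ComfyUI-WanVideoWrapper, and the helper names are hypothetical, not part of ComfyUI.

```python
import subprocess
from pathlib import Path


def wrapper_update_command(comfy_root: str,
                           node_dir: str = "ComfyUI-WanVideoWrapper") -> list[str]:
    """Build the `git pull` command for a custom node folder.

    Assumes the node lives in <comfy_root>/custom_nodes/<node_dir>.
    """
    repo = Path(comfy_root) / "custom_nodes" / node_dir
    # `git -C <dir> pull` runs the pull inside <dir> without changing directory.
    return ["git", "-C", str(repo), "pull"]


def update_wrapper(comfy_root: str) -> None:
    """Run the update; raises CalledProcessError if git reports a failure."""
    subprocess.run(wrapper_update_command(comfy_root), check=True)
```

Calling update_wrapper("ComfyUI") from the folder that contains your ComfyUI install should be equivalent to opening the node's folder and typing "git pull" by hand.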

GGUF Variant

There is also a better option for people with low VRAM.

(a) Install and set up the GGUF custom node by city96. Update it if you already have it installed.

(b) Select and download any of the models from the Hugging Face repository:

- SkyReels-V2-I2V-14B-720P for image-to-video 720p generation

- SkyReels-V2-I2V-14B-540P for image-to-video 540p generation

- SkyReels-V2-T2V-14B-720P for text-to-video 720p generation

- SkyReels-V2-T2V-14B-540P for text-to-video 540p generation

Then save it into the "ComfyUI/models/unet" folder.
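The four model names follow one pattern (input type, then 14B, then resolution), so picking the right one can be sketched in a few lines of Python. The function names below are hypothetical helpers for illustration, and the actual file names in the repository may carry extra suffixes such as the quantization level.

```python
from pathlib import Path

# Input type -> tag used in the model name (image-to-video / text-to-video).
TASKS = {"image": "I2V", "text": "T2V"}
RESOLUTIONS = {"720P", "540P"}


def skyreels_v2_name(source: str, resolution: str) -> str:
    """Return the 14B SkyReels V2 model name for an input type and resolution."""
    res = resolution.upper()
    if source not in TASKS or res not in RESOLUTIONS:
        raise ValueError(f"unsupported combination: {source!r}, {resolution!r}")
    return f"SkyReels-V2-{TASKS[source]}-14B-{res}"


def gguf_target_dir(comfy_root: str) -> Path:
    """Folder where the downloaded GGUF model should be saved."""
    return Path(comfy_root) / "models" / "unet"
```

For example, skyreels_v2_name("image", "540p") gives the name of the image-to-video 540p model from the list above.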


SkyReels V1 (for human-centric generation)

1. Clone the repository provided by Kijai for HunyuanVideo if you have not done so yet. If you already have this custom node installed, just update it from the ComfyUI Manager section.

download skyreels quantized model

2. Download any of the quantized models, from Q3 (smaller file, lower generation quality) to Q8 (larger file, higher generation quality), from the Hugging Face repository.

Choose the one that suits your system. Save it inside the "ComfyUI/models/diffusion_models" folder. Also download the same VAE and text encoders mentioned for Kijai's HunyuanVideo; if you already have them, you can skip this.

3. Restart ComfyUI for the changes to take effect.
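Before restarting, it can help to confirm that the downloaded files ended up in the folders named in this guide. The following is a small sketch of such a check. The function name is hypothetical, and the .gguf and .safetensors extensions are an assumption about how the downloaded files are named.

```python
from pathlib import Path

# Model folders used in this guide, relative to the ComfyUI root.
MODEL_DIRS = ("models/diffusion_models", "models/unet")
# Assumed file extensions for downloaded models.
MODEL_SUFFIXES = {".gguf", ".safetensors"}


def find_model_files(comfy_root: str) -> dict[str, list[str]]:
    """Map each model folder to the sorted model file names found in it."""
    found = {}
    for rel in MODEL_DIRS:
        folder = Path(comfy_root) / rel
        names = []
        if folder.is_dir():
            names = sorted(
                p.name for p in folder.iterdir()
                if p.is_file() and p.suffix.lower() in MODEL_SUFFIXES
            )
        found[rel] = names  # empty list if the folder is missing or empty
    return found
```

An empty list for a folder you expected to contain a model usually means the file was saved somewhere else.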


Workflow

1. Because SkyReels V1 is fine-tuned on HunyuanVideo, the workflow is the same as for HunyuanVideo. Simply use Kijai's HunyuanVideo workflow that we have explained in our tutorial.

For SkyReels V2, use the example workflows in the following folder: ComfyUI-WanVideoWrapper/example_workflows

If you are using the GGUF model, make sure to replace the model loader node with the UNet loader node.

2. To input the image, use the "InstructPixToPixConditioning" node or a similar node that adds an encoded image.

3. Select the downloaded quantized model in the Load Diffusion Model node. Everything else stays the same as for HunyuanVideo.

4. Use the positive prompt to describe the details you want in your generation.

5. Add your prompt and click "Run" to start your generation.

We used an image of a girl wearing a white tank top as input. Our goal is to create a human video that showcases the fabric quality of the clothing from an e-commerce perspective.

Prompt used: A high-resolution studio photograph of a female model wearing a sleeveless tank top, standing against a clean white background. The fabric appears soft, lightweight, and breathable, with a modern, form-fitting design. The model has natural-looking makeup and neat hair, giving off a casual yet stylish vibe. The lighting is bright and even, with soft shadows for depth. The model poses confidently with relaxed shoulders, smiling subtly, showcasing the top from multiple angles—front, back, and side. Close-up shots capture fabric texture, stitching, and branding details, ensuring a premium look. The overall composition is minimalistic yet engaging, ideal for eCommerce product listings.

CFG: 6

Embedded guidance scale: 1

Steps: 20

skyreels video generation result

Here is the generated result. It is not bad, but you will notice some blurriness around her eyes and lower quality in some frames. The video was generated in 720p. The officially recommended resolution is 960 (height) by 544 (width), with a CFG value of 6 and an embedded guidance scale of 1. Longer videos take considerably more time to generate than shorter ones.

You need to turn on block swap if you are using 12GB of VRAM or less. With more VRAM it is not required.

The model is at an early stage, so you should not expect too much from it yet, but it is a great initiative in fine-tuning. The model sometimes generates artifacts and drops frames.
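To keep the settings from this section in one place, they can be written down as a small Python record. This is only a note-keeping sketch: the class and field names are hypothetical and do not correspond to ComfyUI node inputs, and the values are the ones quoted in this guide.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SkyReelsSettings:
    """Sampler settings quoted in this guide (field names are illustrative)."""
    cfg: float = 6.0                      # CFG value
    embedded_guidance_scale: float = 1.0  # embedded guidance scale
    steps: int = 20                       # sampling steps used for the example
    height: int = 960                     # recommended height from the guide
    width: int = 544                      # recommended width from the guide


settings = SkyReelsSettings()
for name, value in asdict(settings).items():
    print(f"{name}: {value}")
```

Because the record is frozen, trying a different value means creating a new instance, for example SkyReelsSettings(steps=30), which keeps the reference values intact for comparison.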
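The block swap rule above can be expressed as a one-line check. This sketch assumes you already know your GPU's VRAM in gigabytes, and the function and constant names are hypothetical.

```python
# Threshold from this guide: block swap is needed at 12GB of VRAM or less.
BLOCK_SWAP_VRAM_LIMIT_GB = 12


def needs_block_swap(vram_gb: float) -> bool:
    """Return True when the guide recommends enabling block swap."""
    return vram_gb <= BLOCK_SWAP_VRAM_LIMIT_GB
```

With this rule, a 12GB card should enable block swap while a 24GB card can leave it off.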