You have probably run into a frustrating limitation: generation speed drops dramatically when you use multiple reference images. FLUX.2 [klein] 9B-KV, a successor to the FLUX.2 Klein 9B model released by Black Forest Labs, tackles exactly this problem.
Most diffusion-based editing pipelines reprocess every reference image during every denoising step. That means if your model runs 4 denoising steps and you use several reference images, it computes the same visual tokens over and over again. The model is released under the FLUX Non-Commercial License.
FLUX.2 klein 9B-KV showcase
This becomes especially noticeable in interactive tools, where users expect near-instant feedback when adjusting prompts or generating variations. Even with powerful modern models, redundant computation becomes the bottleneck.
This optimized model variant introduces KV-cache support, allowing the model to compute reference image representations once and reuse them for the rest of the generation process. In practice, that means faster multi-reference editing, less redundant computation, and up to 2.5x faster inference.
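To see why this matters, here is a toy Python sketch of the idea (not the actual FLUX.2 code): without a cache the reference encoder runs at every denoising step, while with a cache it runs once per reference image.

```python
# Toy illustration of KV caching for reference images (conceptual only;
# the real model caches attention key/value tensors, not strings).

class ReferenceEncoder:
    def __init__(self):
        self.encode_calls = 0

    def encode(self, image):
        self.encode_calls += 1            # stands in for an expensive forward pass
        return f"tokens({image})"

def denoise_without_cache(encoder, references, steps):
    for _ in range(steps):
        _ = [encoder.encode(img) for img in references]   # recomputed every step

def denoise_with_cache(encoder, references, steps):
    cache = {img: encoder.encode(img) for img in references}  # computed once
    for _ in range(steps):
        _ = [cache[img] for img in references]                # reused each step

enc_a, enc_b = ReferenceEncoder(), ReferenceEncoder()
denoise_without_cache(enc_a, ["ref1", "ref2", "ref3"], steps=4)
denoise_with_cache(enc_b, ["ref1", "ref2", "ref3"], steps=4)
print(enc_a.encode_calls, enc_b.encode_calls)  # prints: 12 3
```

With 4 steps and 3 references, the uncached path does 12 encoder passes versus 3 for the cached path, which is where the speedup comes from.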
Installation
1. Make sure you have already installed ComfyUI. If it is already installed, update it from the Manager.
2. Download the Flux2 Klein 9B KV model variants. Choose the one that suits your VRAM and system RAM:
(a) Flux2 Klein 9B KV BF16 (flux-2-klein-9b-kv.safetensors) by Black Forest Labs.
(b) Flux2 Klein 9B KV FP8 (flux-2-klein-9b-kv-fp8.safetensors) by Black Forest Labs.
Save these inside the ComfyUI/models/diffusion_models folder.
(c) Flux2 Klein 9B KV GGUF (from Q2 for fast inference with quality degradation, up to Q8 for high quality with slower inference) by QuantStack. Choose the quant that suits your system resources.
Save this inside the ComfyUI/models/unet folder. Make sure you have installed the ComfyUI-GGUF custom node by city96 from the Manager. If it is already installed, update it from the Manager.
3. As the model is built on the base Flux2 Klein 9B, the rest of the models (text encoders and VAE) remain the same. If you have not installed them yet, follow our Flux2 Klein 9B tutorial.
(a) Download any of the text encoders and save it inside the ComfyUI/models/text_encoders folder.
(b) Download the VAE and put it inside the ComfyUI/models/vae folder.
4. Restart and refresh ComfyUI.
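The save locations from the steps above can be summarized in a tiny helper. This is just an illustrative sketch (the `destination_for` function is ours; the folder paths come from the installation steps):

```python
from pathlib import Path

# Destination folders from the installation steps above,
# relative to the ComfyUI root directory.
MODEL_FOLDERS = {
    ".safetensors": "models/diffusion_models",  # BF16 / FP8 variants
    ".gguf": "models/unet",                     # GGUF quants (needs ComfyUI-GGUF)
}

def destination_for(filename, comfyui_root="ComfyUI"):
    """Return where a downloaded FLUX.2 klein 9B-KV file should be saved."""
    suffix = Path(filename).suffix
    if suffix not in MODEL_FOLDERS:
        raise ValueError(f"unexpected model format: {suffix}")
    return f"{comfyui_root}/{MODEL_FOLDERS[suffix]}/{filename}"

print(destination_for("flux-2-klein-9b-kv-fp8.safetensors"))
# prints: ComfyUI/models/diffusion_models/flux-2-klein-9b-kv-fp8.safetensors
```

Putting a GGUF file in diffusion_models (or vice versa) is a common cause of the model not showing up in the loader node, so it is worth double-checking.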
Workflow
1. Download the Flux2 Klein 9B KV workflows from our Hugging Face repository.
(a) Flux2_klein_kv_Txt2Img.json (text-to-image workflow)
(b) Flux2_klein_image_edit_9b_kv.json (multi-image edit workflow)
2. Simply drag and drop the workflow into the ComfyUI application.
Text To Image
(a) Load flux2 klein 9b kv into the Load Diffusion Model node. If using GGUF, use the Unet Loader node instead.
(b) Load the text encoders and VAE into their respective nodes.
(c) Enter the text prompt for the image you want to generate.
(d) Set the KSampler settings.
(e) Hit the Run button to generate.
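If you prefer scripting over clicking Run, ComfyUI also exposes an HTTP API (POST /prompt on port 8188 by default). Below is a minimal sketch, assuming you exported the workflow in API format; the node id "6" is hypothetical and must match the ids in your own export:

```python
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"  # default local ComfyUI address

def build_payload(workflow, prompt_text, prompt_node_id):
    """Patch the positive-prompt text into an API-format workflow dict."""
    workflow = json.loads(json.dumps(workflow))  # deep copy, leave the original intact
    workflow[prompt_node_id]["inputs"]["text"] = prompt_text
    return {"prompt": workflow}

def queue_prompt(payload):
    """Send the patched workflow to a running ComfyUI instance."""
    req = urllib.request.Request(
        f"{COMFYUI_URL}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return json.loads(urllib.request.urlopen(req).read())

# Tiny stand-in workflow; find your real prompt node id by exporting
# Flux2_klein_kv_Txt2Img.json in API format from ComfyUI.
wf = {"6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}}}
payload = build_payload(wf, "Grungy analog photo ...", "6")
# queue_prompt(payload)  # requires a running ComfyUI server
```

This is handy for batch-generating variations of the same prompt without touching the UI.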
Prompt:
Grungy analog photo of 10 year old Finn and Jake (golden retriever) from adventure time in 2012 watching adventure time on CRT TV in a dimly lit bedroom. Finn sitting on the floor in front of the TV, petting Jake with one hand, and both looking back at the camera taking the photo while the show is on in the background visible to us. Flash photography, unedited.
Steps:4
Resolution: 1024 x 1024
Prompt:
A 23-year-old Instagram model real world version of Asuka Langley from Evangelion with characteristic hairstyle, blue eyes, and a confident smile is posing in front of the gym mirror. She's wearing form-fitting light red leggings and a matching crop top, highlighting her toned physique. Her left hand is holding the phone up to capture the shot, while her right arm rests casually by her side. The gym is clean and spacious with dumbbells, machines, and a water bottle on the floor beside her. Behind her one can see her discarded eva unit 2 plugsuit.
Steps:4
Resolution: 1024 x 1024
Prompt:
Grand Theft Auto San Andreas, CJ Selfie, CJ standing on Grove Street, taking a selfie at sunset, PS2 graphics, in game graphics, homies talking in background
Multi Image Editing
(a) Load the images for multi-editing into the Load Image nodes.
(b) Load flux2 klein 9b kv into the Load Diffusion Model node. If using GGUF, use the Unet Loader node instead.
(c) Select the text encoders and VAE in their respective nodes.
(d) Enter the text prompt describing the edit.
(e) Set the KSampler settings.
(f) Hit the Run button to generate.
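The API-format approach works for the multi-image workflow too. This sketch (the function name and image filenames are ours) swaps reference image filenames into the workflow's LoadImage nodes before queueing; the images must already be in ComfyUI's input folder:

```python
def set_reference_images(workflow, image_files):
    """Assign image filenames to the LoadImage nodes of an API-format
    ComfyUI workflow dict, in node-id order."""
    load_nodes = [nid for nid, node in sorted(workflow.items())
                  if node.get("class_type") == "LoadImage"]
    if len(load_nodes) != len(image_files):
        raise ValueError("workflow expects a different number of reference images")
    for nid, fname in zip(load_nodes, image_files):
        workflow[nid]["inputs"]["image"] = fname
    return workflow

# Stand-in workflow with two reference slots, as in the figure 1 / figure 2
# example below; real node ids come from your API-format export.
wf = {
    "1": {"class_type": "LoadImage", "inputs": {"image": ""}},
    "2": {"class_type": "LoadImage", "inputs": {"image": ""}},
    "3": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
}
set_reference_images(wf, ["figure1_girl.png", "figure2_outfit.png"])
```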
We added two images, figure 1 (a girl as the subject) and figure 2 (a set of outfits), and put the relevant prompt into the prompt box.
figure 1
figure 2
Prompt- Make the girl in Figure 1 put on the outfits from Figure 2, wear a white Tshirt with suit, pant and carrying a purse. Remove the blue cap. Then, change the background environment where she is sitting outside the restaurant.
We tried multiple generations. The model tried to replicate the cap as well, so we added "remove the blue cap" to the prompt.
There are some limitations with face consistency and object handling. You can see that everything except the purse has been handled well. To get the perfect output, you need to describe in detail what you want and what you do not.
Conclusion
FLUX.2 klein 9B-KV highlights an important trend in generative AI: optimization is becoming just as important as model size. Instead of simply scaling parameters, researchers are focusing on smarter computation strategies. KV caching may sound like a small architectural tweak, but its impact is huge, especially for interactive creative tools where latency matters.
If this direction continues, we will likely see more models designed not just for quality, but also for real-time usability. And that’s a big step toward making AI image generation feel instant, responsive, and truly interactive.

![FLUX.2 [klein] 9B-KV FLUX.2 [klein] 9B-KV](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgILk1nl_AH6caQkaI-mZVyJQwimor56NiYaWRSNsSZrMYFL5cWf8rX5BZn-2mmRElNIyafS5BV-Lxit_Qg7shBbqEjQOSTsO9H0xt8_oN0vk5N4K7LxBDBZz39WTFQhxGQ6qoNKxcbRARnMHtcJPxl-DTcDaynuz6e4j92Qt9yHtKR2EUIlUmCF82RT2DZ/w640-h384-rw/flux-klein-9b-kv-comfyui.webp)







