Direct Align with Semantic Relative Preference Optimization (SRPO) aims to overcome these challenges. It trains faster, in fewer steps, and allows reward adjustments online, with the goal of sharper realism and better aesthetics. Users can expect higher-quality results without heavy compute or endless rounds of fine-tuning.
Installation
1. Update ComfyUI from the Manager by selecting Update All.
2. Complete the basic Flux Dev model setup. If you have already done this, reuse it: SRPO uses the same VAE, CLIP, and text encoder models as Flux Dev.
3. Download the model variant that suits your system and use case:
(a) Official raw Flux Dev SRPO model, for high-VRAM GPUs and maximum quality (diffusion_pytorch_model.safetensors)
(b) Any of the quantized models by wikeeyang:
- 4-bit (Flux1-Dev-SRPO-v1-Q4_1.gguf)
- 8-bit (Flux1-Dev-SRPO-v1-Q8_0.gguf)
- FP8 (Flux1-Dev-SRPO-v1-fp8.safetensors)
(c) Flux Dev SRPO BF16 (flux.1-dev-SRPO-bf16.safetensors) by rockerBOO
(d) Flux Dev SRPO GGUF by Befox
Save the models into the ComfyUI/models/diffusion_models folder. GGUF models go into the ComfyUI/models/unet folder instead.
4. Use the same VAE, CLIP, and text encoder models as for Flux Dev; you do not need to download them again.
5. Refresh ComfyUI so the changes take effect.
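The folder rule from step 3 can be written down as a small Python helper, which is handy if you script your downloads. This is a minimal sketch: the function name target_folder is hypothetical, and it only encodes the convention described above (GGUF files to models/unet, everything else to models/diffusion_models) rather than anything ComfyUI itself enforces.

```python
from pathlib import PurePosixPath


def target_folder(filename: str, comfy_root: str = "ComfyUI") -> str:
    """Return the ComfyUI folder a Flux SRPO model file should be saved in.

    Hypothetical helper encoding the convention from the installation steps:
    .gguf files go to models/unet, other files (e.g. .safetensors) go to
    models/diffusion_models.
    """
    suffix = PurePosixPath(filename).suffix.lower()  # e.g. ".gguf" or ".safetensors"
    subfolder = "unet" if suffix == ".gguf" else "diffusion_models"
    return str(PurePosixPath(comfy_root, "models", subfolder))
```

For example, target_folder("Flux1-Dev-SRPO-v1-Q4_1.gguf") points to ComfyUI/models/unet, while the BF16 and FP8 .safetensors files land in ComfyUI/models/diffusion_models.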
Workflow
1. Download the basic Flux SRPO workflow (JSON) from the official Hugging Face repository. For GGUF models, download the GGUF Flux SRPO workflow instead.
2. Drag and drop the workflow into ComfyUI. If you get a missing-nodes error, install them from the Manager by selecting Install Missing Custom Nodes.
3. Load the Flux SRPO model in the Load Diffusion Model node. If you use the GGUF variant, use the GGUF UNet loader node instead.
4. Enter your text prompt in the prompt box.
5. Adjust the KSampler settings.
6. Hit Run to start generation.
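If you want to repeat these steps without clicking through the UI, ComfyUI can also accept a workflow exported with "Save (API Format)" through its /prompt HTTP endpoint. The sketch below assumes such an export and a server on the default 127.0.0.1:8188; the node IDs "6", "5" and "3" are hypothetical placeholders taken from ComfyUI's default example graph, so check them against your own SRPO workflow file. If your workflow sets Flux guidance through a separate guidance node rather than the KSampler's cfg input, you may need to set that value there instead.

```python
import json
import urllib.request
import uuid

# Hypothetical node IDs: look them up in your own API-format export,
# the Flux SRPO workflow may use different ones.
PROMPT_NODE = "6"    # CLIPTextEncode node holding the positive prompt
LATENT_NODE = "5"    # empty latent image node (sets width/height)
SAMPLER_NODE = "3"   # KSampler node


def apply_settings(workflow: dict, prompt: str, width: int, height: int,
                   cfg: float, steps: int, seed: int = 0) -> dict:
    """Return a copy of an API-format workflow with prompt, size and sampler settings replaced."""
    wf = json.loads(json.dumps(workflow))  # deep copy so the template is left untouched
    wf[PROMPT_NODE]["inputs"]["text"] = prompt
    wf[LATENT_NODE]["inputs"].update(width=width, height=height)
    wf[SAMPLER_NODE]["inputs"].update(cfg=cfg, steps=steps, seed=seed)
    return wf


def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    """POST the workflow to a running ComfyUI server and return its JSON reply."""
    payload = json.dumps({"prompt": workflow, "client_id": str(uuid.uuid4())}).encode("utf-8")
    req = urllib.request.Request(f"http://{host}/prompt", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

To reproduce one of the tests below, load your exported JSON with json.load, pass it through apply_settings with that test's prompt, 1024 by 1024 resolution, CFG and step count, and hand the result to queue_prompt while ComfyUI is running.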
Test 1
Prompt:
Ultra-realistic professional fashion photoshoot of a 25-year-old female model, wearing a sleek red satin gown, posing confidently in a modern photography studio with soft key lighting. High-end DSLR look, cinematic depth of field, realistic skin texture, glossy highlights, Vogue magazine cover style.
Resolution - 1024 by 1024
CFG - 3.5
Steps - 90
Test 2
Prompt:
Ultra-realistic photoshoot of a young male model in casual denim jacket and white t-shirt, standing on a city street at golden hour, warm sunlight casting natural shadows, cinematic realism, sharp focus, magazine editorial style.
Resolution - 1024 by 1024
CFG - 3.5
Steps - 50
Test 3
Prompt:
A dark grainy security camera footage photo of Santa Claus wearing night vision goggles, sneaking through a living room, holding a bottle of Jack Daniels
Resolution - 1024 by 1024
CFG - 3.5
Steps - 80
Test 4
Prompt:
A young Instagram reel creator, stylish and energetic, sitting at a modern desk with ring light and smartphone mounted on a tripod. She is going live with her audience, smiling, waving, and interacting warmly as if talking directly to viewers. Background shows cozy studio lighting with LED neon signs, plants, and creative decor. Cinematic lighting, high-definition, smooth camera panning, soft bokeh in the background. A dynamic, lively atmosphere with slight camera movement for realism. The mood is engaging, cheerful, and professional, designed for Instagram live aesthetics.
Resolution - 1024 by 1024
CFG - 3.5
Steps - 90
Test 5
Prompt:
Ring doorbell footage of Minecraft Steve trying to deliver a Minecraft steak, standing awkwardly.
Resolution - 1024 by 1024
CFG - 3.5
Steps - 50
This approach feels like a big leap forward for the practical use of diffusion models. Blending optimized denoising with online preference alignment not only saves compute but also makes the model more user-centric. Creative professionals will appreciate the ability to nudge results on the fly with prompt-based control. It moves beyond static reward models toward a setup where the model can adapt as quickly as human preferences shift. Direct Align and SRPO make photorealism in AI image generation more achievable and more accessible.