stable-diffusion.cpp

Support for new SD3.5 models (large TurboX and medium turbo)

Open mpulukkinen opened this issue 10 months ago • 7 comments

Yesterday I tried running those new models with stable-diffusion.cpp (although not with the latest commits) and ctx-context failed to load. So, do these models require some implementation work? https://www.reddit.com/r/StableDiffusion/comments/1j406g1/sd35_large_turbox_just_released/

Also, off-topic: there are two major advancements in local video generation, Wan2.1 and https://github.com/Tencent/HunyuanVideo-I2V. Those would be really cool additions.

mpulukkinen avatar Mar 07 '25 06:03 mpulukkinen

Sd3.5 Medium turbo is working just fine for me. I believe large should work too.

stduhpf avatar Mar 09 '25 09:03 stduhpf

> Sd3.5 Medium turbo is working just fine for me. I believe large should work too.

@stduhpf can you share an example command to run sd3.5 medium turbo?

DanielMazurkiewicz avatar Mar 13 '25 21:03 DanielMazurkiewicz

> can you share an example command to run sd3.5 medium turbo?

Using the LoRA version:

.\build\bin\Release\sd.exe -m ..\ComfyUI\models\checkpoints\sd3.x\sd3.5-m\sd3.5_medium-q8_0.gguf --clip_l ..\ComfyUI\models\clip\clip_l\clip_l.q8_0.gguf --clip_g ..\ComfyUI\models\clip\clip_g\clip_g.q8_0.gguf --t5xxl ..\ComfyUI\models\clip\t5\t5xxl_q4_k.gguf -H 1024 -W 768 -p 'A beautiful bald girl with silver and white futuristic metal face jewelry, her full body made of intricately carved liquid glass in the style of Tadashi, the complexity master of cyberpunk, in the style of James Jean and Peter Mohrbacher. This concept design is trending on Artstation, with sharp focus, studio-quality photography, and highly detailed, intricate details.<lora:lora_sd3.5m_turbo_8steps:1>' --cfg-scale 1.5 --sampling-method euler --vae-tiling --preview proj --steps 8 --lora-model-dir ..\ComfyUI\models\loras\sd3.5m\

Image

Using the .safetensors:

.\build\bin\Release\sd.exe -m ..\ComfyUI\models\checkpoints\sd3.x\sd3.5-m\sd3.5m_turbo.safetensors --clip_l ..\ComfyUI\models\clip\clip_l\ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF-q8_0.gguf --clip_g ..\ComfyUI\models\clip\clip_g\clip_g.q8_0.gguf --t5xxl ..\ComfyUI\models\clip\t5\t5xxl_q4_k.gguf -H 1024 -W 768 -p 'A beautiful bald girl with silver and white futuristic metal face jewelry, her full body made of intricately carved liquid glass in the style of Tadashi, the complexity master of cyberpunk, in the style of James Jean and Peter Mohrbacher. This concept design is trending on Artstation, with sharp focus, studio-quality photography, and highly detailed, intricate details.' --cfg-scale 1.5 --sampling-method euler --vae-tiling --preview proj --steps 8

Image

(Don't use the official GGUFs with sdcpp: there are two incompatible standards for SD3.5 GGUFs, and the official ones are made for ComfyUI. If you want to use quantized models, you have to quantize them from the .safetensors with sdcpp.)

stduhpf avatar Mar 13 '25 22:03 stduhpf
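
The quantization step mentioned above was not spelled out in the thread. A minimal sketch, assuming sd.cpp's documented convert mode (`-M convert` with `--type`) handles SD3.5 checkpoints the same way it handles other models; `SD_BIN` and the source path are placeholders, not paths from the thread:

```shell
#!/bin/sh
# Sketch: quantize a .safetensors checkpoint to GGUF with sd.cpp itself,
# so the result follows sd.cpp's tensor naming rather than ComfyUI's.
# SD_BIN and SRC are placeholders; adjust them to your own layout.
SD_BIN=./build/bin/sd
SRC=../models/sd3.5m_turbo.safetensors
QTYPE=q8_0

# Derive the output name from the input: foo.safetensors -> foo-q8_0.gguf
DST="${SRC%.safetensors}-$QTYPE.gguf"

# -M convert switches sd from image generation to model conversion.
set -- -M convert -m "$SRC" -o "$DST" --type "$QTYPE" -v

if [ -x "$SD_BIN" ]; then
  "$SD_BIN" "$@"
else
  # Dry run: only print the command when the binary is not available.
  echo "would run: $SD_BIN $*"
fi
```

The resulting .gguf can then be passed to `-m` in place of the .safetensors file, as in the commands above.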

Just tested the large TurboX LoRA; it also kinda works.

.\build\bin\Release\sd.exe -m ..\ComfyUI\models\checkpoints\sd3.x\sd35-large-mixed\sd3.5_large-iq4_nl.gguf --clip_l ..\ComfyUI\models\clip\clip_l\ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF-q8_0.gguf --clip_g ..\ComfyUI\models\clip\clip_g\clip_g.q8_0.gguf --t5xxl ..\ComfyUI\models\clip\t5\t5xxl_q4_k.gguf -H 1280 -W 896 -p "Cinematic close-up on Fan Bingbing's regal features as she embodies ancient Egyptian queen Cleopatra. Golden light bathes her porcelain skin, highlighting the delicate contours of her face and the piercing gaze that commands attention. Rich fabrics drape elegantly across her statuesque form, with a subtle emphasis on the ornate jewelry and intricately designed headdress that crowns her majestic presence.<lora:Tensorart_TurboX_sd3.5L_8steps:4>" --cfg-scale 1 --sampling-method euler --vae-tiling --preview proj --steps 8 --lora-model-dir ..\ComfyUI\models\loras\sd3.5l\

Image

I had to set the LoRA weight to 4, which is insane, but that's probably because I'm using a quantized base model. It could also be because we can't change the shift value (the model card says it's important).

stduhpf avatar Mar 13 '25 22:03 stduhpf

Thanks for posting your commands, @stduhpf. Apparently I'm doing something wrong or missing something:

./build-rocm/bin/sd \
--prompt 'A beautiful bald girl with silver and white futuristic metal face jewelry, her full body made of intricately carved liquid glass in the style of Tadashi, the complexity master of cyberpunk, in the style of James Jean and Peter Mohrbacher. This concept design is trending on Artstation, with sharp focus, studio-quality photography, and highly detailed, intricate details.' \
--model ../stable-diffusion-models/sd35/sd3.5_medium-Q8_0.gguf \
--clip_l ../stable-diffusion-models/sd35/clip_l-Q8_0.gguf \
--clip_g ../stable-diffusion-models/sd35/clip_g-Q8_0.gguf \
--t5xxl ../stable-diffusion-models/sd35/t5xxl-Q4_0.gguf \
--output output_20250314_181637.png \
--width 768 \
--height 1024 \
--steps 8 \
--cfg-scale 1.5

And this is what I get: :-)

Image

Models I use: https://huggingface.co/gaianet/stable-diffusion-3.5-medium-GGUF/tree/main

DanielMazurkiewicz avatar Mar 14 '25 17:03 DanielMazurkiewicz

You forgot to add the lora :)

If you want it to run at 8 steps, use these models: https://huggingface.co/tensorart/stable-diffusion-3.5-medium-turbo/tree/main (either replace the model with sd3.5m_turbo.safetensors or add lora_sd3.5m_turbo_8steps.safetensors, lora-s-8step-final.safetensors or lora_sd3.5m_4steps.safetensors as a LoRA) @DanielMazurkiewicz

stduhpf avatar Mar 14 '25 17:03 stduhpf
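
For reference, the fix described above could look roughly like this. It is a sketch and was not posted in the thread: it reuses the model paths from the failing command, uses only flags that appear elsewhere in this thread, and assumes `lora_sd3.5m_turbo_8steps.safetensors` has been downloaded into a hypothetical `LORA_DIR`. The prompt is a short placeholder.

```shell
#!/bin/sh
# Sketch: the 8-step command with the turbo LoRA added.
# LORA_DIR is a hypothetical location; the other paths come from the thread.
SD_BIN=./build-rocm/bin/sd
MODEL_DIR=../stable-diffusion-models/sd35
LORA_DIR=../stable-diffusion-models/loras

# The tag names the LoRA file (without .safetensors) and its weight.
LORA_TAG='<lora:lora_sd3.5m_turbo_8steps:1>'
PROMPT="a lighthouse at dusk, highly detailed${LORA_TAG}"

set -- \
  --prompt "$PROMPT" \
  --model "$MODEL_DIR/sd3.5_medium-Q8_0.gguf" \
  --clip_l "$MODEL_DIR/clip_l-Q8_0.gguf" \
  --clip_g "$MODEL_DIR/clip_g-Q8_0.gguf" \
  --t5xxl "$MODEL_DIR/t5xxl-Q4_0.gguf" \
  --lora-model-dir "$LORA_DIR" \
  --output output.png \
  --width 768 \
  --height 1024 \
  --steps 8 \
  --cfg-scale 1.5 \
  --sampling-method euler

if [ -x "$SD_BIN" ]; then
  "$SD_BIN" "$@"
else
  # Dry run: only print the command when the binary is not available.
  echo "would run: $SD_BIN $*"
fi
```

The two changes relative to the failing command are the `<lora:...>` tag appended to the prompt and `--lora-model-dir`, which tells sd where to look for the file the tag names. The alternative mentioned above, swapping the base model for sd3.5m_turbo.safetensors, needs neither.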

Thanks for spotting the issue, @stduhpf!

DanielMazurkiewicz avatar Mar 18 '25 11:03 DanielMazurkiewicz