
[Feature] Z-Image-Turbo

Open · JohnLoveJoy opened this issue 2 months ago · 12 comments

Feature Summary

Z-Image is a powerful and highly efficient image generation model with 6B parameters.

Detailed Description

"According to the Elo-based Human Preference Evaluation, Z-Image-Turbo shows highly competitive performance against other leading models, while achieving state-of-the-art results among open-source models."


https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

Alternatives you considered

No response

Additional context

No response

JohnLoveJoy · Nov 26 '25

Hi there!

Yes, this model looks really great; it would be nice to have it integrated into stable-diffusion.cpp.

ali0une · Nov 27 '25

It uses the Flux.1 VAE and Qwen3 4B as the text encoder.
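Once support lands, invocation would presumably follow the same multi-file pattern as other models in sd.cpp. A hypothetical sketch, not a confirmed interface (the `--qwen3` flag name, the file names, and the sampler settings are all assumptions):

```sh
# Hypothetical invocation sketch. --diffusion-model, --vae, -p, --cfg-scale,
# --steps, and -o are existing sd.cpp options; --qwen3 is an assumed name
# for the Qwen3 4B text encoder flag, and the file names are illustrative.
./sd \
  --diffusion-model z_image_turbo-Q8_0.gguf \
  --vae ae.safetensors \
  --qwen3 Qwen3-4B-Q8_0.gguf \
  -p "a cat holding a sign that says hello" \
  --cfg-scale 1.0 \
  --steps 8 \
  -o output.png
```

Turbo-distilled models usually run with guidance effectively disabled and only a handful of steps, hence the cfg-scale 1.0 and 8 steps above.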

Side note: it's wild how many competing open-source AI labs Alibaba is funding (Qwen, Wan, Ling... and now this one too).

stduhpf · Nov 27 '25

Definitely this.

whoreson · Nov 27 '25

Even with 1.2 GB of VRAM: https://www.reddit.com/r/StableDiffusion/comments/1p89e2e/zimage_turbo_12gb_vram_tests/

vhanla · Nov 27 '25

Indeed, impressive results, considering that this GPU is much weaker than a current iGPU.

JohnLoveJoy · Nov 27 '25

Soon. Maybe this weekend.

leejet · Nov 28 '25

> Soon. Maybe this weekend.

@leejet I took a shot at it: https://github.com/leejet/stable-diffusion.cpp/pull/1018. Feel free to iterate on it or reuse any parts that are useful to you.

rmatif · Nov 29 '25

@rmatif Oops, I didn't realize you were also working on Z-Image support, so I implemented it myself: https://github.com/leejet/stable-diffusion.cpp/pull/1020. Anyway, thanks for your contribution!

leejet · Nov 29 '25

> Even with 1.2 GB of VRAM: https://www.reddit.com/r/StableDiffusion/comments/1p89e2e/zimage_turbo_12gb_vram_tests/

Is this even possible with sd.cpp? We have --offload-to-cpu, but all that does is load the LLM onto the GPU, run it, and then load the image model onto the GPU to generate the final image. Even if we quantize this model to Q3, it's still around 3 GB.
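As a rough sanity check, assuming about 3.5 bits per weight for a Q3-family quant:

$$6 \times 10^{9}\ \text{params} \times \frac{3.5\ \text{bits/param}}{8\ \text{bits/byte}} \approx 2.6\ \text{GB}$$

So even before counting activations and the text encoder, the diffusion model alone wouldn't fit in 1.2 GB without some form of streaming or per-layer offload.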

netrunnereve · Nov 30 '25

@netrunnereve No, I don't think it's possible. Per-layer offloading would be neat, but right now it's not implemented.

stduhpf · Nov 30 '25

@leejet My MacBook with an M3 chip takes 600 seconds per iteration. With ComfyUI, it only takes 19 seconds per iteration. Is this normal? Or am I doing something wrong?

brenzel · Dec 01 '25

This might be because you are not using Metal acceleration. You can refer to the build documentation to enable Metal support.
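For reference, a minimal Metal build might look like this, assuming the SD_METAL CMake option described in the build docs:

```sh
# Clone with submodules (ggml is vendored) and enable the Metal backend
# via the SD_METAL CMake option from the build documentation.
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
cmake -B build -DSD_METAL=ON
cmake --build build --config Release
```

A binary built without Metal falls back to the CPU, which would easily explain a roughly 30x gap versus ComfyUI on the same machine.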

leejet · Dec 02 '25