
[Feature] Z-Image-Turbo

Open · JohnLoveJoy opened this issue 2 months ago · 12 comments

Feature Summary

Z-Image is a powerful and highly efficient image generation model with 6B parameters.

Detailed Description

"According to the Elo-based Human Preference Evaluation, Z-Image-Turbo shows highly competitive performance against other leading models, while achieving state-of-the-art results among open-source models."


https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

Alternatives you considered

No response

Additional context

No response

JohnLoveJoy · Nov 26 '25

Hi there!

Yes, this model looks really great; it would be nice to have it integrated into stable-diffusion.cpp.

ali0une · Nov 27 '25

It uses the Flux.1 VAE and Qwen3 4B as the text encoder.
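Once support lands, invocation would presumably follow the same multi-file pattern as other models in sd.cpp. A hypothetical sketch, not a confirmed interface (the `--qwen3` flag name, the file names, and the sampler settings are all assumptions):

```sh
# Hypothetical invocation sketch. --diffusion-model, --vae, -p, --cfg-scale,
# --steps, and -o are existing sd.cpp options; --qwen3 is an assumed name
# for the Qwen3 4B text encoder flag, and the file names are illustrative.
./sd \
  --diffusion-model z_image_turbo-Q8_0.gguf \
  --vae ae.safetensors \
  --qwen3 Qwen3-4B-Q8_0.gguf \
  -p "a cat holding a sign that says hello" \
  --cfg-scale 1.0 \
  --steps 8 \
  -o output.png
```

Turbo-distilled models usually run with guidance effectively disabled and only a handful of steps, hence the cfg-scale 1.0 and 8 steps above.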

Side note: it's wild how many competing open-source AI labs Alibaba is funding (Qwen, Wan, Ling... and now this one too).

stduhpf · Nov 27 '25

Definitely this.

whoreson · Nov 27 '25

Even with 1.2 GB of VRAM: https://www.reddit.com/r/StableDiffusion/comments/1p89e2e/zimage_turbo_12gb_vram_tests/

vhanla · Nov 27 '25

Indeed, impressive results, considering that this GPU is much weaker than a current iGPU.

JohnLoveJoy · Nov 27 '25

Soon. Maybe this weekend.

leejet · Nov 28 '25

> Soon. Maybe this weekend.

@leejet I took a shot at it: https://github.com/leejet/stable-diffusion.cpp/pull/1018. Feel free to iterate on it or reuse any parts that are useful to you.

rmatif · Nov 29 '25

@rmatif Oops, I didn't realize you were also working on Z-Image support, so I implemented it myself: https://github.com/leejet/stable-diffusion.cpp/pull/1020. Anyway, thanks for your contribution!

leejet · Nov 29 '25

> Even with 1.2 GB of VRAM: https://www.reddit.com/r/StableDiffusion/comments/1p89e2e/zimage_turbo_12gb_vram_tests/

Is this even possible with sd.cpp? We have --offload-to-cpu, but all that does is load the LLM onto the GPU, run it, and then load the image model onto the GPU to generate the final image. Even if we quantize this model to Q3, it's still around 3 GB.
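As a rough sanity check, assuming about 3.5 bits per weight for a Q3-family quant:

$$6 \times 10^{9}\ \text{params} \times \frac{3.5\ \text{bits/param}}{8\ \text{bits/byte}} \approx 2.6\ \text{GB}$$

So even before counting activations and the text encoder, the diffusion model alone wouldn't fit in 1.2 GB without some form of streaming or per-layer offload.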

netrunnereve · Nov 30 '25

@netrunnereve No, I don't think it's possible. Per-layer offloading would be neat, but right now it's not implemented.

stduhpf · Nov 30 '25

@leejet My MacBook with an M3 chip takes 600 seconds per iteration. With ComfyUI, it only takes 19 seconds per iteration. Is this normal? Or am I doing something wrong?

brenzel · Dec 01 '25

This might be because you are not using Metal acceleration. You can refer to the build documentation to enable Metal support.
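For reference, a minimal Metal build might look like this, assuming the SD_METAL CMake option described in the build docs:

```sh
# Clone with submodules (ggml is vendored) and enable the Metal backend
# via the SD_METAL CMake option from the build documentation.
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
cmake -B build -DSD_METAL=ON
cmake --build build --config Release
```

A binary built without Metal falls back to the CPU, which would easily explain a roughly 30x gap versus ComfyUI on the same machine.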

leejet · Dec 02 '25