
Is the inference speed of stable-diffusion.cpp slower than PyTorch?

Open zeng121 opened this issue 6 months ago • 9 comments

I used the same model and sampling steps in both stable-diffusion.cpp and ComfyUI. It seems that sd.cpp is slower. [screenshots comparing generation speed]


zeng121 avatar Aug 06 '25 06:08 zeng121

Which backend are you using? If you're using something like Vulkan and comparing it to CUDA PyTorch, then it's not too surprising.

stduhpf avatar Aug 06 '25 06:08 stduhpf

Which backend are you using? If you're using something like Vulkan and comparing it to CUDA PyTorch, then it's not too surprising.

I'm using CUDA. Why is it so much slower than PyTorch? Is it because I'm doing something wrong?

zeng121 avatar Aug 06 '25 06:08 zeng121

I don't know, I have an AMD GPU, so I never tried CUDA. All I know is that on ROCm, stable-diffusion.cpp is considerably faster than PyTorch with ZLUDA.

stduhpf avatar Aug 06 '25 06:08 stduhpf

I don't know, I have an AMD GPU, so I never tried CUDA. All I know is that on ROCm, stable-diffusion.cpp is considerably faster than PyTorch with ZLUDA.

Thank you for your response. Perhaps PyTorch's CUDA optimizations are better than GGML's.

zeng121 avatar Aug 06 '25 07:08 zeng121

Perhaps. But twice as slow is a bit much. Are you using --cfg-scale 1 for Flux with sd.cpp?

stduhpf avatar Aug 06 '25 09:08 stduhpf

Perhaps. But twice as slow is a bit much. Are you using --cfg-scale 1 for Flux with sd.cpp?

Yep

zeng121 avatar Aug 06 '25 09:08 zeng121

Are you using --diffusion-fa ?
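For reference, a hypothetical sd.cpp invocation with flash attention enabled via `--diffusion-fa` (all file names and the prompt below are placeholders, assuming a Flux-style setup like the one discussed above; adjust paths and flags to your actual models):

```shell
# Hypothetical example: Flux generation with flash attention for the
# diffusion model (--diffusion-fa) and cfg-scale 1, as discussed above.
# File names are placeholders; substitute your own model files.
./sd \
  --diffusion-model flux1-dev-q8_0.gguf \
  --vae ae.safetensors \
  -p "a photo of a cat" \
  --cfg-scale 1.0 \
  --steps 20 \
  --diffusion-fa
```

Enabling flash attention typically reduces both memory use and per-step time on CUDA, so it is worth comparing timings with and without it.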

Green-Sky avatar Aug 06 '25 10:08 Green-Sky

Are you using --diffusion-fa ?

no!

zeng121 avatar Aug 07 '25 01:08 zeng121

Similar problem with Wan VACE: inference speed in sd.cpp is slower than the original PyTorch implementation.

Len-Li avatar Oct 21 '25 12:10 Len-Li