
Is the inference speed of stable-diffusion.cpp slower than PyTorch?

Open zeng121 opened this issue 6 months ago • 9 comments

I used the same model and sampling steps in both stable-diffusion.cpp and ComfyUI. It seems that sd.cpp is slower. [screenshots comparing generation speed]


zeng121 avatar Aug 06 '25 06:08 zeng121

Which backend are you using? If you're using something like Vulkan and comparing it to CUDA PyTorch, then it's not too surprising.

stduhpf avatar Aug 06 '25 06:08 stduhpf

Which backend are you using? If you're using something like Vulkan and comparing it to CUDA PyTorch, then it's not too surprising.

I'm using CUDA. Why is it so much slower than PyTorch? Is it because I'm doing something wrong?

zeng121 avatar Aug 06 '25 06:08 zeng121

I don't know, I have an AMD GPU, so I never tried CUDA. All I know is that on ROCm, stable-diffusion.cpp is considerably faster than PyTorch with ZLUDA.

stduhpf avatar Aug 06 '25 06:08 stduhpf

I don't know, I have an AMD GPU, so I never tried CUDA. All I know is that on ROCm, stable-diffusion.cpp is considerably faster than PyTorch with ZLUDA.

Thank you for your response. Perhaps PyTorch's CUDA optimizations are better than GGML's.

zeng121 avatar Aug 06 '25 07:08 zeng121

Perhaps. But twice as slow is a bit much. Are you using --cfg-scale 1 for Flux with sd.cpp?

stduhpf avatar Aug 06 '25 09:08 stduhpf

Perhaps. But twice as slow is a bit much. Are you using --cfg-scale 1 for Flux with sd.cpp?

Yep

zeng121 avatar Aug 06 '25 09:08 zeng121

Are you using --diffusion-fa ?
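For reference, a hypothetical sd.cpp invocation with flash attention enabled via `--diffusion-fa` (all file names and the prompt below are placeholders, assuming a Flux-style setup like the one discussed above; adjust paths and flags to your actual models):

```shell
# Hypothetical example: Flux generation with flash attention for the
# diffusion model (--diffusion-fa) and cfg-scale 1, as discussed above.
# File names are placeholders; substitute your own model files.
./sd \
  --diffusion-model flux1-dev-q8_0.gguf \
  --vae ae.safetensors \
  -p "a photo of a cat" \
  --cfg-scale 1.0 \
  --steps 20 \
  --diffusion-fa
```

Enabling flash attention typically reduces both memory use and per-step time on CUDA, so it is worth comparing timings with and without it.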

Green-Sky avatar Aug 06 '25 10:08 Green-Sky

Are you using --diffusion-fa ?

no!

zeng121 avatar Aug 07 '25 01:08 zeng121

Similar problem with Wan VACE: inference speed in sd.cpp is slower than the original PyTorch implementation.

Len-Li avatar Oct 21 '25 12:10 Len-Li