Is the inference speed of stable-diffusion.cpp slower than that of PyTorch?
I used the same model and sampling steps in both stable-diffusion.cpp and ComfyUI, and sd.cpp seems to be noticeably slower.
Which backend are you using? If you're using something like Vulkan and comparing it to CUDA PyTorch, then it's not too surprising.
I'm using CUDA. Why is it so much slower than PyTorch? Am I doing something wrong?
I don't know; I have an AMD GPU, so I've never tried CUDA. All I know is that on ROCm, stable-diffusion.cpp is noticeably faster than PyTorch with ZLUDA.
Thank you for your response. Perhaps PyTorch's CUDA optimizations are better than GGML's.
Perhaps. But twice as slow is a bit much. Are you using --cfg-scale 1 for Flux with sd.cpp?
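For reference, a Flux run with sd.cpp usually looks something like the sketch below (model paths and prompt are placeholders for your setup). The flag matters because with --cfg-scale above 1, each sampling step runs the model twice (conditional plus unconditional), which alone can roughly double per-step time; Flux is guidance-distilled, so --cfg-scale 1 avoids that.

```sh
# Sketch of a typical Flux invocation; paths and prompt are placeholders
./sd --diffusion-model models/flux1-dev-q8_0.gguf \
  --vae models/ae.safetensors \
  --clip_l models/clip_l.safetensors \
  --t5xxl models/t5xxl_fp16.safetensors \
  -p "a photo of a cat" \
  --cfg-scale 1.0 --sampling-method euler --steps 20
```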
Yep
Are you using --diffusion-fa?
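--diffusion-fa enables flash attention for the diffusion model, which usually reduces VRAM use and often speeds up sampling. Added to the same placeholder invocation as above, it would look like:

```sh
# Same placeholder paths as before; --diffusion-fa turns on flash
# attention for the diffusion model
./sd --diffusion-model models/flux1-dev-q8_0.gguf \
  --vae models/ae.safetensors \
  --clip_l models/clip_l.safetensors \
  --t5xxl models/t5xxl_fp16.safetensors \
  -p "a photo of a cat" \
  --cfg-scale 1.0 --sampling-method euler --steps 20 \
  --diffusion-fa
```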
no!
Similar problem with Wan VACE: inference in sd.cpp is slower than in the original PyTorch implementation.