stable-diffusion.cpp Vulkan on AMD Ryzen AI APU/iGPU generates worse images than CPU, or just colorful noise

When I run stable-diffusion.cpp with Vulkan on a Ryzen AI 9 HX 370 (Radeon 890M iGPU), the resulting images are very different from what I get when running on CPU with the AVX2 build. Some comparison pics follow.

SDXL

For reference, the pic below is what I get from SDXL on my CPU if I prompt as follows:

sd -m sd_xl_base_1.0.safetensors --vae sdxl.vae.safetensors -H 1024 -W 1024 -p "a lovely cat"

(Note that I needed to use madebyollin's fp16 VAE to get an output that isn't all black.)

[Image: sd-cpp-avx2_vae-fp16_cat_1024x1024_output]

And below is what I get from SDXL on my GPU using Vulkan:

sd -m sd_xl_base_1.0.safetensors --vae sdxl.vae.safetensors --vae-on-cpu -H 1024 -W 1024 -p "a lovely cat"

(Note that running the VAE on the CPU versus tiled on the GPU produces essentially the same-looking image below. Attempting to run on the GPU without tiling fails when it requests an excessive amount of memory, as described in stduhpf's comment here.)

[Image: sd-cpp-vulkan_vae-fp16-on-cpu_cat_1024x1024_output]

SD 1.5

With SD 1.5, Vulkan at least produces actual cat pictures, but they are blurry or deformed compared to CPU.

For reference, below is what I get from the CPU for the following prompt:

sd -m v1-5-pruned-emaonly.safetensors -p "a lovely cat"

[Image: sd-cpp-avx2_cat_output]

And below is what I get from the GPU with Vulkan:

sd -m v1-5-pruned-emaonly.safetensors -p "a lovely cat"

(I also tried running this with the VAE on the CPU, but it gives the same cat below with no apparent visual difference.)

[Image: sd-cpp-vulkan_cat_output]

Finally, running CLIP on the CPU gives a different, more-deformed cat:

sd -m v1-5-pruned-emaonly.safetensors --vae-on-cpu --clip-on-cpu -p "a lovely cat"

[Image: sd-cpp-vulkan_vae+clip-on-cpu_cat_output]

Jan 10 '25 03:01 lostdisc

I have the same issue with Flux: the output is also just a noise image:

My CPU and GPU: Ryzen AI 9 HX PRO 375 (Radeon 890M iGPU)

cfg_scale: 1, steps: 4

Feb 26 '25 22:02 zhycheng614

@lostdisc @zhycheng614 Does llama.cpp vulkan work on your systems? If yes, you can try with PR https://github.com/leejet/stable-diffusion.cpp/pull/509 to see if this fixes it. Otherwise, I think it might be a driver issue, and you should report this to AMD.

Feb 26 '25 22:02 stduhpf

Yes, llama.cpp with Vulkan works on my system and performs inference correctly.
On an Apple M1 chip with Metal, I get the same problem: a noisy image.
On an Apple M3 Pro chip with Metal, it works very well.
On the AMD CPU, it works very well and produces high-quality images.

Feb 26 '25 22:02 zhycheng614

If PR #509 doesn't fix it, could you try running test-backend-ops (from llama.cpp)? Maybe some specific ops are not working properly on your backend.
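For anyone wondering what a per-op check like that boils down to, here is a rough Python sketch of the idea: run the same op on the CPU and on the GPU backend, then compare the two outputs with a normalized mean squared error. This is only an illustration, not the tool's actual code; the function names, the sample values, and the 1e-7 threshold are all assumptions made up for the example.

```python
def nmse(reference, candidate):
    """Normalized mean squared error: sum((r - c)^2) / sum(r^2)."""
    if len(reference) != len(candidate):
        raise ValueError("tensors must have the same number of elements")
    err = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        # All-zero reference: only an exactly matching output counts as correct.
        return 0.0 if err == 0.0 else float("inf")
    return err / ref_energy


def op_matches(reference, candidate, max_nmse=1e-7):
    """True if the candidate output is close enough to the reference.

    max_nmse is an illustrative threshold, not a value taken from any tool.
    A NaN in the candidate makes the comparison False, so it counts as a failure.
    """
    return nmse(reference, candidate) <= max_nmse


# Hypothetical outputs of one op, flattened to plain lists of floats.
cpu_out = [0.5, -1.25, 2.0, 0.0]
gpu_ok = [0.5, -1.25, 2.0, 0.0]
gpu_bad = [0.5, -1.25, 3.0, 0.0]

print(op_matches(cpu_out, gpu_ok))
print(op_matches(cpu_out, gpu_bad))
```

If a single op fails a check like this on Vulkan but passes on CPU, that op is a good candidate for the source of the noise images.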

Feb 26 '25 22:02 stduhpf

Just noticed that you guys synced ggml last week, which is what I had been waiting for 😄. Now SDXL on Vulkan produces a proper cat that's very similar to the CPU version (albeit not identical):

In the meantime, I had been messing with converting models to ONNX. Compared with that, sd-cpp on Vulkan runs slower and hotter, but is much less RAM-constrained, letting me go beyond 1024x1024. And it sure beats running on CPU!
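In case it helps anyone put a number on "very similar but not identical" when comparing CPU and Vulkan outputs, below is a minimal sketch that computes PSNR between two images of the same size. It assumes Pillow is installed for loading the files; the helper names and the file names in the usage comment are hypothetical.

```python
import math


def mean_squared_error(a, b):
    """Mean squared difference between two equal-length pixel sequences."""
    if len(a) != len(b):
        raise ValueError("images must have the same dimensions")
    if len(a) == 0:
        raise ValueError("images must not be empty")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    mse = mean_squared_error(a, b)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


def load_rgb_bytes(path):
    """Load an image as raw 8-bit RGB bytes (assumes Pillow is installed)."""
    from PIL import Image  # imported lazily so the metric works without Pillow

    with Image.open(path) as img:
        return img.convert("RGB").tobytes()


# Usage with two hypothetical output files generated from the same seed:
# print(psnr(load_rgb_bytes("cpu.png"), load_rgb_bytes("vulkan.png")))
```

Higher values mean the two images are closer. Pure noise compared against a real picture should score very low, while small backend rounding differences should score much higher.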

Mar 09 '25 06:03 lostdisc