Vulkan on AMD Ryzen AI APU/iGPU generates worse images than CPU, or just colorful noise

Open lostdisc opened this issue 1 year ago • 5 comments

When I run stable-diffusion.cpp with Vulkan on a Ryzen AI 9 HX 370 (Radeon 890M iGPU), the resulting images are very different from what I get when running on CPU with the AVX2 build. Some comparison pics follow.

SDXL

For reference, the pic below is what I get from SDXL on my CPU if I prompt as follows:

sd -m sd_xl_base_1.0.safetensors --vae sdxl.vae.safetensors -H 1024 -W 1024 -p "a lovely cat"

(Note that I needed to use madebyollin's fp16 VAE to get an output that isn't all black.)

Image: sd-cpp-avx2_vae-fp16_cat_1024x1024_output

And below is what I get from SDXL on my GPU using Vulkan:

sd -m sd_xl_base_1.0.safetensors --vae sdxl.vae.safetensors --vae-on-cpu -H 1024 -W 1024 -p "a lovely cat"

(Note that running the VAE on the CPU versus tiled on the GPU produces essentially the same-looking image below. Attempting to run on GPU without tiling fails when it requests an excessive amount of memory, as described in stduhpf's comment here.)

Image: sd-cpp-vulkan_vae-fp16-on-cpu_cat_1024x1024_output

SD 1.5

With SD 1.5, Vulkan at least produces actual cat pictures, but they are blurry or deformed compared to CPU.

For reference, below is what I get from the CPU for the following prompt:

sd -m v1-5-pruned-emaonly.safetensors -p "a lovely cat"

Image: sd-cpp-avx2_cat_output

And below is what I get from the GPU with Vulkan:

sd -m v1-5-pruned-emaonly.safetensors -p "a lovely cat"

(I also tried running this with the VAE on the CPU, but it gives the same cat below with no apparent visual difference.)

Image: sd-cpp-vulkan_cat_output

Finally, running CLIP on the CPU as well gives a different, more deformed cat:

sd -m v1-5-pruned-emaonly.safetensors --vae-on-cpu --clip-on-cpu -p "a lovely cat"

Image: sd-cpp-vulkan_vae+clip-on-cpu_cat_output

lostdisc avatar Jan 10 '25 03:01 lostdisc

I have the same issue with Flux; the output is also just a noise image:

Image

My CPU and GPU: Ryzen AI 9 HX PRO 375 (Radeon 890M iGPU)

cfg_scale: 1; steps: 4

zhycheng614 avatar Feb 26 '25 22:02 zhycheng614

@lostdisc @zhycheng614 Does llama.cpp vulkan work on your systems? If yes, you can try with PR https://github.com/leejet/stable-diffusion.cpp/pull/509 to see if this fixes it. Otherwise, I think it might be a driver issue, and you should report this to AMD.

stduhpf avatar Feb 26 '25 22:02 stduhpf

  1. Yes, llama.cpp with Vulkan works on my system and performs inference correctly.
  2. On Apple's M1 chip with Metal, I see the same problem: a noise image.
  3. On Apple's M3 Pro chip with Metal, it works very well.
  4. On the AMD CPU, it works very well and produces high-quality images.

zhycheng614 avatar Feb 26 '25 22:02 zhycheng614

If PR #509 doesn't fix it, could you try running test-backend-ops (from llama.cpp)? Maybe some specific ops are not working properly.

stduhpf avatar Feb 26 '25 22:02 stduhpf

Just noticed that you guys synced ggml last week, which is what I had been waiting for 😄. Now SDXL on Vulkan produces a proper cat that's very similar to the CPU version (albeit not identical):

Image
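For anyone who wants to put a number on "very similar, albeit not identical", here is a minimal sketch that compares two raw 8-bit pixel buffers. It is not part of stable-diffusion.cpp; the function names `mean_abs_diff` and `psnr` are made up for illustration, and it assumes both images have the same size and channel layout.

```python
import math
from typing import Sequence


def _check(a: Sequence[int], b: Sequence[int]) -> None:
    """Reject buffers that cannot be compared element by element."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length (same size and mode)")
    if len(a) == 0:
        raise ValueError("buffers must not be empty")


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute per-channel difference; 0.0 means identical buffers."""
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def psnr(a: Sequence[int], b: Sequence[int], peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB; math.inf for identical buffers."""
    _check(a, b)
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    # Tiny hand-made buffers standing in for CPU and Vulkan outputs.
    cpu = bytes([10, 200, 30, 40])
    vulkan = bytes([12, 198, 30, 44])
    print("mean abs diff:", mean_abs_diff(cpu, vulkan))
    print("PSNR (dB):", psnr(cpu, vulkan))
```

To use it on real outputs, one option is Pillow: `Image.open("cpu.png").convert("RGB").tobytes()` gives a buffer that can be passed in directly (the file names here are hypothetical). Pure colorful noise should score a far lower PSNR against the CPU image than a slightly different cat would, but which threshold counts as "close enough" is a judgment call.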

In the meantime, I had been messing with converting models to ONNX. Sd-cpp on Vulkan runs slower/hotter than that, but is much less RAM-constrained, letting me exceed 1024x1024. And it sure beats running on CPU!

lostdisc avatar Mar 09 '25 06:03 lostdisc