Vulkan on AMD Ryzen AI APU/iGPU generates worse images than CPU, or just colorful noise
When I run stable-diffusion.cpp with Vulkan on a Ryzen AI 9 HX 370 (Radeon 890M iGPU), the resulting images are very different from what I get when running on CPU with the AVX2 build. Some comparison pics follow.
SDXL
For reference, the below pic is what I get from SDXL on my CPU if I prompt as follows:
sd -m sd_xl_base_1.0.safetensors --vae sdxl.vae.safetensors -H 1024 -W 1024 -p "a lovely cat"
(Note that I needed to use madebyollin's fp16 vae to get an output that isn't all black.)
And below is what I get from SDXL on my GPU using Vulkan:
sd -m sd_xl_base_1.0.safetensors --vae sdxl.vae.safetensors --vae-on-cpu -H 1024 -W 1024 -p "a lovely cat"
(Note that running VAE on the CPU versus tiled on the GPU produces essentially the same-looking image below. Attempting to run on GPU without tiling fails when it requests an excessive amount of memory, as described in stduhpf's comment here.)
SD 1.5
With SD 1.5, Vulkan at least produces actual cat pictures, but they are blurry or deformed compared to CPU.
For reference, below is what I get from the CPU for the following prompt:
sd -m v1-5-pruned-emaonly.safetensors -p "a lovely cat"
And below is what I get from the GPU with Vulkan:
sd -m v1-5-pruned-emaonly.safetensors -p "a lovely cat"
(I also tried running this with the VAE on the CPU, but it gives the same cat below with no apparent visual difference.)
Finally, running CLIP on the CPU as well gives a different, more deformed cat:
sd -m v1-5-pruned-emaonly.safetensors --vae-on-cpu --clip-on-cpu -p "a lovely cat"
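To make the CPU and Vulkan runs above easier to compare side by side, something like the following wrapper might help. It is only a sketch: the two binary paths (`SD_CPU`, `SD_VULKAN`) and the `run_sd` helper are hypothetical names, and the explicit `--seed 42` and `-o` output names are my additions rather than part of the commands above. The idea is just to keep the model, prompt, and seed identical so that any difference comes from the backend.

```shell
# Hypothetical paths to the AVX2 (CPU) and Vulkan builds; adjust to your setup.
SD_CPU="${SD_CPU:-./build-avx2/bin/sd}"
SD_VULKAN="${SD_VULKAN:-./build-vulkan/bin/sd}"

# Run one build with a fixed model, prompt, and seed.
#   $1 = sd binary, $2 = output file, remaining args = extra sd flags
# With DRY_RUN=1 the command line is printed instead of executed.
run_sd() {
  sd_bin="$1"
  sd_out="$2"
  shift 2
  set -- "$sd_bin" -m v1-5-pruned-emaonly.safetensors -p "a lovely cat" \
    --seed 42 -o "$sd_out" "$@"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage (not run automatically):
#   run_sd "$SD_CPU"    cat_cpu.png
#   run_sd "$SD_VULKAN" cat_vulkan.png
#   run_sd "$SD_VULKAN" cat_vulkan_cpu_vae.png --vae-on-cpu
#   run_sd "$SD_VULKAN" cat_vulkan_cpu_clip.png --vae-on-cpu --clip-on-cpu
```

Writing each variant to its own file makes it easier to tell which stage (diffusion, VAE, or CLIP) changes the result when it is moved to the CPU.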
I see the same issue with Flux, which also produces a noise image:
My CPU and GPU: Ryzen AI 9 HX PRO 375 (Radeon 890M iGPU)
Settings: cfg_scale 1, steps 4
@lostdisc @zhycheng614 Does llama.cpp vulkan work on your systems? If yes, you can try with PR https://github.com/leejet/stable-diffusion.cpp/pull/509 to see if this fixes it. Otherwise, I think it might be a driver issue, and you should report this to AMD.
@lostdisc @zhycheng614 Does llama.cpp vulkan work on your systems? If yes, you can try with PR #509 to see if this fixes it. Otherwise, I think it might be a driver issue, and you should report this to AMD.
- Yes, llama.cpp with Vulkan works on my system and performs inference correctly.
- On an Apple M1 chip with Metal, I get the same problem: a noise image.
- On an Apple M3 Pro chip with Metal, it works very well.
- On the AMD CPU, it works very well and produces high-quality images.
If PR #509 doesn't fix it, could you try running test-backend-ops (from llama.cpp)? Some specific ops may not be working properly.
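For anyone wanting to try this, a rough sketch of how the run could look is below. Several things here are assumptions on my part: the build path in `TEST_BIN` is hypothetical, `Vulkan0` is the backend name I would expect for the first Vulkan device (check the tool's own output for the actual name), and the `failed_ops` helper simply assumes that failing test cases are reported on lines containing `FAIL`.

```shell
# Hypothetical location of a llama.cpp build configured with -DGGML_VULKAN=ON.
TEST_BIN="${TEST_BIN:-./llama.cpp/build/bin/test-backend-ops}"

# Read a test-backend-ops log on stdin and print only the lines that
# report a failure (assumption: such lines contain the word FAIL).
failed_ops() {
  grep 'FAIL' || true
}

# Usage (not run automatically):
#   "$TEST_BIN" test -b Vulkan0 2>&1 | tee backend-ops.log
#   failed_ops < backend-ops.log
#
# To focus on a single op, the tool also accepts -o, for example:
#   "$TEST_BIN" test -b Vulkan0 -o MUL_MAT
```

If the filtered output is empty, the ops covered by the test suite match the CPU reference on that backend, which would point away from a per-op bug and toward something else in the pipeline.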
Just noticed that you guys synced ggml last week, which is what I had been waiting for 😄. Now SDXL on Vulkan produces a proper cat that's very similar to the CPU version (albeit not identical):
In the meantime, I had been experimenting with converting models to ONNX. By comparison, sd.cpp on Vulkan runs slower and hotter, but it is much less RAM-constrained, which lets me exceed 1024x1024. And it sure beats running on CPU!