Flash attention abort trap
I'm having this issue on Mac when using sd with Flash Attention enabled:
[DEBUG] ggml_extend.hpp:599 - clip compute buffer size: 9.88 MB
[DEBUG] stable-diffusion.cpp:441 - computing condition graph completed, taking 631 ms
[INFO ] stable-diffusion.cpp:1221 - get_learned_condition completed, taking 1570 ms
[INFO ] stable-diffusion.cpp:1231 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1235 - generating image: 1/1 - seed 485932809
[DEBUG] ggml_extend.hpp:599 - unet compute buffer size: 728.52 MB
|> | 0/8 - 0.00it/s
GGML_ASSERT: ~/stable-diffusion.cpp/ggml/src/ggml.c:12401: P >= 0
GGML_ASSERT: ~/stable-diffusion.cpp/ggml/src/ggml.c:12401: P >= 0
GGML_ASSERT: ~/stable-diffusion.cpp/ggml/src/ggml.c:12401: P >= 0
Abort trap: 6
Here's how I build the binary:
mkdir build
cd build
cmake .. -DSD_FLASH_ATTN=ON
cmake --build . --config Release --clean-first
Here's a test to reproduce it:
./bin/sd \
-m ../../data/downloaded/models/realvisxlV30Turbo_v30TurboBakedvae.safetensors \
--vae ../../data/downloaded/vae/sdxl_vae.safetensors \
-p "Photo of a girl,cinematic film still,super saiyan, full plate armor" \
-n "ugly, deformed, noisy, blurry, low contrast, text, 3d, cgi, render, fake, anime, open mouth, big forehead, long neck" \
-o ../../data/samples/output/test001_sdxl.png \
--steps 8 --cfg-scale 2.0 -s 1850492235 -v -H 1536 -W 1152
The checkpoint used: https://civitai.com/models/139562/realvisxl-v30-turbo
The result should be something like this: [example image attached]
Would you encounter this issue when not using flash attention?
@leejet To fix this, the following modification is needed in ggml.c at line 12401:
if (dst->op_params[0] == 1) { // masked
    GGML_ASSERT(P >= 0);
}
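For context, here is roughly where that assert sits (a sketch only; the variable names come from ggml's flash-attention kernel in ggml.c, and the exact surrounding code may differ between revisions):

// Inside ggml_compute_forward_flash_attn_f32 (ggml/src/ggml.c)
const int64_t D = neq0;      // head dimension
const int64_t N = neq1;      // number of query positions
const int64_t P = nek1 - N;  // key/value positions preceding the queries
const int64_t M = P + N;     // total key/value positions

// Proposed change: assert only when the op is masked (op_params[0] == 1),
// instead of unconditionally.
if (dst->op_params[0] == 1) { // masked
    GGML_ASSERT(P >= 0);
}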
I can confirm the bug also affects me on Linux and this change makes it generate an image.
It looks like the upstream ggml hasn't fixed this issue yet.
@leejet The GGUF file support is broken; you need to set GGML_MAX_NAME to 128 to prevent it from crashing when loading the model.
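There are two possible ways to set it (sketches; whether the macro can be overridden from the build system depends on whether that ggml revision guards the definition with #ifndef):

// Option 1: edit the definition in ggml.h directly
// (assumption: the default in ggml revisions of that era was 64)
#define GGML_MAX_NAME 128

# Option 2: if the definition is guarded with #ifndef, pass it at configure time.
# Set it for both C and C++ so struct ggml_tensor has the same layout everywhere:
cmake .. -DSD_FLASH_ATTN=ON -DCMAKE_C_FLAGS="-DGGML_MAX_NAME=128" -DCMAKE_CXX_FLAGS="-DGGML_MAX_NAME=128"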
I overlooked this when updating GGML. I've made the necessary changes now.
Would you encounter this issue when not using flash attention?
Nope, it works fine with the standard cmake .. build (without flash attention).
I think it broke again or is still broken:
ggml/src/ggml.c:12743: P >= 0
GGML_ASSERT: /home/h3ndrik/tmp/stable-diffusion.cpp/ggml/src/ggml.c:12743: P >= 0
Aborted
I can confirm as well that I'm encountering the same error with flash attention enabled both on Windows and Linux. Without it, everything works fine.
GGML_ASSERT: /data/data/com.termux/files/home/stable-diffusion.cpp/ggml/src/ggml.c:12743: P >= 0
Aborted
I see the same issue with flash attention. I followed the steps described in the readme, but with flash attention enabled, I always get this error when trying to generate an image:
GGML_ASSERT: C:\dev\stable-diffusion.cpp\ggml\src\ggml.c:13829: P >= 0
GGML_ASSERT: C:\dev\stable-diffusion.cpp\ggml\src\ggml.c:13829: P >= 0
The solution described above (wrapping the GGML_ASSERT(P >= 0); on that line in an if) fixes the issue.
GGML_ASSERT(P >= 0)
I encountered that too.
@JohnAlcatraz If you really apply that workaround, you will get very, very bad quality, according to my earlier experiments.
I also see very bad quality with flash attention, but that's just because flash attention is not following the prompt at all. I think that's a separate bug, unrelated to this issue; I opened a new issue about it: https://github.com/leejet/stable-diffusion.cpp/issues/259
Oh, I know. Thanks for your explanation.