Flash attention abort trap
I'm having this issue on Mac when using sd with Flash Attention enabled:
[DEBUG] ggml_extend.hpp:599 - clip compute buffer size: 9.88 MB
[DEBUG] stable-diffusion.cpp:441 - computing condition graph completed, taking 631 ms
[INFO ] stable-diffusion.cpp:1221 - get_learned_condition completed, taking 1570 ms
[INFO ] stable-diffusion.cpp:1231 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1235 - generating image: 1/1 - seed 485932809
[DEBUG] ggml_extend.hpp:599 - unet compute buffer size: 728.52 MB
|> | 0/8 - 0.00it/s
GGML_ASSERT: ~/stable-diffusion.cpp/ggml/src/ggml.c:12401: P >= 0
GGML_ASSERT: ~/stable-diffusion.cpp/ggml/src/ggml.c:12401: P >= 0
GGML_ASSERT: ~/stable-diffusion.cpp/ggml/src/ggml.c:12401: P >= 0
Abort trap: 6
Here's how I build the binary:
mkdir build
cd build
cmake .. -DSD_FLASH_ATTN=ON
cmake --build . --config Release --clean-first
Here's a test to reproduce it:
./bin/sd \
-m ../../data/downloaded/models/realvisxlV30Turbo_v30TurboBakedvae.safetensors \
--vae ../../data/downloaded/vae/sdxl_vae.safetensors \
-p "Photo of a girl,cinematic film still,super saiyan, full plate armor" \
-n "ugly, deformed, noisy, blurry, low contrast, text, 3d, cgi, render, fake, anime, open mouth, big forehead, long neck" \
-o ../../data/samples/output/test001_sdxl.png \
--steps 8 --cfg-scale 2.0 -s 1850492235 -v -H 1536 -W 1152
The checkpoint used: https://civitai.com/models/139562/realvisxl-v30-turbo
The result should be something like this: [example image attached]
Would you encounter this issue when not using flash attention?
@leejet To fix this, the following modification is needed in ggml.c at line 12401:
if (dst->op_params[0] == 1) { // masked
    GGML_ASSERT(P >= 0);
}
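For context, here is roughly where that assert sits (a sketch only; the variable names come from ggml's flash-attention kernel in ggml.c, and the exact surrounding code may differ between revisions):

// Inside ggml_compute_forward_flash_attn_f32 (ggml/src/ggml.c)
const int64_t D = neq0;      // head dimension
const int64_t N = neq1;      // number of query positions
const int64_t P = nek1 - N;  // key/value positions preceding the queries
const int64_t M = P + N;     // total key/value positions

// Proposed change: assert only when the op is masked (op_params[0] == 1),
// instead of unconditionally.
if (dst->op_params[0] == 1) { // masked
    GGML_ASSERT(P >= 0);
}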
I can confirm the bug also affects me on Linux and this change makes it generate an image.
It looks like the upstream ggml hasn't fixed this issue yet.
@leejet The GGUF file support is broken; you need to set GGML_MAX_NAME to 128 to prevent it from crashing when loading the model.
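There are two possible ways to set it (sketches; whether the macro can be overridden from the build system depends on whether that ggml revision guards the definition with #ifndef):

// Option 1: edit the definition in ggml.h directly
// (assumption: the default in ggml revisions of that era was 64)
#define GGML_MAX_NAME 128

# Option 2: if the definition is guarded with #ifndef, pass it at configure time.
# Set it for both C and C++ so struct ggml_tensor has the same layout everywhere:
cmake .. -DSD_FLASH_ATTN=ON -DCMAKE_C_FLAGS="-DGGML_MAX_NAME=128" -DCMAKE_CXX_FLAGS="-DGGML_MAX_NAME=128"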
I overlooked this when updating GGML. I've made the necessary changes now.
Would you encounter this issue when not using flash attention?
Nope, it works fine with the standard cmake .. build (without flash attention).
I think it broke again or is still broken:
ggml/src/ggml.c:12743: P >= 0
GGML_ASSERT: /home/h3ndrik/tmp/stable-diffusion.cpp/ggml/src/ggml.c:12743: P >= 0
Aborted
I can confirm as well that I'm encountering the same error with flash attention enabled both on Windows and Linux. Without it, everything works fine.
GGML_ASSERT: /data/data/com.termux/files/home/stable-diffusion.cpp/ggml/src/ggml.c:12743: P >= 0
Aborted
I see the same issue with flash attention. I followed the steps described in the readme, but with flash attention enabled, I always get this error when trying to generate an image:
GGML_ASSERT: C:\dev\stable-diffusion.cpp\ggml\src\ggml.c:13829: P >= 0
GGML_ASSERT: C:\dev\stable-diffusion.cpp\ggml\src\ggml.c:13829: P >= 0
The solution described above (wrapping the GGML_ASSERT(P >= 0); on that line in an if) fixes the issue.
GGML_ASSERT(P >= 0)
I encountered that too.
@JohnAlcatraz If you really apply that workaround, you will get very, very bad quality, according to my earlier experiments.
I also see very bad quality with flash attention, but that's just because flash attention is not following the prompt at all. I think that's a separate bug, unrelated to this issue; I opened a new issue about it: https://github.com/leejet/stable-diffusion.cpp/issues/259
Oh, I know. Thanks for your explanation.