Metal text2img crashes

Open xiaogz opened this issue 1 year ago • 0 comments

On my Mac Studio M2 Ultra, running a build with Metal enabled always crashes. Under lldb there is a chance that I get an output, but it can still crash. I'm following this guide and using the default cat prompt with leejet's q4_k flux schnell model; his q2_k model behaves the same way. The guide's link to the VAE safetensors file is inaccessible to me because I'm not part of flux-dev, so I used the official black-forest-labs VAE instead.

crash-metal-raw.txt:

...
[INFO ] stable-diffusion.cpp:1236 - get_learned_condition completed, taking 3080 ms
[INFO ] stable-diffusion.cpp:1259 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1263 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:980  - flux compute buffer size: 398.50 MB(VRAM)
zsh: segmentation fault  ./build-metal/bin/sd --vae  --clip_l  --t5xxl  -p  --cfg-scale 1.0  euler -v

crash-metal-lldb.txt:

...
[INFO ] stable-diffusion.cpp:1236 - get_learned_condition completed, taking 3084 ms
[INFO ] stable-diffusion.cpp:1259 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1263 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:980  - flux compute buffer size: 398.50 MB(VRAM)
Process 41298 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=257, address=0x9b037b0376037003)
    frame #0: 0x00007b0376037003
error: memory read failed for 0x7b0376037000
Target 0: (sd) stopped.
...
(lldb) fr v
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=257, address=0x9b037b0376037003)
  * frame #0: 0x00007b0376037003
    frame #1: 0x000000010015ae5c sd`ggml_metal_graph_compute(ctx=0x000000014080ea00, gf=<unavailable>) at ggml-metal.m:2870:36 [opt]
    frame #2: 0x0000000100137ec4 sd`ggml_backend_graph_compute [inlined] ggml_backend_graph_compute_async(backend=0x0000600000d4c370, cgraph=<unavailable>) at ggml-backend.c:282:12 [opt]
    frame #3: 0x0000000100137ebc sd`ggml_backend_graph_compute(backend=0x0000600000d4c370, cgraph=<unavailable>) at ggml-backend.c:276:28 [opt]
    frame #4: 0x0000000100081078 sd`GGMLRunner::compute(this=0x000000013ff07960, get_graph=<unavailable>, n_threads=16, free_compute_buffer_immediately=false, output=0x000000016fdfd568, output_ctx=0x0000000000000000) at ggml_extend.hpp:1095:9 [opt]
...

Hope the stack trace helps in fixing this issue! Thanks for making SD run locally!

EDIT: More information:

I'm on commit 8847114abfd900898e78d0257f5f9086f2473601

Date:   Sun Aug 25 22:39:39 2024 +0800

    fix: fix issue when applying lora

I built stable-diffusion.cpp with: cmake -G Ninja -DSD_METAL=ON -DCMAKE_BUILD_TYPE="RelWithDebInfo" .. && cmake --build . (default release and debug builds also reproduce the crash)
I ran it with the guide's sample command: ./bin/sd --vae ~/work/models/stable-diffusion/diffusion_pytorch_model.safetensors --clip_l ~/work/models/stable-diffusion/clip_l.safetensors --t5xxl ~/work/models/stable-diffusion/t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v --diffusion-model ~/work/models/stable-diffusion/flux1-schnell-q4_k.gguf
My machine: Mac Studio M2 Ultra with 24 CPU cores and 64 GB of unified memory, running macOS Sonoma 14.6.1 (23G93)
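
Since the crash is intermittent under lldb, it may help to measure how often a given build fails. Below is a minimal sketch, assuming a POSIX shell. `count_failures` is a hypothetical helper I made up for illustration, not part of stable-diffusion.cpp, and it counts any non-zero exit status as a failure (which includes the status the shell reports after a segmentation fault):

```shell
# Hypothetical helper: run a command a fixed number of times and print
# how many of those runs exited with a non-zero status.
# Usage: count_failures <runs> <command> [args...]
# The function body is a subshell, so its variables do not leak out.
count_failures() (
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
)
```

For example, `count_failures 10 ./bin/sd` followed by the same arguments as the run command above would print how many of 10 runs failed, which makes it easier to compare build types or models.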

Aug 27 '24 19:08 xiaogz