[Bug] flux compute buffer size much bigger after ggml update
Git commit
b25785b
Operating System & Version
Ubuntu 24.04
GGML backends
Vulkan
Command-line arguments used
./sd -r /home/user/shirt8.png --diffusion-model /home/user/sd.cpp-webui/models/unet/flux1-kontext-dev-Q8_0.gguf -W 736 -H 1024 --lora-model-dir /home/user/sd.cpp-webui/models/loras/ --vae /home/user/sd.cpp-webui/models/vae/ae.safetensors --clip_l /home/user/sd.cpp-webui/models/clip/clip_l.safetensors --t5xxl /home/user/sd.cpp-webui/models/clip/t5xxl_fp16.safetensors -p "<lora:tryanything_flux_kontext_lora:1> try on this outfit, man, copy shirt" --cfg-scale 1.0 --sampling-method euler -v --clip-on-cpu -o /home/user/sd.cpp-webui/outputs/imgedit/2.png --vae-on-cpu
Steps to reproduce
All versions prior to commit b25785b (which syncs ggml) allocate the following buffer when running the command above:
[DEBUG] ggml_extend.hpp:1550 - flux compute buffer size: 2955.44 MB(VRAM)
With that commit the allocation jumps to:
[DEBUG] ggml_extend.hpp:1579 - flux compute buffer size: 7822.92 MB(VRAM)
which overflows into GTT memory and makes generation very slow.
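To compare two builds quickly, the relevant allocation lines can be filtered from the verbose output (a sketch; -v enables the debug log shown above, and 2>&1 catches the log regardless of which stream it is written to):

# isolate the compute buffer allocations from a verbose run
./sd <same arguments as above> -v 2>&1 | grep "compute buffer size"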
What you expected to happen
VRAM usage to stay below 16 GB, with:
[DEBUG] ggml_extend.hpp:1550 - flux compute buffer size: 2955.44 MB(VRAM)
What actually happened
VRAM usage spiked to 21 GB with the same command:
[DEBUG] ggml_extend.hpp:1579 - flux compute buffer size: 7822.92 MB(VRAM)
Logs / error messages / stack trace
No response
Additional context / environment details
No response
Can you bisect the ggml commit? Or are you using prebuilt binaries?
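If you build from source, a rough bisect loop could look like this (a sketch, assuming ggml is vendored as a git submodule under ggml/ and a CMake build with -DSD_VULKAN=ON; binary and build paths may differ on your setup):

# mark the known-bad and known-good ggml states inside the submodule
cd ggml
git bisect start <bad-ggml-commit> <good-ggml-commit>
cd ..
# at each bisect step: rebuild against the checked-out ggml and re-run
cmake -B build -DSD_VULKAN=ON && cmake --build build -j
./build/bin/sd <same arguments as above> -v 2>&1 | grep "flux compute buffer size"
# record the result inside the submodule and repeat until it converges
cd ggml && git bisect good    # or: git bisect bad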
Does the problem still exist? I had similar problems with another model (a distilled SDXL), but now they are gone :-)
Using the same safetensors file:
Current master, Oct 30, 2025:
[DEBUG] ggml_extend.hpp:1762 - clip params backend buffer size = 235.06 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1762 - clip params backend buffer size = 1329.29 MB(VRAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1762 - unet params backend buffer size = 3658.05 MB(VRAM) (944 tensors) <----- look here
[DEBUG] ggml_extend.hpp:1762 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
Old, Oct 23, 2025:
[DEBUG] ggml_extend.hpp:1754 - clip params backend buffer size = 235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1754 - clip params backend buffer size = 1329.29 MB(RAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1754 - unet params backend buffer size = 8311.01 MB(RAM) (1680 tensors) <----- look here
[WARN ] stable-diffusion.cpp:510 - No VAE specified with --vae or --force-sdxl-vae-conv-scale flag
[DEBUG] ggml_extend.hpp:1754 - vae params backend buffer size = 94.47 MB(RAM) (140 tensors)
Does the problem still exist?
Maybe not: master-338-faabc5a included a memory-related ggml fix: ggml-org/llama.cpp#16679.
@evcharger, could you please test again with a more recent release?
I tested with the latest commits, but it's the same:
ggml_extend.hpp:1587 - vae compute buffer size: 2467.97 MB(RAM)
ggml_extend.hpp:1587 - flux compute buffer size: 7822.92 MB(VRAM)
With commits from Oct 17, 2025:
ggml_extend.hpp:1579 - vae compute buffer size: 1264.33 MB(RAM)
ggml_extend.hpp:1579 - flux compute buffer size: 2955.44 MB(VRAM)
I also suspect ggml; I will try bisecting next week.