[Bug] flux compute buffer size much bigger after ggml update
Git commit
b25785b
Operating System & Version
Ubuntu 24.04
GGML backends
Vulkan
Command-line arguments used
./sd -r /home/user/shirt8.png --diffusion-model /home/user/sd.cpp-webui/models/unet/flux1-kontext-dev-Q8_0.gguf -W 736 -H 1024 --lora-model-dir /home/user/sd.cpp-webui/models/loras/ --vae /home/user/sd.cpp-webui/models/vae/ae.safetensors --clip_l /home/user/sd.cpp-webui/models/clip/clip_l.safetensors --t5xxl /home/user/sd.cpp-webui/models/clip/t5xxl_fp16.safetensors -p "<lora:tryanything_flux_kontext_lora:1> try on this outfit, man, copy shirt" --cfg-scale 1.0 --sampling-method euler -v --clip-on-cpu -o /home/user/sd.cpp-webui/outputs/imgedit/2.png --vae-on-cpu
Steps to reproduce
All versions prior to commit b25785b (which syncs ggml) allocate the following buffer when running the command above:
[DEBUG] ggml_extend.hpp:1550 - flux compute buffer size: 2955.44 MB(VRAM)
With that commit the allocation jumps to:
[DEBUG] ggml_extend.hpp:1579 - flux compute buffer size: 7822.92 MB(VRAM)
which overflows into GTT memory and makes generation very slow.
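To compare two builds quickly, the relevant allocation lines can be filtered from the verbose output (a sketch; -v enables the debug log shown above, and 2>&1 catches the log regardless of which stream it is written to):

# isolate the compute buffer allocations from a verbose run
./sd <same arguments as above> -v 2>&1 | grep "compute buffer size"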
What you expected to happen
VRAM usage to stay below 16 GB, with:
[DEBUG] ggml_extend.hpp:1550 - flux compute buffer size: 2955.44 MB(VRAM)
What actually happened
VRAM usage spiked to 21 GB with the same command:
[DEBUG] ggml_extend.hpp:1579 - flux compute buffer size: 7822.92 MB(VRAM)
Logs / error messages / stack trace
No response
Additional context / environment details
No response
Can you bisect the ggml commit? Or are you using prebuilt binaries?
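If you build from source, a rough bisect loop could look like this (a sketch, assuming ggml is vendored as a git submodule under ggml/ and a CMake build with -DSD_VULKAN=ON; binary and build paths may differ on your setup):

# mark the known-bad and known-good ggml states inside the submodule
cd ggml
git bisect start <bad-ggml-commit> <good-ggml-commit>
cd ..
# at each bisect step: rebuild against the checked-out ggml and re-run
cmake -B build -DSD_VULKAN=ON && cmake --build build -j
./build/bin/sd <same arguments as above> -v 2>&1 | grep "flux compute buffer size"
# record the result inside the submodule and repeat until it converges
cd ggml && git bisect good    # or: git bisect bad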
Does the problem still exist? I had similar problems with another model (a distilled SDXL), but now they are gone :-)
Using the same safetensors file:
Current master, Oct 30, 2025:
[DEBUG] ggml_extend.hpp:1762 - clip params backend buffer size = 235.06 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1762 - clip params backend buffer size = 1329.29 MB(VRAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1762 - unet params backend buffer size = 3658.05 MB(VRAM) (944 tensors) <----- look here
[DEBUG] ggml_extend.hpp:1762 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
Old, Oct 23, 2025:
[DEBUG] ggml_extend.hpp:1754 - clip params backend buffer size = 235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1754 - clip params backend buffer size = 1329.29 MB(RAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1754 - unet params backend buffer size = 8311.01 MB(RAM) (1680 tensors) <----- look here
[WARN ] stable-diffusion.cpp:510 - No VAE specified with --vae or --force-sdxl-vae-conv-scale flag
[DEBUG] ggml_extend.hpp:1754 - vae params backend buffer size = 94.47 MB(RAM) (140 tensors)
Does the problem still exist?
Maybe not: master-338-faabc5a included a memory-related ggml fix: ggml-org/llama.cpp#16679.
@evcharger, could you please test again with a more recent release?
I tested with the latest commits, but it's the same:
ggml_extend.hpp:1587 - vae compute buffer size: 2467.97 MB(RAM)
ggml_extend.hpp:1587 - flux compute buffer size: 7822.92 MB(VRAM)
With commits from Oct 17, 2025:
ggml_extend.hpp:1579 - vae compute buffer size: 1264.33 MB(RAM)
ggml_extend.hpp:1579 - flux compute buffer size: 2955.44 MB(VRAM)
I also suspect ggml; I will try bisecting next week.