stable-diffusion.cpp ggml error (not enough space in the context's memory pool) with SDXL when large prompt + large negative prompt

I'm trying SDXL (+ fixed VAE) on the CPU backend. I encounter an error, but I'm not totally sure what triggers it. With a medium-size prompt (~35 words), I succeeded in generating a 1024x1024 image without any problem.

But when I tried a larger prompt plus a large negative prompt, I got this error during the final decoding phase:

ggml_new_object: not enough space in the context's memory pool (needed 23568528, available 23068672)
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml.c:1576: GGML_ASSERT(obj_new) failed

I don't understand the message, because I have a lot of RAM available during this processing (and the amount of memory in question here is very small). The process ends with code -1073740791.

Prompt :

((European girl)). (((Late evening summer))), ((((asleep sprawled-out on towel in neighboring suburban garden)))), ((high drone aerial view)), (((40-degrees angle))), ((fence)), (((No Nude))), ((shadowy places)), (((long legs))), ((teeshirt)), (((jeans))), ((wide hips)), ((((short messy dark side-swept pixie-cut hair with severe undercut)))), ((medium breasts)), ((barefoot)), (((sweaty dark skin))), ((spine)), ((logical lighting)), ((logical shadows)), (accurate limbs), ((intricate detail)), (small towel), (headphones and cell phone), book, ((600mm lens)),

Negative prompt :

(worst quality), text, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, jpeg artifacts, bad feet, extra fingers, mutated hands, bad proportions, extra limbs, disfigured, bad anatomy, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, mutated hands, fused fingers, long neck

When I try with only the prompt and no negative prompt, it works and everything is OK. When I try with the prompt and a small negative prompt (e.g. only "bad anatomy"), it works too.

Here is a full log that shows the error (only 2 steps here, to reach the error quickly):

D:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\inference_tool_CPU_AVX2_2024_11_30>sd -m "..\StableDiffusion XL 1.0 F16\sd_xl_base_1.0.safetensors" --vae "..\StableDiffusion XL 1.0 F16\sdxl_vae.safetensors" --sampling-method euler --steps 2 --cfg-scale 7.0 -H 1024 -W 1024 -s 42 -t 20 -p "((European girl)). (((Late evening summer))), ((((asleep sprawled-out on towel in neighboring suburban garden)))), ((high drone aerial view)), (((40-degrees angle))), ((fence)), (((No Nude))), ((shadowy places)), (((long legs))), ((teeshirt)), (((jeans))), ((wide hips)), ((((short messy dark side-swept pixie-cut hair with severe undercut)))), ((medium breasts)), ((barefoot)), (((sweaty dark skin))), ((spine)), ((logical lighting)), ((logical shadows)), (accurate limbs), ((intricate detail)), (small towel), (headphones and cell phone), book, ((600mm lens))," -n "(worst quality), text, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, jpeg artifacts, bad feet, extra fingers, mutated hands, bad proportions, extra limbs, disfigured, bad anatomy, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, mutated hands, fused fingers, long neck" -v
Option:
    n_threads: 20
    mode: txt2img
    model_path: ..\StableDiffusion XL 1.0 F16\sd_xl_base_1.0.safetensors
    wtype: unspecified
    clip_l_path:
    clip_g_path:
    t5xxl_path:
    diffusion_model_path:
    vae_path: ..\StableDiffusion XL 1.0 F16\sdxl_vae.safetensors
    taesd_path:
    esrgan_path:
    controlnet_path:
    embeddings_path:
    stacked_id_embeddings_path:
    input_id_images_path:
    style ratio: 20.00
    normalize input image : false
    output_path: output.png
    init_img:
    control_image:
    clip on cpu: false
    controlnet cpu: false
    vae decoder on cpu:false
    diffusion flash attention:false
    strength(control): 0.90
    prompt: ((European girl)). (((Late evening summer))), ((((asleep sprawled-out on towel in neighboring suburban garden)))), ((high drone aerial view)), (((40-degrees angle))), ((fence)), (((No Nude))), ((shadowy places)), (((long legs))), ((teeshirt)), (((jeans))), ((wide hips)), ((((short messy dark side-swept pixie-cut hair with severe undercut)))), ((medium breasts)), ((barefoot)), (((sweaty dark skin))), ((spine)), ((logical lighting)), ((logical shadows)), (accurate limbs), ((intricate detail)), (small towel), (headphones and cell phone), book, ((600mm lens)),
    negative_prompt: (worst quality), text, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, jpeg artifacts, bad feet, extra fingers, mutated hands, bad proportions, extra limbs, disfigured, bad anatomy, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, mutated hands, fused fingers, long neck
    min_cfg: 1.00
    cfg_scale: 7.00
    slg_scale: 0.00
    guidance: 3.50
    clip_skip: -1
    width: 1024
    height: 1024
    sample_method: euler
    schedule: default
    sample_steps: 2
    strength(img2img): 0.75
    rng: cuda
    seed: 42
    batch_count: 1
    vae_tiling: false
    upscale_repeats: 1
System Info:
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:182 - Using CPU backend
[INFO ] stable-diffusion.cpp:191 - loading model from '..\StableDiffusion XL 1.0 F16\sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:888 - load ..\StableDiffusion XL 1.0 F16\sd_xl_base_1.0.safetensors using safetensors format
[DEBUG] model.cpp:959 - init from '..\StableDiffusion XL 1.0 F16\sd_xl_base_1.0.safetensors'
[INFO ] stable-diffusion.cpp:226 - loading vae from '..\StableDiffusion XL 1.0 F16\sdxl_vae.safetensors'
[INFO ] model.cpp:888 - load ..\StableDiffusion XL 1.0 F16\sdxl_vae.safetensors using safetensors format
[DEBUG] model.cpp:959 - init from '..\StableDiffusion XL 1.0 F16\sdxl_vae.safetensors'
[INFO ] stable-diffusion.cpp:238 - Version: SDXL
[INFO ] stable-diffusion.cpp:271 - Weight type: f16
[INFO ] stable-diffusion.cpp:272 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:273 - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:274 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:276 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1075 - clip params backend buffer size = 469.44 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1075 - clip params backend buffer size = 2649.92 MB(RAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1075 - unet params backend buffer size = 4900.07 MB(RAM) (1680 tensors)
[DEBUG] ggml_extend.hpp:1075 - vae params backend buffer size = 94.47 MB(RAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:413 - loading weights
[DEBUG] model.cpp:1645 - loading tensors from ..\StableDiffusion XL 1.0 F16\sd_xl_base_1.0.safetensors
[DEBUG] model.cpp:1645 - loading tensors from ..\StableDiffusion XL 1.0 F16\sdxl_vae.safetensors
[INFO ] stable-diffusion.cpp:512 - total params memory size = 8113.89MB (VRAM 0.00MB, RAM 8113.89MB): clip 3119.36MB(RAM), unet 4900.07MB(RAM), vae 94.47MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:516 - loading model from '..\StableDiffusion XL 1.0 F16\sd_xl_base_1.0.safetensors' completed, taking 17.31s
[INFO ] stable-diffusion.cpp:546 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:590 - finished loaded file
[DEBUG] stable-diffusion.cpp:1464 - txt2img 1024x1024
[DEBUG] stable-diffusion.cpp:1194 - prompt after extract and remove lora: "((European girl)). (((Late evening summer))), ((((asleep sprawled-out on towel in neighboring suburban garden)))), ((high drone aerial view)), (((40-degrees angle))), ((fence)), (((No Nude))), ((shadowy places)), (((long legs))), ((teeshirt)), (((jeans))), ((wide hips)), ((((short messy dark side-swept pixie-cut hair with severe undercut)))), ((medium breasts)), ((barefoot)), (((sweaty dark skin))), ((spine)), ((logical lighting)), ((logical shadows)), (accurate limbs), ((intricate detail)), (small towel), (headphones and cell phone), book, ((600mm lens)),"
[INFO ] stable-diffusion.cpp:673 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1199 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:329 - parse '((European girl)). (((Late evening summer))), ((((asleep sprawled-out on towel in neighboring suburban garden)))), ((high drone aerial view)), (((40-degrees angle))), ((fence)), (((No Nude))), ((shadowy places)), (((long legs))), ((teeshirt)), (((jeans))), ((wide hips)), ((((short messy dark side-swept pixie-cut hair with severe undercut)))), ((medium breasts)), ((barefoot)), (((sweaty dark skin))), ((spine)), ((logical lighting)), ((logical shadows)), (accurate limbs), ((intricate detail)), (small towel), (headphones and cell phone), book, ((600mm lens)),' to [['European girl', 1.21], ['. ', 1], ['Late evening summer', 1.331], [', ', 1], ['asleep sprawled-out on towel in neighboring suburban garden', 1.4641], [', ', 1], ['high drone aerial view', 1.21], [', ', 1], ['40-degrees angle', 1.331], [', ', 1], ['fence', 1.21], [', ', 1], ['No Nude', 1.331], [', ', 1], ['shadowy places', 1.21], [', ', 1], ['long legs', 1.331], [', ', 1], ['teeshirt', 1.21], [', ', 1], ['jeans', 1.331],
[DEBUG] clip.hpp:311 - token length: 154
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] conditioner.hpp:457 - computing condition graph completed, taking 1330 ms
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] conditioner.hpp:457 - computing condition graph completed, taking 1993 ms
[DEBUG] conditioner.hpp:329 - parse '(worst quality), text, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, jpeg artifacts, bad feet, extra fingers, mutated hands, bad proportions, extra limbs, disfigured, bad anatomy, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, mutated hands, fused fingers, long neck' to [['worst quality', 1.1], [', text, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, jpeg artifacts, bad feet, extra fingers, mutated hands, bad proportions, extra limbs, disfigured, bad anatomy, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, mutated hands, fused fingers, long neck', 1], ]
[DEBUG] clip.hpp:311 - token length: 154
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] conditioner.hpp:457 - computing condition graph completed, taking 1334 ms
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 2.33 MB(RAM)
[DEBUG] conditioner.hpp:457 - computing condition graph completed, taking 2013 ms
[INFO ] stable-diffusion.cpp:1332 - get_learned_condition completed, taking 4119 ms
[INFO ] stable-diffusion.cpp:1355 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1359 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1026 - unet compute buffer size: 879.25 MB(RAM)
  |==================================================| 2/2 - 88.88s/it
[INFO ] stable-diffusion.cpp:1395 - sampling completed, taking 180.08s
[INFO ] stable-diffusion.cpp:1403 - generating 1 latent images completed, taking 180.75s
[INFO ] stable-diffusion.cpp:1406 - decoding 1 latents
ggml_new_object: not enough space in the context's memory pool (needed 23566736, available 23068672)
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml.c:1576: GGML_ASSERT(obj_new) failed

Jan 10 '25 14:01 olivbrau

I tested the same prompt and can confirm the error happens to me too, FWIW. Trying VAE tiling did not help; it fails in the same way before ever getting to the tiling part.

Jan 10 '25 16:01 lostdisc

I've tried with a slightly larger negative prompt, and it worked. So I think there is some kind of limit on the number of tokens before the bug arises.

Jan 12 '25 13:01 olivbrau

I think it is not related to prompts at all, but to the memory available during decoding. I got a similar issue:

[INFO ] stable-diffusion.cpp:554  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:688  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1241 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1374 - get_learned_condition completed, taking 226 ms
[INFO ] stable-diffusion.cpp:1397 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1434 - generating image: 1/1 - seed 42
  |==================================================| 20/20 - 1.92it/s
[INFO ] stable-diffusion.cpp:1472 - sampling completed, taking 11.05s
[INFO ] stable-diffusion.cpp:1480 - generating 1 latent images completed, taking 11.08s
[INFO ] stable-diffusion.cpp:1483 - decoding 1 latents
ggml_new_object: not enough space in the context's memory pool (needed 19176000, available 19136512)
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml.c:1576: GGML_ASSERT(obj_new) failed

When I used a large size such as -W 1216 -H 832, it did not work. When I reduced the image size to -W 960 -H 640, I was able to generate the image. I have an RTX 3090, and some of its memory is used by the monitor. When I check memory utilization, the amount available on the GPU seems to match.

Jan 27 '25 21:01 Johnz86

I think the problem is not the amount of VRAM you actually have available. Rather, not enough memory is allocated when the ggml_context is created. This is a software bug. The ugly fix is to increase the size of the context to a bigger number in the code, but maybe there's a more elegant solution.
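
To illustrate the point, here is a toy fixed-size pool. It is not ggml's real allocator, and the names FixedPool, kPoolSize, kNeeded and kShortfall are made up for this sketch. It only shows that a pool sized once at creation rejects a request that does not fit, no matter how much system memory is free. The byte counts are the ones from the first error message in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy fixed-size pool (hypothetical, NOT ggml's implementation).
struct FixedPool {
    std::vector<std::uint8_t> buffer;  // sized once, at construction
    std::size_t used = 0;

    explicit FixedPool(std::size_t mem_size) : buffer(mem_size) {}

    // Returns nullptr when the request does not fit in what is left of the
    // pool, regardless of how much RAM or VRAM the machine still has free.
    void* alloc(std::size_t size) {
        if (size > buffer.size() - used) {
            return nullptr;
        }
        void* p = buffer.data() + used;
        used += size;
        return p;
    }
};

// Byte counts copied from the first error message in this thread.
constexpr std::size_t kPoolSize  = 23068672;  // "available"
constexpr std::size_t kNeeded    = 23568528;  // "needed"
constexpr std::size_t kShortfall = kNeeded - kPoolSize;
```

With these numbers the pool is exactly 22 MiB and the request overshoots it by 499856 bytes, which is why 32 GB of free RAM makes no difference.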

Jan 27 '25 22:01 stduhpf

In my example, I used the CPU backend. My computer has 32 GB of RAM, so I don't think it is a lack of memory.

Jan 28 '25 18:01 olivbrau

I tried on my laptop with an NVIDIA GeForce RTX 4060, which has far less memory than my RTX 3090, and I was able to generate images at 1216x832. I ran multiple generations with the prompts from this issue. It failed when I included the entire negative prompt, but when I removed just one tag ('long neck') from the negative prompt, the picture was generated. It also worked with the entire negative prompt when I lowered the resolution.
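
A rough way to see why both resolution and prompt length could matter is to estimate the buffers involved. This is only a sketch under assumptions not confirmed in this thread: that the decoded image is held as 3-channel 32-bit floats, that each text conditioning is a tokens x 2048 float tensor (2048 being SDXL's cross-attention context width), and that both come out of the same fixed pool. The helper names are made up.

```cpp
#include <cstddef>

// Bytes for a decoded RGB image stored as 32-bit floats (assumed layout).
constexpr std::size_t image_bytes(std::size_t width, std::size_t height) {
    return width * height * 3 * sizeof(float);
}

// Bytes for one text-conditioning tensor: n_tokens x embed_dim 32-bit floats.
// embed_dim = 2048 is an assumption based on SDXL's context width.
constexpr std::size_t conditioning_bytes(std::size_t n_tokens,
                                         std::size_t embed_dim = 2048) {
    return n_tokens * embed_dim * sizeof(float);
}
```

Under these assumptions, a 1216x832 image takes 12140544 bytes against 7372800 bytes at 960x640, and a 154-token prompt (the token length in the log above) takes 1261568 bytes per conditioning tensor against 630784 bytes for 77 tokens. That difference is larger than the 499856-byte shortfall in the first error message, which would be consistent with a slightly shorter prompt or a smaller image being enough to fit.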

Jan 30 '25 13:01 JanciJakGenerAIMage

Hi All,

First of all, I would like to thank the SD.cpp team. It is a really awesome project, and it saved my Mac for SD.

I am not good at C++, but regarding this issue, you can apply the following quick fix and rebuild.

Open stable-diffusion.cpp, go to line 1553, and add the following code:

if (sd_version_is_sdxl(sd_ctx->sd->version)) {
    params.mem_size *= 4;  // same multiplier as the one used for Flux
}

The root cause of the issue is that the ctx did not have enough memory allocated. I am not good at calculating the exact size needed, so the above code just uses the size from Flux.

Feb 04 '25 21:02 enjoyinggreen

Yes, something like that works, but it's not really satisfying. I think the "right" way to do it would be to build the graph before allocating the context memory to know the required size? I think that's how it's done in llama.cpp, but I'm not entirely sure.
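
As a sketch of that "measure first, then allocate" idea: the code below is a standalone toy, not stable-diffusion.cpp or ggml code, and SizeMeasurer, SizedPool and example_requests are made-up names. The per-object overhead is a placeholder; the usage note borrows the "ggml tensor size = 400 bytes" figure from the log above purely as an illustrative value.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pass 1: record how many bytes would be requested, without allocating.
// `overhead` stands in for per-object bookkeeping (value is an assumption).
struct SizeMeasurer {
    std::size_t overhead;
    std::size_t total = 0;
    void request(std::size_t bytes) { total += bytes + overhead; }
};

// Pass 2: a pool created with exactly the measured size.
struct SizedPool {
    std::vector<std::uint8_t> buffer;
    std::size_t overhead;
    std::size_t used = 0;

    SizedPool(std::size_t size, std::size_t per_object_overhead)
        : buffer(size), overhead(per_object_overhead) {}

    // Returns false if the request (plus overhead) does not fit.
    bool alloc(std::size_t bytes) {
        const std::size_t need = bytes + overhead;
        if (need > buffer.size() - used) {
            return false;
        }
        used += need;
        return true;
    }
};

// Hypothetical list of requests for one decode step (sizes in bytes).
inline std::vector<std::size_t> example_requests() {
    return {
        128 * 128 * 4 * sizeof(float),    // 4-channel latent for 1024x1024
        1024 * 1024 * 3 * sizeof(float),  // decoded RGB image as floats
    };
}
```

Running the same requests through SizeMeasurer{400} first and then creating SizedPool(measured.total, 400) gives a pool in which every request fits, with no hand-tuned multiplier. The catch is that the request list has to be known before the pool is created.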

Feb 05 '25 17:02 stduhpf

This happened to me when using PhotoMaker. params.mem_size *= 4; kind of fixed it, but it's not obviously the correct way to go.

Feb 13 '25 18:02 mpulukkinen