ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1178743392, available 1106773248)
Hello, I'm trying to run this on Windows 11. After fixing the cmake errors by adding the definitions from #106, when I try to load the 30B model I get many errors like the one in the title, followed by a segmentation fault:
$ ./Release/chat.exe -m ggml-model-q4_0.bin
main: seed = 1679560202
llama_model_load: loading model from 'ggml-model-q4_0.bin' - please wait ...
llama_model_load: ggml ctx size = 1055.50 MB
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1120528208, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1120528208, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1120528208, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1129074932, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1129074932, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1175959936, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1175959936, available 1106773248)
[...many more lines...]
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1131858388, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1131858388, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1178743392, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1178743392, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1178743392, available 1106773248)
ggml_new_tensor_impl: not enough space iSegmentation fault
I have 64 GB of memory and 53 GB are free when I try, so I'm guessing that should be enough? I see the same issue reported in llama.cpp (https://github.com/ggerganov/llama.cpp/issues/153), so it might be related.
I have also tested with the 13B model, and it won't load either:
$ ./Release/chat.exe -m ggml-alpaca-13b-q4.bin
main: seed = 1679561627
llama_model_load: loading model from 'ggml-alpaca-13b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 2767.49 MB
Segmentation fault
So I guess it's not a matter of me lacking memory.

Ya, I'm seeing a repeatable crash similar to this after about 10 prompts complete. I'm using the 30B model with default parameters. This is on a PC with 128 GB of RAM, so it should not be an out-of-memory issue.
Is there a param that increases the available memory pool?
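As far as I can tell there is no runtime parameter: ggml allocates its whole memory pool once, up front, and every tensor is carved out of that fixed buffer. A minimal sketch of the pattern, assuming the two-field `ggml_init_params` of the ggml version from that era (`make_model_ctx` is a hypothetical wrapper, not actual alpaca.cpp code):

```cpp
#include <stddef.h>
#include "ggml.h"

// Sketch: the pool is sized once, before any tensors exist. If the
// size estimate is too small, every later ggml_new_tensor_* call
// prints "not enough space in the context's memory pool" and fails.
struct ggml_context * make_model_ctx(size_t ctx_size) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ ctx_size, // the fixed pool, e.g. the ~1055.50 MB above
        /*.mem_buffer =*/ NULL,     // let ggml allocate the pool itself
    };
    return ggml_init(params);
}
```

So if the loader's estimate of `ctx_size` is too low, the failure shows up later as these "not enough space" errors rather than at allocation time.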
I have the same problem. This has been addressed in llama.cpp, but no patch is available yet (see the last comment from @ggerganov):
https://github.com/ggerganov/llama.cpp/issues/599
This should be resolved by ggerganov#626.
https://github.com/ggerganov/llama.cpp/commit/c0bb1d3ce21005ab21d686626ba87261a6e3a660
Here is his fix in the llama.cpp code. It looks like an easy enough fix to port to alpaca.cpp; I believe the code bases are still similar enough.
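For reference, the general shape of the change (a sketch from memory of the llama.cpp eval code of that time, not the literal diff; `mem_per_token` and `N` are the per-token memory estimate and token count from the surrounding eval function, and this is a body fragment, not a standalone program):

```cpp
// Sketch of the fix inside the eval function: previously the compute
// buffer had a fixed size, so longer runs overflowed the pool; the fix
// reallocates the buffer when the measured per-token usage indicates
// the next evaluation will not fit.
static size_t buf_size = 512u*1024*1024;
static void * buf = malloc(buf_size);

if (mem_per_token > 0 && mem_per_token*N > buf_size) {
    // add ~10% headroom for ggml's per-object bookkeeping
    const size_t buf_size_new = 1.1*(mem_per_token*N);

    buf_size = buf_size_new;
    buf = realloc(buf, buf_size);
    if (buf == NULL) {
        fprintf(stderr, "failed to allocate %zu bytes\n", buf_size);
        return false;
    }
}

struct ggml_init_params params = {
    /*.mem_size   =*/ buf_size,
    /*.mem_buffer =*/ buf,
};
```

The key point is that the pool grows with the observed workload instead of staying hardcoded, which is exactly the failure mode in the logs above.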
Agreed, it should be rather easy to merge. I don't have a working alpaca.cpp repo though, so I'd prefer if someone else does this. Alternatively, I can do it and someone else can test; feel free to ping me if needed.
I just tried merging these specific changes, and it triggered an assert in another part of the code when the model loaded.