ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1178743392, available 1106773248)
Hello, I'm trying to run this on Windows 11. After fixing the cmake errors by adding the definitions from #106, when I try to load the 30B model I get many errors like the one in the title, followed by a segmentation fault:
$ ./Release/chat.exe -m ggml-model-q4_0.bin
main: seed = 1679560202
llama_model_load: loading model from 'ggml-model-q4_0.bin' - please wait ...
llama_model_load: ggml ctx size = 1055.50 MB
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1120528208, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1120528208, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1120528208, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1129074932, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1129074932, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1175959936, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1175959936, available 1106773248)
[...many more lines...]
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1131858388, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1131858388, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1178743392, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1178743392, available 1106773248)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 1178743392, available 1106773248)
ggml_new_tensor_impl: not enough space iSegmentation fault
I have 64 GB of memory and 53 GB are free when I try, so I'm guessing that should be enough? I see the same issue reported in llama.cpp (https://github.com/ggerganov/llama.cpp/issues/153), so it might be related.
I have also tested with the 13B model, and it won't load either:
$ ./Release/chat.exe -m ggml-alpaca-13b-q4.bin
main: seed = 1679561627
llama_model_load: loading model from 'ggml-alpaca-13b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 2767.49 MB
Segmentation fault
So I guess it's not a matter of me lacking memory.

Ya, I'm seeing a repeatable crash similar to this after about 10 prompts complete. I'm using the 30B model with default parameters. This is on a PC with 128 GB of RAM, so it should not be an out-of-memory issue.
Is there a param that increases the available memory pool?
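As far as I can tell there is no runtime parameter: ggml allocates its whole memory pool once, up front, and every tensor is carved out of that fixed buffer. A minimal sketch of the pattern, assuming the two-field `ggml_init_params` of the ggml version from that era (`make_model_ctx` is a hypothetical wrapper, not actual alpaca.cpp code):

```cpp
#include <stddef.h>
#include "ggml.h"

// Sketch: the pool is sized once, before any tensors exist. If the
// size estimate is too small, every later ggml_new_tensor_* call
// prints "not enough space in the context's memory pool" and fails.
struct ggml_context * make_model_ctx(size_t ctx_size) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ ctx_size, // the fixed pool, e.g. the ~1055.50 MB above
        /*.mem_buffer =*/ NULL,     // let ggml allocate the pool itself
    };
    return ggml_init(params);
}
```

So if the loader's estimate of `ctx_size` is too low, the failure shows up later as these "not enough space" errors rather than at allocation time.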
I have the same problem. This has been addressed in llama.cpp, but no patch is available yet (see the last comment from @ggerganov):
https://github.com/ggerganov/llama.cpp/issues/599
This should be resolved by ggerganov#626.
https://github.com/ggerganov/llama.cpp/commit/c0bb1d3ce21005ab21d686626ba87261a6e3a660
Here is his fix in the llama.cpp code. It looks like an easy enough fix to port to alpaca.cpp; I believe the code bases are still similar enough.
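For reference, the general shape of the change (a sketch from memory of the llama.cpp eval code of that time, not the literal diff; `mem_per_token` and `N` are the per-token memory estimate and token count from the surrounding eval function, and this is a body fragment, not a standalone program):

```cpp
// Sketch of the fix inside the eval function: previously the compute
// buffer had a fixed size, so longer runs overflowed the pool; the fix
// reallocates the buffer when the measured per-token usage indicates
// the next evaluation will not fit.
static size_t buf_size = 512u*1024*1024;
static void * buf = malloc(buf_size);

if (mem_per_token > 0 && mem_per_token*N > buf_size) {
    // add ~10% headroom for ggml's per-object bookkeeping
    const size_t buf_size_new = 1.1*(mem_per_token*N);

    buf_size = buf_size_new;
    buf = realloc(buf, buf_size);
    if (buf == NULL) {
        fprintf(stderr, "failed to allocate %zu bytes\n", buf_size);
        return false;
    }
}

struct ggml_init_params params = {
    /*.mem_size   =*/ buf_size,
    /*.mem_buffer =*/ buf,
};
```

The key point is that the pool grows with the observed workload instead of staying hardcoded, which is exactly the failure mode in the logs above.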
Agreed, it should be rather easy to merge. I don't have a working alpaca.cpp repo though, so I'd prefer if someone else does this. Alternatively, I can do it and someone else can test; feel free to ping me if needed.
I just tried merging these specific changes, and it triggered an assert in another part of the code when the model loaded.