chatglm.cpp: error when max_context_length > 2048 (e.g. very long contexts in langchain scenarios): ggml_new_tensor_impl: not enough space in the scratch memory pool

First of all, big kudos for the inference speedup this project delivers!

Environment: Linux, Python 3.8. I built the chatglm.cpp module with the Python binding and run inference on chatglm2-6b quantized with q4_0, using these settings:

generation_kwargs = dict( max_length=6000, max_context_length=2400, do_sample=args.temp > 0, top_k=args.top_k, top_p=args.top_p, temperature=args.temp, repetition_penalty=args.repeat_penalty, stream=True, )

When max_context_length is set above 2048 (e.g. very long contexts in langchain scenarios), it fails with: ggml_new_tensor_impl: not enough space in the scratch memory pool

Quite a few llama.cpp users seem to have hit this problem; a Google search turns up reports everywhere.

Searching Google, I found a similar issue for llama-cpp-python:
https://github.com/abetlen/llama-cpp-python/issues/356 https://github.com/abetlen/llama-cpp-python/issues/356#issuecomment-1585744322 The linked comment says it is a memory leak in llama.cpp.

The llama.cpp project has this bug as well. Someone appears to have solved it by rolling back, downgrading llama_cpp_python to 0.1.74: https://github.com/ggerganov/llama.cpp/issues/29#issuecomment-1703954380 https://github.com/ggerganov/llama.cpp/issues/2404#issuecomment-1652223140

How can I fix this bug when running inference with this project? Thanks.

Oct 08 '23 17:10 valkryhx

Same issue: https://github.com/li-plus/chatglm.cpp/issues/131

Oct 23 '23 03:10 trekrollercoaster

I ran into the same problem:

ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 1357824000, available 1342177280)

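In my case the threshold is not exactly 2048 but somewhat higher. My model is chatglm3-6b-32k with q5_1 quantization. By the way, what does the 1342177280 here refer to? CUDA VRAM or host memory?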

Dec 13 '23 15:12 ISNing
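
For reference, the two byte counts in the error line quoted above can be converted to MiB. The sketch below is illustrative only: the helper names are hypothetical and are not part of chatglm.cpp or ggml. The available figure works out to exactly 1280 MiB, which reads like a fixed preset buffer size rather than a hardware limit, although that is an inference from the number and not something the log states.

```python
import re

MIB = 1024 * 1024  # bytes per mebibyte


def parse_scratch_error(line):
    """Extract (needed, available) byte counts from the ggml error line.

    Hypothetical helper: it only pattern-matches the message text quoted
    in this thread and returns None if the line has a different shape.
    """
    match = re.search(r"needed (\d+), available (\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def describe_shortfall(needed, available):
    """Render both sizes and their difference in MiB."""
    shortfall = needed - available
    return (
        f"needed {needed / MIB:.2f} MiB, "
        f"available {available / MIB:.2f} MiB, "
        f"short by {shortfall / MIB:.2f} MiB"
    )


log_line = (
    "ggml_new_tensor_impl: not enough space in the scratch memory pool "
    "(needed 1357824000, available 1342177280)"
)
needed, available = parse_scratch_error(log_line)
print(describe_shortfall(needed, available))
# needed 1294.92 MiB, available 1280.00 MiB, short by 14.92 MiB
```

In other words, this particular request overshot the pool by roughly 15 MiB.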

Same problem here. Has anyone found a solution?

Jan 27 '24 09:01 Pan06da

Solved: I've figured it out. You need to change the memory allocation settings in chatglm.h (linked below). Also make sure that "max_context_length" and "max_tokens" do not require more than the configured memory values, otherwise the program will crash.

https://github.com/li-plus/chatglm.cpp/blob/main/chatglm.h#L1019-L1020

Feb 07 '24 15:02 VaalaCat

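Fixed in #305. The latest version (v0.4.0) allocates memory on demand, so there is no longer any need to preset the scratch size / memory size; long-text inference works as long as the device has enough memory.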

Jun 21 '24 03:06 li-plus
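
To check whether an installed Python binding already includes the fix mentioned above, a version comparison against v0.4.0 can be sketched as follows. This is a minimal sketch under assumptions: the distribution name "chatglm-cpp" is assumed to be the name the binding is installed under, and the helper names are hypothetical. If the package was built from source under a different name, pass that name instead.

```python
import re
from importlib import metadata

FIXED_IN = (0, 4, 0)  # version reported in this thread as containing the fix


def parse_version(text):
    """Turn '0.4.0' into (0, 4, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that '0.4' compares equal to '0.4.0'.
    while len(parts) < len(FIXED_IN):
        parts.append(0)
    return tuple(parts)


def has_on_demand_allocation(version_text):
    """True if the version is at or above the release named in the thread."""
    return parse_version(version_text) >= FIXED_IN


def installed_version(dist_name="chatglm-cpp"):
    """Return the installed version string, or None if not installed.

    The default distribution name is an assumption, not taken from the thread.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("package not found under the assumed distribution name")
    elif has_on_demand_allocation(version):
        print(f"{version}: includes on-demand allocation")
    else:
        print(f"{version}: older than 0.4.0, preset buffer sizes still apply")
```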
