
How much memory does quantizing the 72B model need? Even 192 GB of RAM gets the process killed

Open sweetcard opened this issue 2 years ago • 9 comments

Does anyone know?

sweetcard avatar Dec 01 '23 03:12 sweetcard

I'd also like to know.

syslot avatar Dec 04 '23 06:12 syslot

It probably takes more than 200 GB of RAM. llama.cpp already supports this: it can quantize by going through a temp file instead of holding everything in memory.

sweetcard avatar Dec 04 '23 14:12 sweetcard
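A rough back-of-the-envelope check of that 200 GB figure (a sketch only; the parameter count and per-weight sizes below are generic approximations, not measurements from this repo):

```python
# Rough memory estimate for quantizing a 72B-parameter model in one pass.
# Assumes fp16 source weights (2 bytes/param); real overhead varies.

PARAMS = 72e9  # Qwen-72B parameter count, approximate

fp16_gb = PARAMS * 2 / 1e9    # full fp16 weights: ~144 GB
q4_gb = PARAMS * 0.5 / 1e9    # 4-bit quantized output: ~36 GB

# Holding the fp16 source weights plus the quantized output plus
# framework/tensor overhead in RAM at once easily pushes past 192 GB,
# which is why an all-in-memory quantizer gets OOM-killed while
# llama.cpp's temp-file (shard-by-shard) path does not.
print(f"fp16 weights: ~{fp16_gb:.0f} GB, q4 output: ~{q4_gb:.0f} GB")
```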

When will we be able to use this?

bigbigtooth avatar Dec 07 '23 04:12 bigbigtooth

> When will we be able to use this?

Just download the model and quantize it yourself.

sweetcard avatar Dec 07 '23 04:12 sweetcard

> Just download the model and quantize it yourself.

Oh? Is that compatible? I'll try it right away.

bigbigtooth avatar Dec 07 '23 12:12 bigbigtooth

> Just download the model and quantize it yourself.
>
> Oh? Is that compatible? I'll try it right away.

Just quantize it directly with llama.cpp.

sweetcard avatar Dec 07 '23 14:12 sweetcard

> Just download the model and quantize it yourself.
>
> Oh? Is that compatible? I'll try it right away.
>
> Just quantize it directly with llama.cpp.

A question: quantizing with llama.cpp fails with this error:

```
Traceback (most recent call last):
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 1228, in <module>
    main()
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 1161, in main
    model_plus = load_some_model(args.model)
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 1078, in load_some_model
    model_plus = merge_multifile_models(models_plus)
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 593, in merge_multifile_models
    model = merge_sharded([mp.model for mp in models_plus])
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 572, in merge_sharded
    return {name: convert(name) for name in names}
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 572, in <dictcomp>
    return {name: convert(name) for name in names}
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 547, in convert
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 547, in <listcomp>
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
KeyError: 'transformer.h.0.attn.c_attn.bias'
```

pip list:

```
Package                        Version
Brotli                         1.1.0
certifi                        2023.11.17
charset-normalizer             3.3.2
contourpy                      1.2.0
cycler                         0.12.1
einops                         0.7.0
filelock                       3.13.1
fonttools                      4.46.0
fsspec                         2023.10.0
gguf                           0.5.1
gmpy2                          2.1.2
huggingface-hub                0.19.4
idna                           3.6
Jinja2                         3.1.2
kiwisolver                     1.4.5
MarkupSafe                     2.1.3
matplotlib                     3.8.2
mpmath                         1.3.0
networkx                       3.2.1
numpy                          1.24.4
packaging                      23.2
Pillow                         10.1.0
pip                            23.3.1
pyparsing                      3.1.1
PySocks                        1.7.1
python-dateutil                2.8.2
PyYAML                         6.0.1
qwen-cpp                       0.1.2
regex                          2023.10.3
requests                       2.31.0
safetensors                    0.4.1
sentencepiece                  0.1.98
setuptools                     68.2.2
six                            1.16.0
sympy                          1.12
tabulate                       0.9.0
tiktoken                       0.5.1
tokenizers                     0.15.0
torch                          2.2.0.dev20231130
torchaudio                     2.2.0.dev20231130
torchvision                    0.17.0.dev20231130
tqdm                           4.66.1
transformers                   4.35.2
transformers-stream-generator  0.0.4
typing_extensions              4.8.0
urllib3                        2.1.0
wheel                          0.42.0
```

Did I install the wrong version of some library? I don't get it.

bigbigtooth avatar Dec 08 '23 06:12 bigbigtooth

> A question: quantizing with llama.cpp fails with this error:
>
> `KeyError: 'transformer.h.0.attn.c_attn.bias'`
>
> Did I install the wrong version of some library? I don't get it.

You need to use this script instead: convert-hf-to-gguf.py

sweetcard avatar Dec 08 '23 06:12 sweetcard
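For reference, the working two-step flow can be sketched as a small driver script. The `LLAMA_CPP` and `MODEL_DIR` paths below are hypothetical placeholders; `convert-hf-to-gguf.py` and the `quantize` tool come from a llama.cpp checkout. The point is that the generic `convert.py` expects LLaMA-style tensor names, which is why it raised a `KeyError` on Qwen's `transformer.h.*` weights:

```python
# Sketch of the Qwen-72B -> GGUF conversion flow with llama.cpp.
# LLAMA_CPP and MODEL_DIR are hypothetical paths, not from the thread.
from pathlib import Path

LLAMA_CPP = Path("llama.cpp")      # llama.cpp checkout (hypothetical)
MODEL_DIR = Path("Qwen-72B-Chat")  # Hugging Face model dir (hypothetical)

def build_convert_cmd(model_dir: Path) -> list[str]:
    # convert-hf-to-gguf.py understands HF tensor names such as
    # transformer.h.0.attn.c_attn.bias; the generic convert.py does not,
    # which is what produced the KeyError above.
    return ["python", str(LLAMA_CPP / "convert-hf-to-gguf.py"), str(model_dir)]

def build_quantize_cmd(f16: Path, out: Path, qtype: str = "q4_0") -> list[str]:
    # Second step: quantize the f16 GGUF down to the target type.
    return [str(LLAMA_CPP / "quantize"), str(f16), str(out), qtype]

if __name__ == "__main__":
    # Print the commands; run them via subprocess against a real checkout.
    print(" ".join(build_convert_cmd(MODEL_DIR)))
    print(" ".join(build_quantize_cmd(MODEL_DIR / "ggml-model-f16.gguf",
                                      MODEL_DIR / "ggml-model-q4_0.gguf")))
```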

Haha, that does work. Thanks!

Inference with the 72B model is really slow, though.

bigbigtooth avatar Dec 08 '23 11:12 bigbigtooth