[BUG] chat.py errors out when using the MiniCPM-o 2.6 int4 model
Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
Is there an existing answer for this in FAQ?
- [X] I have searched FAQ
Current Behavior
I downloaded the MiniCPM-o 2.6 int4 model and set the model_path variable in chat.py's entry function to its local path. Running the code produced the following error:
```
(MiniCPMo) hygx@hygx:~/code/MiniCPM-o$ cd /home/hygx/code/MiniCPM-o ; /usr/bin/env /home/hygx/anaconda3/envs/MiniCPMo/bin/python /home/hygx/.vscode-server/extensions/ms-python.debugpy-2024.14.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher 54997 -- /home/hygx/code/MiniCPM-o/chat.py
`low_cpu_mem_usage` was None, now set to True since model is quantized.
Traceback (most recent call last):
File "/home/hygx/anaconda3/envs/MiniCPMo/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/hygx/anaconda3/envs/MiniCPMo/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/hygx/.vscode-server/extensions/ms-python.debugpy-2024.14.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71, in <module>
cli.main()
File "/home/hygx/.vscode-server/extensions/ms-python.debugpy-2024.14.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 501, in main
run()
File "/home/hygx/.vscode-server/extensions/ms-python.debugpy-2024.14.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 351, in run_file
runpy.run_path(target, run_name="__main__")
File "/home/hygx/.vscode-server/extensions/ms-python.debugpy-2024.14.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310, in run_path
return _run_module_code(code, init_globals, run_name, pkg_name=pkg_name, script_name=fname)
File "/home/hygx/.vscode-server/extensions/ms-python.debugpy-2024.14.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127, in _run_module_code
_run_code(code, mod_globals, init_globals, mod_name, mod_spec, pkg_name, script_name)
File "/home/hygx/.vscode-server/extensions/ms-python.debugpy-2024.14.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118, in _run_code
exec(code, run_globals)
File "/home/hygx/code/MiniCPM-o/chat.py", line 282, in <module>
chat_model = MiniCPMVChat(model_path)
File "/home/hygx/code/MiniCPM-o/chat.py", line 269, in __init__
self.model = MiniCPMV(model_path)
File "/home/hygx/code/MiniCPM-o/chat.py", line 142, in __init__
self.model = AutoModel.from_pretrained(model_path, trust_remote_code=True).to(dtype=torch.bfloat16)
File "/home/hygx/anaconda3/envs/MiniCPMo/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
return model_class.from_pretrained(
File "/home/hygx/anaconda3/envs/MiniCPMo/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3738, in from_pretrained
if metadata.get("format") == "pt":
AttributeError: 'NoneType' object has no attribute 'get'
```
Is there a problem with how I'm calling it, or is something else wrong? How can I fix this?
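As a quick diagnostic, here is a minimal sketch (the path is an assumption; adjust it to the local model directory) that checks whether the checkpoint actually carries the header metadata that `from_pretrained` reads:

```python
from safetensors import safe_open

# Hypothetical path to the int4 checkpoint; adjust as needed.
path = "/path/to/MiniCPM-o-2_6-int4/model.safetensors"

with safe_open(path, framework="pt") as f:
    # transformers expects header metadata such as {"format": "pt"};
    # if this prints None, metadata.get("format") raises the
    # AttributeError shown in the traceback above.
    print(f.metadata())
```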
Expected Behavior
To be able to chat with the model normally.
Steps To Reproduce
No response
Environment
- OS: Windows 11 with WSL2
- Python: 3.10
- Transformers: 4.44.2
- PyTorch: 2.2.0
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.1
Anything else?
No response
Regarding `self.model = AutoModel.from_pretrained(model_path, trust_remote_code=True).to(dtype=torch.bfloat16)`: should an int4 model really be cast to bfloat16?
Same issue here.
Same issue here.
This is most likely caused by missing metadata in the int4 version's model.safetensors file.
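If missing metadata is indeed the cause, one possible workaround is to re-save the checkpoint with the metadata transformers checks for. This is a sketch, not a verified fix; back up the original file first, and note the path is hypothetical:

```python
from safetensors.torch import load_file, save_file

path = "/path/to/MiniCPM-o-2_6-int4/model.safetensors"  # hypothetical path

# Rewrite the file with {"format": "pt"}, the metadata key that
# modeling_utils checks before loading safetensors weights.
state_dict = load_file(path)
save_file(state_dict, path, metadata={"format": "pt"})
```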
Hi, you can follow the steps in this README to install AutoGPTQ and perform int4 quantized inference.
int4 model doesn't contain metadata
> Hi, you can follow the steps in this README to install AutoGPTQ and perform int4 quantized inference.
It does NOT work even after installing AutoGPTQ; the same error occurs, as shown below.
```
Installed /media/harr/King/MiniCPM-o/AutoGPTQ
Successfully installed auto_gptq
(minicpmo) harr@harr-Kuangshi16-Super-Series-GM6IX8X:/media/harr/King/MiniCPM-o$ python web_demos/minicpm-o_2.6/model_server.py --model openbmb/MiniCPM-o-2_6-int4
WARNING - AutoGPTQ has stopped development. Please transition to GPTQModel: https://github.com/ModelCloud/GPTQModel
GPTQModel has been merged into Transformers/Optimum and full deprecation of AutoGPTQ within HF frameworks is planned in the near-future.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
Traceback (most recent call last):
  File "/media/harr/King/MiniCPM-o/web_demos/minicpm-o_2.6/model_server.py", line 601, in <module>
```
If you want to use the int4 version for the web demo, you should change the model initialization to `AutoGPTQForCausalLM.from_quantized`:

```python
import torch
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    'openbmb/MiniCPM-o-2_6-int4',
    torch_dtype=torch.bfloat16,
    device="cuda:0",
    trust_remote_code=True,
    disable_exllama=True,
    disable_exllamav2=True,
)
```
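Once the model loads, a hedged usage sketch follows; the `chat` interface is provided by the model's remote code, so the exact parameters here are assumed from the MiniCPM-V examples rather than verified against this repo:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    'openbmb/MiniCPM-o-2_6-int4', trust_remote_code=True
)

msgs = [{'role': 'user', 'content': 'Hello'}]

# chat() comes from the repo's trust_remote_code implementation;
# image=None sends a text-only turn in the MiniCPM-V style API.
res = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(res)
```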