Traceback (most recent call last):
  File "/usr/local/bin/dashinfer_vlm_serve", line 33, in <module>
    sys.exit(load_entry_point('dashinfer-vlm', 'console_scripts', 'dashinfer_vlm_serve')())
  File "/root/code/dashinfer_vlm/api_server/server.py", line 683, in main
    init()
  File "/root/code/dashinfer_vlm/api_server/server.py", line 93, in init
    model_loader.load_model(direct_load=False, load_format="auto")
  File "/root/code/dashinfer_vlm/vl_inference/utils/model_loader.py", line 134, in serialize
    return super().serialize(
  File "/usr/local/lib/python3.10/dist-packages/dashinfer/allspark/model_loader.py", line 730, in serialize
    return self.serialize_to_path(engine, model_output_dir=model_output_dir, enable_quant=enable_quant,
  File "/usr/local/lib/python3.10/dist-packages/dashinfer/allspark/model_loader.py", line 839, in serialize_to_path
    engine.serialize_model_from_torch(model_name=safe_model_name,
  File "/usr/local/lib/python3.10/dist-packages/dashinfer/allspark/engine.py", line 151, in serialize_model_from_torch
    return self.engine.serialize_model_from_torch(
  File "/usr/local/lib/python3.10/dist-packages/dashinfer/allspark/engine_utils.py", line 109, in serialize_model_from_torch
    model_proto = self.model_map[model_type](
  File "/usr/local/lib/python3.10/dist-packages/dashinfer/allspark/model/qwen_v20.py", line 22, in __init__
    self._build_graph(self.model_config, derive_type)
  File "/usr/local/lib/python3.10/dist-packages/dashinfer/allspark/model/qwen_v15.py", line 113, in _build_graph
    i, key)] = [(self.name_adapter.fullname(v).format(i))
  File "/usr/local/lib/python3.10/dist-packages/dashinfer/allspark/model/qwen_v15.py", line 113, in <listcomp>
    i, key)] = [(self.name_adapter.fullname(v).format(i))
  File "/usr/local/lib/python3.10/dist-packages/dashinfer/allspark/model/utils.py", line 60, in fullname
    return self.weight_name_segments[std_name][
KeyError: 'q_proj.weight'
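For context on the KeyError: GPTQ/AWQ checkpoints replace each linear layer's plain .weight tensor with packed qweight/qzeros/scales tensors, so a weight-name map built from a float checkpoint has no 'q_proj.weight' entry to return. A minimal sketch of the failing lookup pattern (the names below are illustrative, not DashInfer's actual internals):

    # Minimal sketch of the failing lookup; names are illustrative,
    # not DashInfer's actual internals.
    weight_name_segments = {
        # A GPTQ/AWQ checkpoint ships packed tensors instead of .weight:
        "q_proj.qweight": ["model.layers.{}.self_attn.q_proj.qweight"],
        "q_proj.qzeros": ["model.layers.{}.self_attn.q_proj.qzeros"],
        "q_proj.scales": ["model.layers.{}.self_attn.q_proj.scales"],
    }

    def fullname(std_name):
        # Asking for the float tensor name fails because the quantized
        # checkpoint never produced a "q_proj.weight" entry.
        return weight_name_segments[std_name][0]

    fullname("q_proj.weight")  # KeyError: 'q_proj.weight'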
Does dashinfer_vlm_serve not yet support a GPTQ-quantized qwen2-vl-2b model?
I tried both AWQ-quantized and GPTQ-quantized models and both fail with this error. Is this simply not supported yet, or is there a problem with how I'm using it? I couldn't find anything about it in the docs either.
https://github.com/modelscope/dash-infer/pull/60
It looks like the latest version supports GPTQ quantization; is AWQ supported as well?
@x574chen does dashinfer_vlm_serve support passing the GPTQ quantization parameters through?
Per the docs:
https://dashinfer.readthedocs.io/en/latest/quant/weight_activate_quant.html
it works like this:
A8W8 GPTQ:

    model_loader.load_model().serialize(
        engine, model_output_dir=tmp_dir,
        enable_quant=True).free_model()
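To spell out the doc snippet above in fuller context, here is a sketch. The loader construction follows the examples in the linked quantization doc, but the exact class and argument names may differ between DashInfer versions, so treat everything except the quoted serialize(...) chain as an assumption:

    import tempfile
    from dashinfer import allspark

    engine = allspark.Engine()
    # Assumed loader construction; see the linked doc for the exact API.
    model_loader = allspark.HuggingFaceModel(
        "/path/to/qwen-checkpoint",  # local HF model directory
        "qwen",                      # name used for the serialized model
    )
    tmp_dir = tempfile.mkdtemp()

    # enable_quant=True applies the quantization config (e.g. A8W8 GPTQ)
    # while converting the torch weights to the DashInfer format.
    model_loader.load_model().serialize(
        engine,
        model_output_dir=tmp_dir,
        enable_quant=True,
    ).free_model()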
@kzjeef GPTQ is supported in a8w8, a16w4, and fp8 variants:
https://github.com/modelscope/dash-infer/blob/main/multimodal/dashinfer_vlm/vl_inference/utils/model_loader.py#L145
When launching dashinfer_vlm_serve, pass the matching --quant-type flag.
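For example (only --quant-type comes from the reply above; the --model flag, the path, and the value "gptq" are illustrative guesses, so check dashinfer_vlm_serve --help for the real option names and accepted values):

    dashinfer_vlm_serve --model /path/to/qwen2-vl-2b-gptq --quant-type gptq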
Got it, GPTQ works after upgrading to the new version. Are there plans to support AWQ?