[Bug] Does the NPU support deployment and inference of glm4v-9b?
Checklist
- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
You are using a model of type chatglm to instantiate a model of type . This is not supported for all configurations of models and can yield errors. Could not locate the modeling_chatglm.py inside THUDM/glm-4v-9b.
2024-09-29 15:17:29,604 - lmdeploy - ERROR - builder.py:58 - matching vision model: GLM4VisionModel failed
Traceback (most recent call last):
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/urllib3/connection.py", line 199, in _new_conn
    sock = connection.create_connection(
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
TimeoutError: timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/urllib3/connectionpool.py", line 490, in _make_request
    raise new_e
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
    self._validate_conn(conn)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
    conn.connect()
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/urllib3/connection.py", line 693, in connect
    self.sock = sock = self._new_conn()
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/urllib3/connection.py", line 208, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0xffff80767940>, 'Connection to huggingface.co timed out. (connect timeout=10)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /THUDM/glm-4v-9b/resolve/main/modeling_chatglm.py (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0xffff80767940>, 'Connection to huggingface.co timed out. (connect timeout=10)'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1746, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1666, in get_hf_file_metadata
    r = _request_wrapper(
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 364, in _request_wrapper
    response = _request_wrapper(
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 387, in _request_wrapper
    response = get_session().request(method=method, url=url, **params)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 93, in send
    return super().send(request, *args, **kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/requests/adapters.py", line 688, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /THUDM/glm-4v-9b/resolve/main/modeling_chatglm.py (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0xffff80767940>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: 555f4f55-f8f1-4342-8962-e32d4a5d2e10)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/hub.py", line 403, in cached_file
    resolved_file = hf_hub_download(
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1232, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1339, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1857, in _raise_on_head_call_error
    raise LocalEntryNotFoundError(
huggingface_hub.errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/python3.10.5/bin/lmdeploy", line 33, in
The error says a file is missing and tries to pull it from Hugging Face, but after checking the local directory, no file is actually missing. Do I need to adapt the model into an Ascend-compatible PyTorch model?
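Since the traceback ends in a connection timeout to huggingface.co rather than a genuine missing file, one thing worth ruling out is the network lookup itself. A minimal sketch, assuming the full checkpoint is already on disk: force transformers and huggingface_hub into offline mode so they resolve files only from the local cache/checkpoint and never contact the Hub.

```shell
# Force offline resolution before launching lmdeploy (a sketch; only helps
# if the local checkpoint directory really contains all required files).
export HF_HUB_OFFLINE=1        # huggingface_hub: never hit huggingface.co
export TRANSFORMERS_OFFLINE=1  # transformers: use local files only
```

With these set, a missing-file error would then point at the local directory rather than the network.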
Reproduction
ASCEND_RT_VISIBLE_DEVICES=6 lmdeploy serve api_server \
    /home/ZhipuAI/glm-4v-9b \
    --backend pytorch \
    --chat-template /opt/lmdeploy/chat_template/glm-4v.json \
    --model-name glm-4v \
    --device ascend \
    --server-name 0.0.0.0 \
    --server-port 50077
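Before rerunning the command above, it may help to confirm that the local checkpoint directory actually contains the remote-code files that transformers tries to fetch. A small diagnostic sketch (the path is the one from the reproduction; the file list is an assumption based on the error message, not an exhaustive manifest):

```shell
# check_checkpoint: verify a local trust_remote_code checkpoint directory
# contains the files named in the error (diagnostic sketch only).
check_checkpoint() {
    dir="$1"
    for f in config.json modeling_chatglm.py; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            return 1
        fi
    done
    echo "checkpoint looks complete"
}

# Path taken from the reproduction above; adjust to your setup.
check_checkpoint /home/ZhipuAI/glm-4v-9b || true
```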
Environment
[W compiler_depend.ts:623] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator())
sys.platform: linux
Python: 3.10.5 (main, Sep 24 2024, 03:43:49) [GCC 9.4.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 2.1.0
PyTorch compiling details: PyTorch built with:
- GCC 10.2
- C++ Version: 201703
- Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: NO AVX
- Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/devtoolset-10/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=open, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.16.0
LMDeploy: 0.6.0+bf89a01
transformers: 4.46.0.dev0
gradio: Not Found
fastapi: 0.115.0
pydantic: 2.9.2
triton: Not Found
Error traceback
No response
We have not verified glm4v-9b inference on NPU, but this looks like the model path may be pointing to the wrong place. Could you share how you are invoking it?
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.
glm4v-9b is supported as of lmdeploy v0.6.3 together with dlinfer v0.1.2 (https://github.com/DeepLink-org/dlinfer); please try again with those versions.
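A sketch of the upgrade step implied above. The exact PyPI package name for dlinfer's Ascend build is an assumption here; check the dlinfer repository's install instructions for the authoritative name.

```shell
# Upgrade to the versions stated to support glm4v-9b on Ascend.
pip install lmdeploy==0.6.3
# Assumed package name for the dlinfer Ascend build; verify against
# https://github.com/DeepLink-org/dlinfer before running.
pip install dlinfer-ascend==0.1.2
```

After upgrading, the same `lmdeploy serve api_server ... --device ascend` command from the reproduction can be retried.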