[Bug] Inference error on MetaX (沐曦) C500
Checklist
- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
docker run -it --name lmdeploy-server --net=host -v /app/models:/models crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest bash -i -c "lmdeploy serve api_server --backend pytorch --device maca /models"

The command above fails with:

2025-10-20 12:34:13,915 - lmdeploy - ERROR - base.py:53 - RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.
2025-10-20 12:34:13,915 - lmdeploy - ERROR - base.py:54 - <PyTorch> check failed!
PyTorch is not available.

See the Error traceback section below for the full log.
Reproduction
docker run -it --name lmdeploy-server --net=host -v /app/models:/models crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest bash -i -c "lmdeploy serve api_server --backend pytorch --device maca /models"
Environment
MetaX C500, Ubuntu 22.04
Error traceback
/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py:130: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at /workspace/framework/mcPytorch/c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
2025-10-20 12:34:09,560 - lmdeploy - WARNING - __init__.py:10 - Disable DLSlime Backend
2025-10-20 12:34:12,917 - lmdeploy - WARNING - __init__.py:10 - Disable DLSlime Backend
2025-10-20 12:34:13,915 - lmdeploy - ERROR - base.py:53 - RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.
2025-10-20 12:34:13,915 - lmdeploy - ERROR - base.py:54 - <PyTorch> check failed!
PyTorch is not available.
There are no devices attached to your container; you can run mx-smi inside the container to verify this. Below is the command we use to create a container for development, for reference:
MACA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
CONTAINER_NAME=maca-test
MOUNT_DIR=/datapool
IMAGE=crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest
docker run -itd \
--ipc host \
--cap-add SYS_PTRACE \
--privileged=true \
--device=/dev/mem \
--device=/dev/dri \
--device=/dev/mxcd \
--device=/dev/infiniband \
--group-add video \
--network=host \
--shm-size '100gb' \
--ulimit memlock=-1 \
--security-opt seccomp=unconfined \
--security-opt apparmor=unconfined \
--name ${CONTAINER_NAME} \
-v ${MOUNT_DIR}:${MOUNT_DIR} \
--entrypoint /bin/bash \
${IMAGE}
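
Once the container is up, a quick sanity check (a minimal sketch, assuming the MACA PyTorch build exposes the cards through the standard torch.cuda API, as the tracebacks above indicate; the script name is just an example) is to confirm the devices are visible before starting the server:

# check_devices.py - run inside the container
import torch

print("torch.cuda.is_available():", torch.cuda.is_available())    # expected: True
print("visible device count:", torch.cuda.device_count())         # expected: number of entries in MACA_VISIBLE_DEVICES
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))                       # per-device name as reported by the runtime

If is_available() returns False or the count is 0, the container was started without the device nodes mounted.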
You can start the server with

lmdeploy serve api_server --backend pytorch --device maca --cache-block-seq-len 16 /datapool/models/Qwen3-8B --model-name qwen

and verify it with

curl http://0.0.0.0:23333/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"qwen","messages":[{"role":"user","content":"tell me a funny story"}]}'
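
Equivalently, you can hit the same endpoint from Python (a minimal sketch; it assumes the default api_server port 23333 and the model name qwen used in the curl command above):

# verify_server.py - example client-side check
import requests

resp = requests.post(
    "http://0.0.0.0:23333/v1/chat/completions",
    json={
        "model": "qwen",
        "messages": [{"role": "user", "content": "tell me a funny story"}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])  # the generated reply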
When I deploy InternVL2.5-26B, it fails. I run it with Docker Compose; the command is:

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True lmdeploy serve api_server --backend pytorch --device maca --cache-block-seq-len 16 /models/OpenGVLab/InternVL2_5-26B --model-name internvl2 --tp 2 --cache-max-entry-count 0.9
It reports:
lmdeploy - WARNING - ray_executor.py:87 - "expandable_segments:True" is not supported.
Process mp_engine_proc:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/mp_engine/zmq_engine.py", line 103, in _mp_proc
    engine = Engine.from_pretrained(
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 440, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 364, in __init__
    self.executor = build_executor(model_path,
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/executor/__init__.py", line 120, in build_executor
    return RayExecutor(
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/executor/ray_executor.py", line 303, in __init__
    ray.get([worker.warmup_dist.remote() for worker in self.workers])
  File "/opt/conda/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/ray/_private/worker.py", line 2771, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/opt/conda/lib/python3.10/site-packages/ray/_private/worker.py", line 919, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(DeferredCudaCallError): ray::RayWorkerWrapper.warmup_dist() (pid=784, ip=172.19.0.4, actor_id=5bcbd4eafc5309ab196da39401000000, repr=<lmdeploy.pytorch.engine.executor.ray_executor.RayWorkerWrapper object at 0x7f77a0d82650>)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 201, in _check_capability
    capability = get_device_capability(d)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 508, in get_device_capability
    prop = get_device_properties(device)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 528, in get_device_properties
    return _get_device_properties(device)  # type: ignore[name-defined]
RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "/workspace/framework/mcPytorch/aten/src/ATen/cuda/CUDAContext.cpp":49, please report a bug to PyTorch. device=1, num_gpus=
The above exception was the direct cause of the following exception:
ray::RayWorkerWrapper.warmup_dist() (pid=784, ip=172.19.0.4, actor_id=5bcbd4eafc5309ab196da39401000000, repr=<lmdeploy.pytorch.engine.executor.ray_executor.RayWorkerWrapper object at 0x7f77a0d82650>)
  File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/executor/ray_executor.py", line 205, in warmup_dist
    tmp = torch.empty((1, ), device='cuda')
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 337, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "/workspace/framework/mcPytorch/aten/src/ATen/cuda/CUDAContext.cpp":49, please report a bug to PyTorch. device=1, num_gpus=
What should I do?
Could you show the output of the command mx-smi in your container?