[Bug] Inference error on MetaX (沐曦) C500
Checklist
- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
docker run -it --name lmdeploy-server --net=host -v /app/models:/models crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest bash -i -c "lmdeploy serve api_server --backend pytorch --device maca /models"

The command above fails with:

2025-10-20 12:34:13,915 - lmdeploy - ERROR - base.py:53 - RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.
2025-10-20 12:34:13,915 - lmdeploy - ERROR - base.py:54 - <PyTorch> check failed!
PyTorch is not available.

See the Error traceback section below for the full log.
Reproduction
docker run -it --name lmdeploy-server --net=host -v /app/models:/models crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest bash -i -c "lmdeploy serve api_server --backend pytorch --device maca /models"
Environment
MetaX C500, Ubuntu 22.04
Error traceback
/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py:130: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at /workspace/framework/mcPytorch/c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
2025-10-20 12:34:09,560 - lmdeploy - WARNING - __init__.py:10 - Disable DLSlime Backend
2025-10-20 12:34:12,917 - lmdeploy - WARNING - __init__.py:10 - Disable DLSlime Backend
2025-10-20 12:34:13,915 - lmdeploy - ERROR - base.py:53 - RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.
2025-10-20 12:34:13,915 - lmdeploy - ERROR - base.py:54 - <PyTorch> check failed!
PyTorch is not available.
There are no devices attached to your container; you can run mx-smi inside the container to verify this. Below is the command we use to create a container for development, for reference:
MACA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
CONTAINER_NAME=maca-test
MOUNT_DIR=/datapool
IMAGE=crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest
docker run -itd \
--ipc host \
--cap-add SYS_PTRACE \
--privileged=true \
--device=/dev/mem \
--device=/dev/dri \
--device=/dev/mxcd \
--device=/dev/infiniband \
--group-add video \
--network=host \
--shm-size '100gb' \
--ulimit memlock=-1 \
--security-opt seccomp=unconfined \
--security-opt apparmor=unconfined \
--name ${CONTAINER_NAME} \
-v ${MOUNT_DIR}:${MOUNT_DIR} \
--entrypoint /bin/bash \
${IMAGE}
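
Once the container is up, a quick sanity check (a minimal sketch, assuming the MACA PyTorch build exposes the cards through the standard torch.cuda API, as the tracebacks above indicate; the script name is just an example) is to confirm the devices are visible before starting the server:

# check_devices.py - run inside the container
import torch

print("torch.cuda.is_available():", torch.cuda.is_available())    # expected: True
print("visible device count:", torch.cuda.device_count())         # expected: number of entries in MACA_VISIBLE_DEVICES
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))                       # per-device name as reported by the runtime

If is_available() returns False or the count is 0, the container was started without the device nodes mounted.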
You can start the server with

lmdeploy serve api_server --backend pytorch --device maca --cache-block-seq-len 16 /datapool/models/Qwen3-8B --model-name qwen

and verify it with

curl http://0.0.0.0:23333/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"qwen","messages":[{"role":"user","content":"tell me a funny story"}]}'
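
Equivalently, you can hit the same endpoint from Python (a minimal sketch; it assumes the default api_server port 23333 and the model name qwen used in the curl command above):

# verify_server.py - example client-side check
import requests

resp = requests.post(
    "http://0.0.0.0:23333/v1/chat/completions",
    json={
        "model": "qwen",
        "messages": [{"role": "user", "content": "tell me a funny story"}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])  # the generated reply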
When I deploy InternVL2.5-26B, it fails. I run it with Docker Compose; the command is:

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True lmdeploy serve api_server --backend pytorch --device maca --cache-block-seq-len 16 /models/OpenGVLab/InternVL2_5-26B --model-name internvl2 --tp 2 --cache-max-entry-count 0.9
It reports:
lmdeploy - WARNING - ray_executor.py:87 - "expandable_segments:True" is not supported.
Process mp_engine_proc:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/mp_engine/zmq_engine.py", line 103, in _mp_proc
    engine = Engine.from_pretrained(
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 440, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 364, in __init__
    self.executor = build_executor(model_path,
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/executor/__init__.py", line 120, in build_executor
    return RayExecutor(
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/executor/ray_executor.py", line 303, in __init__
    ray.get([worker.warmup_dist.remote() for worker in self.workers])
  File "/opt/conda/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/ray/_private/worker.py", line 2771, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/opt/conda/lib/python3.10/site-packages/ray/_private/worker.py", line 919, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(DeferredCudaCallError): ray::RayWorkerWrapper.warmup_dist() (pid=784, ip=172.19.0.4, actor_id=5bcbd4eafc5309ab196da39401000000, repr=<lmdeploy.pytorch.engine.executor.ray_executor.RayWorkerWrapper object at 0x7f77a0d82650>)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 201, in _check_capability
    capability = get_device_capability(d)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 508, in get_device_capability
    prop = get_device_properties(device)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 528, in get_device_properties
    return _get_device_properties(device)  # type: ignore[name-defined]
RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "/workspace/framework/mcPytorch/aten/src/ATen/cuda/CUDAContext.cpp":49, please report a bug to PyTorch. device=1, num_gpus=
The above exception was the direct cause of the following exception:
ray::RayWorkerWrapper.warmup_dist() (pid=784, ip=172.19.0.4, actor_id=5bcbd4eafc5309ab196da39401000000, repr=<lmdeploy.pytorch.engine.executor.ray_executor.RayWorkerWrapper object at 0x7f77a0d82650>)
  File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/executor/ray_executor.py", line 205, in warmup_dist
    tmp = torch.empty((1, ), device='cuda')
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 337, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "/workspace/framework/mcPytorch/aten/src/ATen/cuda/CUDAContext.cpp":49, please report a bug to PyTorch. device=1, num_gpus=
What should I do?
Could you show the output of the command mx-smi in your container?