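Deploying the InternVL2.5-26B model on MetaX C500 fails
Following https://lmdeploy.readthedocs.io/en/latest/get_started/maca/get_started.html, I downloaded the provided lmdeploy image and deployed it with docker compose. The compose file is as follows: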
version: "3.8"

x-common: &common
  pull_policy: always # always, never, missing, build
  restart: unless-stopped
  stop_signal: SIGINT
  stop_grace_period: 1m
  logging:
    driver: "json-file"
    options:
      max-file: "10"
      max-size: "100m"

services:
  lmdeploy-internvl25-26B:
    image: localhost:5000/lmdeploy:maca
    container_name: lmdeploy-internvl25-26B
    shm_size: 100gb
    environment:
      - CUDA_VISIBLE_DEVICES=4,5
      - GLOO_SOCKET_IFNAME=lo
    devices:
      - "/dev/dri:/dev/dri"
      - "/dev/mxcd:/dev/mxcd"
      - "/dev/infiniband:/dev/infiniband"
    group_add:
      - "video"
    volumes:
      - /tmp:/tmp
      - /mnt/data0/models:/models/
    ports:
      - 20003:23333
    entrypoint: ["/bin/bash", "-c"]
    command:
      [
        "lmdeploy serve api_server --backend pytorch --device cuda --cache-block-seq-len 16 /models/OpenGVLab/InternVL2_5-26B --model-name internvl2 --tp 2 --cache-max-entry-count 0.9 "
      ]
    <<: *common

However, the deployment fails, and the log shows the following:

torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "/workspace/framework/mcPytorch/aten/src/ATen/cuda/CUDAContext.cpp":49, please report a bug to PyTorch. device=1, num_gpus=
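
The failed assertion `device < num_gpus` with `device=1` suggests that the process sees fewer than two devices, even though `CUDA_VISIBLE_DEVICES=4,5` asks for two. One way to check this is to run a small script inside the container and compare the environment variable with what PyTorch reports. The following is only a diagnostic sketch: the helper names `parse_visible_devices`, `check_tp`, and `runtime_device_count` are hypothetical, and it assumes the image's PyTorch build honors `CUDA_VISIBLE_DEVICES` the same way upstream PyTorch does.

```python
import os


def parse_visible_devices(value):
    """Split a CUDA_VISIBLE_DEVICES-style string into a list of device ids."""
    if value is None:
        return None  # variable unset: the runtime decides which devices to expose
    return [item.strip() for item in value.split(",") if item.strip()]


def check_tp(tp, requested, num_visible):
    """Return a list of problems; an empty list means the setup looks consistent."""
    problems = []
    if requested is not None and len(requested) != num_visible:
        problems.append(
            f"{len(requested)} device(s) requested but the runtime sees {num_visible}"
        )
    if tp > num_visible:
        problems.append(
            f"tp={tp} needs {tp} device(s) but only {num_visible} are visible"
        )
    return problems


def runtime_device_count():
    """Ask PyTorch how many devices it sees; None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.device_count()


if __name__ == "__main__":
    requested = parse_visible_devices(os.environ.get("CUDA_VISIBLE_DEVICES"))
    count = runtime_device_count()
    print("CUDA_VISIBLE_DEVICES:", requested)
    print("torch.cuda.device_count():", count)
    if count is not None:
        # tp=2 mirrors the --tp value used in the compose file above.
        for problem in check_tp(2, requested, count):
            print("problem:", problem)
```

If the reported device count is lower than 2, the mismatch is between the container's device exposure and the `--tp 2` setting rather than inside lmdeploy itself; whether the indices 4 and 5 are valid on this host is something only the script's output can confirm.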
With tp=1 the deployment succeeds, but inference then runs out of GPU memory.
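
A back-of-the-envelope estimate may help explain why tp=1 loads but leaves too little memory for inference. The sketch below assumes 26e9 parameters (taken from the model name), 2 bytes per parameter (fp16/bf16 weights), an even split of weights across tensor-parallel ranks, and a hypothetical 64 GiB of memory per device. None of these figures come from the logs above, so they should be adjusted to the actual hardware and checkpoint.

```python
GIB = 2 ** 30


def weight_gib_per_device(num_params, bytes_per_param, tp):
    """Approximate weight memory per device, assuming an even split across tp ranks."""
    return num_params * bytes_per_param / tp / GIB


def headroom_gib(device_gib, num_params, bytes_per_param, tp):
    """Memory left per device for KV cache and activations once weights are loaded."""
    return device_gib - weight_gib_per_device(num_params, bytes_per_param, tp)


if __name__ == "__main__":
    NUM_PARAMS = 26e9      # assumption: nominal size from the model name
    BYTES_PER_PARAM = 2    # assumption: fp16/bf16 weights
    DEVICE_GIB = 64        # assumption: hypothetical per-device memory
    for tp in (1, 2):
        weights = weight_gib_per_device(NUM_PARAMS, BYTES_PER_PARAM, tp)
        free = headroom_gib(DEVICE_GIB, NUM_PARAMS, BYTES_PER_PARAM, tp)
        print(f"tp={tp}: weights ~{weights:.1f} GiB/device, headroom ~{free:.1f} GiB")
```

Under these assumptions the weights alone take roughly 48 GiB on a single device and roughly 24 GiB per device with tp=2, which is consistent with tp=1 loading successfully but having little room left for the KV cache and activations during inference.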