Not supported on old graphics cards?
I get an error when running on a 3090 and a 2080 Ti. My driver version is:
NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0
mindsearch-backend | Traceback (most recent call last):
mindsearch-backend | File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
mindsearch-backend | self.run()
mindsearch-backend | File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
mindsearch-backend | self._target(*self._args, **self._kwargs)
mindsearch-backend | File "/opt/lmdeploy/lmdeploy/serve/openai/api_server.py", line 1285, in serve
mindsearch-backend | VariableInterface.async_engine = pipeline_class(
mindsearch-backend | File "/opt/lmdeploy/lmdeploy/serve/async_engine.py", line 190, in __init__
mindsearch-backend | self._build_turbomind(model_path=model_path,
mindsearch-backend | File "/opt/lmdeploy/lmdeploy/serve/async_engine.py", line 235, in _build_turbomind
mindsearch-backend | self.engine = tm.TurboMind.from_pretrained(
mindsearch-backend | File "/opt/lmdeploy/lmdeploy/turbomind/turbomind.py", line 340, in from_pretrained
mindsearch-backend | return cls(model_path=pretrained_model_name_or_path,
mindsearch-backend | File "/opt/lmdeploy/lmdeploy/turbomind/turbomind.py", line 144, in __init__
mindsearch-backend | self.model_comm = self._from_hf(model_source=model_source,
mindsearch-backend | File "/opt/lmdeploy/lmdeploy/turbomind/turbomind.py", line 230, in _from_hf
mindsearch-backend | output_model_name, cfg = get_output_model_registered_name_and_config(
mindsearch-backend | File "/opt/lmdeploy/lmdeploy/turbomind/deploy/converter.py", line 123, in get_output_model_registered_name_and_config
mindsearch-backend | if not torch.cuda.is_bf16_supported():
mindsearch-backend | File "/opt/py3/lib/python3.10/site-packages/torch/cuda/__init__.py", line 128, in is_bf16_supported
mindsearch-backend | device = torch.cuda.current_device()
mindsearch-backend | File "/opt/py3/lib/python3.10/site-packages/torch/cuda/__init__.py", line 778, in current_device
mindsearch-backend | _lazy_init()
mindsearch-backend | File "/opt/py3/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
mindsearch-backend | torch._C._cuda_init()
mindsearch-backend | RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
@lvhan028 Is there any solution?
The minimum supported CUDA arch is sm70, so both the NVIDIA 2080 Ti and the 3090 are supported. Are you trying to use tensor parallelism across two GPUs with different architectures? If so, LMDeploy doesn't support that case.
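For example, you can check each card's compute capability on the host (this assumes your nvidia-smi is recent enough to support the compute_cap query field):

nvidia-smi --query-gpu=index,name,compute_cap --format=csv

A 2080 Ti reports 7.5 (Turing) and a 3090 reports 8.6 (Ampere), so tensor parallelism across those two cards would mix architectures.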
Yes, by default it starts on a mix of 30-series and 20-series cards. Could you provide guidance on how to modify the configuration to select specific GPUs?
CUDA_VISIBLE_DEVICES="1,2,3,4" your_program_command --tp 4
It selects GPUs 1, 2, 3, and 4. Make sure these GPUs have the same CUDA architecture.
CUDA_VISIBLE_DEVICES="1,2,3,4" your_program_command --tp 4It selects GPUs 1, 2, 3, and 4. Make sure these GPUs have the same CUDA architecture.
That didn't work for me; I got the same error. My docker-compose.yaml looks like this:
services:
  backend:
    container_name: mindsearch-backend
    build:
      context: ..
      dockerfile: docker/backend.dockerfile
    image: mindsearch/backend:latest
    restart: unless-stopped
    # Uncomment the following line to force using local build
    # pull: never
    ports:
      - "8002:8002"
    environment:
      - PYTHONUNBUFFERED=1
      - OPENAI_API_KEY=${OPENAI_API_KEY:-}
      - OPENAI_API_BASE=${OPENAI_API_BASE:-https://api.openai.com/v1}
      - SILICON_API_KEY=${SILICON_API_KEY:-}
      - HTTP_PROXY=http://100.78.60.40:7890
      - HTTPS_PROXY=http://100.78.60.40:7890
      - NO_PROXY=localhost,127.0.0.1
      - CUDA_VISIBLE_DEVICES=0
    command: python -m mindsearch.app --lang ${LANG:-cn} --model_format ${MODEL_FORMAT:-internlm_server}
    volumes:
      - /root/.cache:/root/.cache
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 4
              capabilities: [gpu]
    # GPU support explanation:
    # The current configuration has been tested with NVIDIA GPUs. If you use other types of GPUs, you may need to adjust the configuration.
    # For AMD GPUs, you can try using the ROCm driver by modifying the configuration as follows:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: amd
    #           count: 1
    #           capabilities: [gpu]
    #
    # For other GPU types, you may need to consult the respective Docker GPU support documentation.
    # In theory, any GPU supported by PyTorch should be configurable here.
    # If you encounter issues, try the following steps:
    # 1. Ensure the correct GPU drivers are installed on the host
    # 2. Check if your Docker version supports your GPU type
    # 3. Install necessary GPU-related libraries in the Dockerfile
    # 4. Adjust the deploy configuration here to match your GPU type
    #
    # Note: After changing GPU configuration, you may need to rebuild the image.
  frontend:
    container_name: mindsearch-frontend
    build:
      context: ..
      dockerfile: docker/frontend.dockerfile
      args:
        - HTTP_PROXY=http://100.78.60.40:7890
        - HTTPS_PROXY=http://100.78.60.40:7890
    image: mindsearch/frontend:latest
    restart: unless-stopped
    # Uncomment the following line to force using local build
    # pull: never
    ports:
      - "8081:8080"
    depends_on:
      - backend
I am not familiar with docker compose.
Can you try running lmdeploy serve api_server <the model path> --log-level INFO directly?
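For example (a sketch; replace the placeholder with your actual model path and pick a GPU index that exists on your host):

CUDA_VISIBLE_DEVICES=0 lmdeploy serve api_server <the model path> --log-level INFO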
@jsrdcht
Problem Origin
Based on the docker-compose.yaml snippet you provided:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 4
          capabilities: [gpu]
Your intention is to allocate 4 GPUs to the container. However, this configuration alone is insufficient: it only tells Docker how many GPUs to reserve, not which ones to use.
Solution
To resolve this issue, we need to use the CUDA_VISIBLE_DEVICES environment variable to control which GPU devices are visible to CUDA programs within the container.
Purpose of CUDA_VISIBLE_DEVICES
- Controls which GPU devices are visible to CUDA programs inside the container.
- Informs the Docker container which specific GPUs on the host it should use.
- Allows precise control over GPU resources used by each container, preventing issues caused by mixing GPUs with different architectures.
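Once the container is running, you can verify which devices CUDA actually sees inside it, for example (assuming the compose service is named backend, as in your file):

docker-compose exec backend python -c "import torch; print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])"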
Configuration Steps
1. Add or modify the environment variable in the backend service section of your docker-compose.yml (note that CUDA_VISIBLE_DEVICES only accepts a comma-separated list of device indices, not range notation like 0-3):

environment:
  - CUDA_VISIBLE_DEVICES=0,1,2,3

2. Ensure that the number of GPUs specified in CUDA_VISIBLE_DEVICES matches the count in the deploy configuration:

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 4  # should match the number of GPUs in CUDA_VISIBLE_DEVICES
          capabilities: [gpu]

3. If you have more GPUs (e.g., 8) and want to use specific ones (say the 5th to 8th, numbered 4-7), set it like this:

environment:
  - CUDA_VISIBLE_DEVICES=4,5,6,7

4. After making changes, restart the containers to apply the new configuration:

docker-compose down
docker-compose up -d
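If you also changed the image or Dockerfile (see the rebuild note in your compose file), you can rebuild and restart just the backend service in one step; this is a sketch assuming the service name backend:

docker-compose up -d --build backend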
Important Notes
- Ensure that the selected GPUs have the same architecture to avoid compatibility issues.
- If the problem persists, you can view the container logs for more information using:

docker-compose logs backend
If you have any further questions or need additional assistance, please don't hesitate to ask.