Not supported on old graphics cards?
I get an error when running on a 3090 and a 2080 Ti. My driver version is:
NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0
mindsearch-backend | Traceback (most recent call last):
mindsearch-backend | File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
mindsearch-backend | self.run()
mindsearch-backend | File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
mindsearch-backend | self._target(*self._args, **self._kwargs)
mindsearch-backend | File "/opt/lmdeploy/lmdeploy/serve/openai/api_server.py", line 1285, in serve
mindsearch-backend | VariableInterface.async_engine = pipeline_class(
mindsearch-backend | File "/opt/lmdeploy/lmdeploy/serve/async_engine.py", line 190, in __init__
mindsearch-backend | self._build_turbomind(model_path=model_path,
mindsearch-backend | File "/opt/lmdeploy/lmdeploy/serve/async_engine.py", line 235, in _build_turbomind
mindsearch-backend | self.engine = tm.TurboMind.from_pretrained(
mindsearch-backend | File "/opt/lmdeploy/lmdeploy/turbomind/turbomind.py", line 340, in from_pretrained
mindsearch-backend | return cls(model_path=pretrained_model_name_or_path,
mindsearch-backend | File "/opt/lmdeploy/lmdeploy/turbomind/turbomind.py", line 144, in __init__
mindsearch-backend | self.model_comm = self._from_hf(model_source=model_source,
mindsearch-backend | File "/opt/lmdeploy/lmdeploy/turbomind/turbomind.py", line 230, in _from_hf
mindsearch-backend | output_model_name, cfg = get_output_model_registered_name_and_config(
mindsearch-backend | File "/opt/lmdeploy/lmdeploy/turbomind/deploy/converter.py", line 123, in get_output_model_registered_name_and_config
mindsearch-backend | if not torch.cuda.is_bf16_supported():
mindsearch-backend | File "/opt/py3/lib/python3.10/site-packages/torch/cuda/__init__.py", line 128, in is_bf16_supported
mindsearch-backend | device = torch.cuda.current_device()
mindsearch-backend | File "/opt/py3/lib/python3.10/site-packages/torch/cuda/__init__.py", line 778, in current_device
mindsearch-backend | _lazy_init()
mindsearch-backend | File "/opt/py3/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
mindsearch-backend | torch._C._cuda_init()
mindsearch-backend | RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
@lvhan028 Is there any solution?
The minimum supported CUDA arch is sm70, so both the NVIDIA 2080 Ti and the 3090 are supported. Are you trying to use tensor parallelism across two GPUs with different architectures? If so, LMDeploy doesn't support that case.
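For example, you can check each card's compute capability on the host (this assumes your nvidia-smi is recent enough to support the compute_cap query field):

nvidia-smi --query-gpu=index,name,compute_cap --format=csv

A 2080 Ti reports 7.5 (Turing) and a 3090 reports 8.6 (Ampere), so tensor parallelism across those two cards would mix architectures.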
Yes, by default it starts on a mix of 30-series and 20-series cards. Could you provide guidance on how to modify the configuration to select specific GPUs?
CUDA_VISIBLE_DEVICES="1,2,3,4" your_program_command --tp 4
It selects GPUs 1, 2, 3, and 4. Make sure these GPUs have the same CUDA architecture.
CUDA_VISIBLE_DEVICES="1,2,3,4" your_program_command --tp 4It selects GPUs 1, 2, 3, and 4. Make sure these GPUs have the same CUDA architecture.
That didn't work for me; I got the same error. My docker-compose.yaml looks like this:
services:
  backend:
    container_name: mindsearch-backend
    build:
      context: ..
      dockerfile: docker/backend.dockerfile
    image: mindsearch/backend:latest
    restart: unless-stopped
    # Uncomment the following line to force using local build
    # pull: never
    ports:
      - "8002:8002"
    environment:
      - PYTHONUNBUFFERED=1
      - OPENAI_API_KEY=${OPENAI_API_KEY:-}
      - OPENAI_API_BASE=${OPENAI_API_BASE:-https://api.openai.com/v1}
      - SILICON_API_KEY=${SILICON_API_KEY:-}
      - HTTP_PROXY=http://100.78.60.40:7890
      - HTTPS_PROXY=http://100.78.60.40:7890
      - NO_PROXY=localhost,127.0.0.1
      - CUDA_VISIBLE_DEVICES=0
    command: python -m mindsearch.app --lang ${LANG:-cn} --model_format ${MODEL_FORMAT:-internlm_server}
    volumes:
      - /root/.cache:/root/.cache
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 4
              capabilities: [gpu]
    # GPU support explanation:
    # The current configuration has been tested with NVIDIA GPUs. If you use other types of GPUs, you may need to adjust the configuration.
    # For AMD GPUs, you can try using the ROCm driver by modifying the configuration as follows:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: amd
    #           count: 1
    #           capabilities: [gpu]
    #
    # For other GPU types, you may need to consult the respective Docker GPU support documentation.
    # In theory, any GPU supported by PyTorch should be configurable here.
    # If you encounter issues, try the following steps:
    # 1. Ensure the correct GPU drivers are installed on the host
    # 2. Check if your Docker version supports your GPU type
    # 3. Install necessary GPU-related libraries in the Dockerfile
    # 4. Adjust the deploy configuration here to match your GPU type
    #
    # Note: After changing GPU configuration, you may need to rebuild the image.
  frontend:
    container_name: mindsearch-frontend
    build:
      context: ..
      dockerfile: docker/frontend.dockerfile
      args:
        - HTTP_PROXY=http://100.78.60.40:7890
        - HTTPS_PROXY=http://100.78.60.40:7890
    image: mindsearch/frontend:latest
    restart: unless-stopped
    # Uncomment the following line to force using local build
    # pull: never
    ports:
      - "8081:8080"
    depends_on:
      - backend
I am not familiar with docker compose.
Can you try running lmdeploy serve api_server <the model path> --log-level INFO directly?
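For example (a sketch; replace the placeholder with your actual model path and pick a GPU index that exists on your host):

CUDA_VISIBLE_DEVICES=0 lmdeploy serve api_server <the model path> --log-level INFO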
@jsrdcht
Problem Origin
Based on the docker-compose.yaml snippet you provided:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 4
          capabilities: [gpu]
Your intention is to allocate 4 GPUs to the container. However, this configuration alone is insufficient: it only tells Docker how many GPUs to reserve, not which ones to use.
Solution
To resolve this issue, we need to use the CUDA_VISIBLE_DEVICES environment variable to control which GPU devices are visible to CUDA programs within the container.
Purpose of CUDA_VISIBLE_DEVICES
- Controls which GPU devices are visible to CUDA programs inside the container.
- Informs the Docker container which specific GPUs on the host it should use.
- Allows precise control over GPU resources used by each container, preventing issues caused by mixing GPUs with different architectures.
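Once the container is running, you can verify which devices CUDA actually sees inside it, for example (assuming the compose service is named backend, as in your file):

docker-compose exec backend python -c "import torch; print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])"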
Configuration Steps
1. Add or modify the environment variable in the backend service section of your docker-compose.yml (note that CUDA_VISIBLE_DEVICES only accepts a comma-separated list of device indices, not range notation like 0-3):

environment:
  - CUDA_VISIBLE_DEVICES=0,1,2,3

2. Ensure that the number of GPUs specified in CUDA_VISIBLE_DEVICES matches the count in the deploy configuration:

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 4  # should match the number of GPUs in CUDA_VISIBLE_DEVICES
          capabilities: [gpu]

3. If you have more GPUs (e.g., 8) and want to use specific ones (say the 5th to 8th, numbered 4-7), set it like this:

environment:
  - CUDA_VISIBLE_DEVICES=4,5,6,7

4. After making changes, restart the containers to apply the new configuration:

docker-compose down
docker-compose up -d
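If you also changed the image or Dockerfile (see the rebuild note in your compose file), you can rebuild and restart just the backend service in one step; this is a sketch assuming the service name backend:

docker-compose up -d --build backend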
Important Notes
- Ensure that the selected GPUs have the same architecture to avoid compatibility issues.
- If the problem persists, you can view the container logs for more information using:

docker-compose logs backend
If you have any further questions or need additional assistance, please don't hesitate to ask.