fix: package deprecation introduced by CUDA 13
Motivation
NVIDIA has deprecated versioned wheel packages since CUDA 13, causing CUDA 13+ installations to fail on deprecated package names such as `nvidia-cublas-cu13`.
Modification
Remove the unconditional return to allow the version check to execute.
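A minimal sketch of what that change looks like (hypothetical function and package names; the actual lmdeploy helper may be structured differently):

```python
# Hypothetical sketch only; the real helper in lmdeploy may differ. The key
# point is that an early, unconditional return previously bypassed the CUDA
# major-version check below.
def nvidia_wheel_name(base: str, cuda_major: int) -> str:
    """Resolve an NVIDIA dependency name for the given CUDA major version."""
    # Before the fix, `return f'{base}-cu{cuda_major}'` sat here
    # unconditionally, producing deprecated names like nvidia-cublas-cu13.
    if cuda_major >= 13:
        # CUDA 13+ wheels drop the versioned suffix (e.g. nvidia-cublas),
        # per the deprecation described in the Motivation section.
        return base
    return f'{base}-cu{cuda_major}'

# nvidia_wheel_name('nvidia-cublas', 12) -> 'nvidia-cublas-cu12'
# nvidia_wheel_name('nvidia-cublas', 13) -> 'nvidia-cublas'
```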
In the `runtime_cuda.txt` file, the version of torch is restricted to `torch<=2.8.0,>=2.0.0`. However, CUDA 13 requires a minimum PyTorch version of 2.9. We need to upgrade and test it.
cc @zhulinJulia24
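For reference, a quick way to verify which CUDA toolkit a given torch build targets, using standard torch attributes:

```python
import torch

# torch.version.cuda reports the CUDA toolkit the wheel was built against,
# e.g. '12.8' for a cu12 build or '13.0' for a cu13 build.
print(torch.__version__)   # must be >= 2.9 for official CUDA 13 wheels
print(torch.version.cuda)
```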
Should we also upgrade triton?
Since flash-attention doesn't have a CUDA 13 build yet, we need to be more careful with the lmdeploy CUDA 13 release due to potential compatibility issues.
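Until a cu13 build of flash-attention is available, an optional-import guard is one common way to degrade gracefully (an illustrative pattern, not lmdeploy's actual code):

```python
# Illustrative pattern: treat flash-attn as optional so a missing cu13
# build does not break import, and let callers pick a fallback kernel.
try:
    import flash_attn  # noqa: F401
    HAS_FLASH_ATTN = True
except ImportError:
    HAS_FLASH_ATTN = False
```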
I think for now we can make our code CUDA 13 ready but not ship the CUDA 13 wheels and images until testing and the relevant dependencies are ready. Anyone who wants to use LMDeploy with CUDA 13 can build from source themselves.
I've built the docker image with:

```shell
docker build . -f docker/Dockerfile -t openmmlab/lmdeploy:test-cu13 --build-arg CUDA_VERSION=cu13
```
Then, in the container, I tried serving a model using the turbomind backend but got a failure:
```
>>> from lmdeploy import turbomind
/opt/py3/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/turbomind/__init__.py", line 24, in <module>
    from .turbomind import TurboMind, update_parallel_config  # noqa: E402
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 35, in <module>
    import _turbomind as _tm  # noqa: E402
ImportError: libcublas.so.13: cannot open shared object file: No such file or directory
```
There is no `libcublas.so` in `/usr/local/cuda`.
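A quick way to locate where the wheel-provided cuBLAS actually landed (illustrative; it turns out to live under the `nvidia/cu13` wheel directory rather than `/usr/local/cuda`):

```python
import glob
import sysconfig

# Search site-packages for the cuBLAS shared library shipped by the
# NVIDIA wheels; on this image it is under nvidia/cu13/lib.
site_packages = sysconfig.get_paths()['purelib']
print(glob.glob(f'{site_packages}/nvidia/**/libcublas.so*', recursive=True))
```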
The PyTorch engine doesn't work either.
But neither inference engine works, even though users can build lmdeploy from source in a cu13 environment.
After setting `export LD_LIBRARY_PATH=/opt/py3/lib/python3.10/site-packages/nvidia/cu13/lib/:$LD_LIBRARY_PATH`, the turbomind engine works.
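As an alternative to setting the environment variable, the libraries could be preloaded from Python before the extension module is imported. A minimal sketch, assuming the cu13 wheels install under `site-packages/nvidia/cu13/lib` as observed above (not lmdeploy's actual code):

```python
import ctypes
import os
import sysconfig

# Locate the CUDA 13 runtime libraries shipped inside the NVIDIA wheels.
cu13_lib = os.path.join(sysconfig.get_paths()['purelib'], 'nvidia', 'cu13', 'lib')

if os.path.isdir(cu13_lib):
    # RTLD_GLOBAL makes the cuBLAS symbols visible to extension modules
    # (such as _turbomind) that are dlopen'd later in the same process.
    ctypes.CDLL(os.path.join(cu13_lib, 'libcublas.so.13'), mode=ctypes.RTLD_GLOBAL)
```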
After upgrading triton to its latest version, the PyTorch engine works too.
I agree we should defer the release until complete verification. In the meantime, I recommend adding the LD_LIBRARY_PATH configuration to this PR so that at least one engine is functional.