
fix: fix package deprecation introduced by CUDA 13

Open · windreamer opened this issue 3 months ago · 9 comments

Motivation

NVIDIA has deprecated versioned wheel packages since CUDA 13, causing CUDA 13+ installations to fail when deprecated package names like nvidia-cublas-cu13 are requested.

Modification

Remove the unconditional return to allow the version check to execute.
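A minimal sketch of the kind of change described, with hypothetical names (check_cuda_runtime_packages and the probed package name are illustrative, not lmdeploy's actual identifiers):

import importlib.metadata as md

def check_cuda_runtime_packages(package: str = 'nvidia-cublas-cu12') -> str | None:
    # return None  # <- the unconditional early return removed by this PR;
    #                  it previously skipped the check below entirely
    try:
        return md.version(package)  # the version check now actually executes
    except md.PackageNotFoundError:
        return None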

windreamer · Nov 11 '25

In the runtime_cuda.txt file, the version of torch is restricted to >=2.0.0,<=2.8.0. However, under CUDA 13, PyTorch requires a minimum version of 2.9. We need to upgrade and test it. cc @zhulinJulia24
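For illustration, the conflict can be checked with the packaging library; the version numbers below are the ones quoted above:

from packaging.specifiers import SpecifierSet
from packaging.version import Version

runtime_pin = SpecifierSet('>=2.0.0,<=2.8.0')  # current pin in runtime_cuda.txt
cuda13_minimum = Version('2.9.0')              # minimum torch for CUDA 13
print(cuda13_minimum in runtime_pin)           # False: the pin excludes CUDA 13 builds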

lvhan028 · Nov 14 '25

Should we also upgrade triton?

lvhan028 · Nov 14 '25

> In the runtime_cuda.txt file, the version of torch is restricted to >=2.0.0,<=2.8.0. However, under CUDA 13, PyTorch requires a minimum version of 2.9. We need to upgrade and test it. cc @zhulinJulia24

Since flash-attention doesn't have a CUDA 13 build yet, we need to be more careful with the lmdeploy CUDA 13 release due to potential compatibility issues.

lvhan028 · Nov 14 '25

> In the runtime_cuda.txt file, the version of torch is restricted to >=2.0.0,<=2.8.0. However, under CUDA 13, PyTorch requires a minimum version of 2.9. We need to upgrade and test it. cc @zhulinJulia24

> Since flash-attention doesn't have a CUDA 13 build yet, we need to be more careful with the lmdeploy CUDA 13 release due to potential compatibility issues.

I think for now we can make our code CUDA 13 ready but not ship the CUDA 13 wheels and images until testing and the relevant dependencies are ready. Anyone who wants to use LMDeploy with CUDA 13 can build from source.

windreamer · Nov 14 '25

I've built the docker image with:

docker build . -f docker/Dockerfile -t openmmlab/lmdeploy:test-cu13 --build-arg CUDA_VERSION=cu13

Then, in the container, I tried serving a model using the turbomind backend but got a failure:

>>> from lmdeploy import turbomind
/opt/py3/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/turbomind/__init__.py", line 24, in <module>
    from .turbomind import TurboMind, update_parallel_config  # noqa: E402
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 35, in <module>
    import _turbomind as _tm  # noqa: E402
ImportError: libcublas.so.13: cannot open shared object file: No such file or directory

There is no "libcublas.so" in /usr/local/cuda.
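As a diagnostic sketch (the nvidia/... layout inside site-packages is an assumption based on where the pip CUDA wheels normally install), this locates any libcublas shipped by pip:

import glob, os, site

for sp in site.getsitepackages():
    for hit in glob.glob(os.path.join(sp, 'nvidia', '**', 'libcublas.so*'), recursive=True):
        print(hit)  # prints pip-installed libcublas locations, if any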

lvhan028 · Nov 14 '25

The PyTorch engine doesn't work either.

lvhan028 · Nov 14 '25

> In the runtime_cuda.txt file, the version of torch is restricted to >=2.0.0,<=2.8.0. However, under CUDA 13, PyTorch requires a minimum version of 2.9. We need to upgrade and test it. cc @zhulinJulia24

> Since flash-attention doesn't have a CUDA 13 build yet, we need to be more careful with the lmdeploy CUDA 13 release due to potential compatibility issues.

> I think for now we can make our code CUDA 13 ready but not ship the CUDA 13 wheels and images until testing and the relevant dependencies are ready. Anyone who wants to use LMDeploy with CUDA 13 can build from source.

But neither inference engine works, even though users can build lmdeploy from source in a CUDA 13 environment.

lvhan028 · Nov 14 '25

> I've built the docker image with:
>
> docker build . -f docker/Dockerfile -t openmmlab/lmdeploy:test-cu13 --build-arg CUDA_VERSION=cu13
>
> Then, in the container, I tried serving a model using the turbomind backend but got a failure:
>
> >>> from lmdeploy import turbomind
> /opt/py3/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
>   import pynvml  # type: ignore[import]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/py3/lib/python3.10/site-packages/lmdeploy/turbomind/__init__.py", line 24, in <module>
>     from .turbomind import TurboMind, update_parallel_config  # noqa: E402
>   File "/opt/py3/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 35, in <module>
>     import _turbomind as _tm  # noqa: E402
> ImportError: libcublas.so.13: cannot open shared object file: No such file or directory
>
> There is no "libcublas.so" in /usr/local/cuda.

After setting export LD_LIBRARY_PATH=/opt/py3/lib/python3.10/site-packages/nvidia/cu13/lib/:$LD_LIBRARY_PATH, the turbomind engine works.
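As an alternative to exporting LD_LIBRARY_PATH by hand, here is a sketch that preloads the library with ctypes before _turbomind is imported; the nvidia/cu13/lib path is the one from the export above:

import ctypes, os, site

for sp in site.getsitepackages():
    lib = os.path.join(sp, 'nvidia', 'cu13', 'lib', 'libcublas.so.13')
    if os.path.exists(lib):
        # RTLD_GLOBAL makes the symbols visible to _turbomind when it loads
        ctypes.CDLL(lib, mode=ctypes.RTLD_GLOBAL)
        break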

lvhan028 · Nov 14 '25

> The PyTorch engine doesn't work either.

After upgrading triton to its latest version, the PyTorch engine works too.
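A quick sanity check after the upgrade (a sketch; the exact triton minimum for torch 2.9 is not stated in this thread):

import torch, triton
print(torch.__version__, triton.__version__)  # confirm the resolved version pair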

I agree we should defer the release until verification is complete. In the meantime, I recommend adding the LD_LIBRARY_PATH configuration to this PR to ensure at least one engine is functional.

lvhan028 · Nov 14 '25