ERROR installing v0.3.16 with CUDA enabled on docker
# takes build time + 5-8 minutes to complete
FROM nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
# token redacted; pass it at build or run time instead of hard-coding it in the image
ENV HF_TOKEN=<redacted>
ENV TZ=Asia/Hong_Kong
# install linux packages
RUN apt-get update && \
apt-get install -y sudo nano
# install python
RUN apt-get install -y python3-pip python3-dev
RUN apt-get install cmake -y
RUN apt-get install git -y
# install CUDA env
RUN apt-get install cuda-toolkit-12-4 -y
# RUN pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
RUN pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
RUN pip install torch-cluster -f https://data.pyg.org/whl/torch-2.5.1+cu124.html
# necessary to install llama-cpp-python
RUN apt-get update && \
apt-get install -y \
ninja-build
# install llama-cpp-python with CUDA enabled
ENV GGML_CUDA=1
ENV FORCE_CMAKE=1
ENV CMAKE_ARGS=-DGGML_CUDA=on
ENV CMAKE_ARGS=-DCMAKE_GENERATOR_TOOLSET="cuda=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4"
# dpkg -S libcuda.so.1
ENV LD_LIBRARY_PATH=/usr/local/cuda-12.4/compat/libcuda.so
RUN CMAKE_ARGS="-DGGML_CUDA=on" pip install --user llama-cpp-python==0.3.16 \
--extra-index-url https://github.com/abetlen/llama-cpp-python/releases/download/v0.3.16-cu124/llama_cpp_python-0.3.16-cp310-cp310-linux_x86_64.whl \
--verbose
Hi, I am trying to install llama-cpp-python with GPU support enabled. This worked for v0.2.77, but I need a more recent version. With v0.3.16 I had to use CMAKE_ARGS="-DGGML_CUDA=on" instead of CMAKE_ARGS="-DLLAMA_CUBLAS=on" (that is the only change I was forced to make). The build fails with the error shown under ERROR below. I searched, and one suggested solution (https://github.com/abetlen/llama-cpp-python/issues/1617) was to set LD_LIBRARY_PATH (I tried both libcuda.so and libcuda.so.1), but I still get the same error.
Also, is there an easier way (that perhaps I missed) to install v0.3.16? Thank you.
ERROR:
/usr/bin/ld: warning: libcuda.so.1, needed by bin/libggml-cuda.so, not found (try using -rpath or -rpath-link)
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/compat:$LD_LIBRARY_PATH
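A possible reason the original setting did not help: LD_LIBRARY_PATH is a colon-separated list of directories, while the first Dockerfile set it to the path of a file (/usr/local/cuda-12.4/compat/libcuda.so). The sketch below only mimics that directory lookup so the difference can be checked inside the container; the function name `find_shared_library` is made up for this illustration, and the real linker and loader apply further rules (rpath, ld.so cache) that it does not model.

```python
import os


def find_shared_library(name, search_path):
    """Return the first existing '<dir>/<name>' for the directories listed
    in a colon-separated search path, or None if no entry contains it."""
    for entry in search_path.split(":"):
        if not entry:
            # Skip empty entries such as the one left by a trailing colon.
            continue
        candidate = os.path.join(entry, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, `find_shared_library("libcuda.so.1", os.environ.get("LD_LIBRARY_PATH", ""))` returns None when an entry points at the library file itself, because the name is appended to every entry as if it were a directory.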
This seems to solve the build problem. However, when I check whether CUDA is enabled, it returns False:
from llama_cpp.llama_cpp import load_shared_library
import pathlib
lib = load_shared_library('llama',pathlib.Path('/root/.local/lib/python3.10/site-packages/llama_cpp/lib'))
bool(lib.llama_supports_gpu_offload())
>>> OUTPUT: False
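The same check can be wrapped so that it does not depend on the hard-coded site-packages path. This is a minimal sketch, assuming the installed package exposes `llama_supports_gpu_offload` at the top level of the `llama_cpp` module (recent versions appear to, but that is an assumption here); the helper name `gpu_offload_supported` is made up for this illustration.

```python
import importlib


def gpu_offload_supported(module_name="llama_cpp"):
    """Return True or False as reported by the bindings, or None when the
    module cannot be imported or does not expose the check."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Package not installed in this interpreter; other load errors propagate.
        return None
    check = getattr(module, "llama_supports_gpu_offload", None)
    if check is None:
        # Assumed attribute is missing in this version of the bindings.
        return None
    return bool(check())
```

A result of None rather than False would suggest that the interpreter is not seeing the package that was just built, for example because it was installed with --user for a different user or Python version.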
Try this
# Any base image works, but this particular one builds significantly faster
# It is a complete copy of
#ARG FROM_IMAGE_NAME=nvidia/cuda:12.8.0-devel-ubuntu22.04
ARG FROM_IMAGE_NAME=pytorch/pytorch:2.8.0-cuda12.6-cudnn9-runtime
FROM ${FROM_IMAGE_NAME}
# Install build dependencies and llama-cpp-python with CUDA support
ENV DEBIAN_FRONTEND=noninteractive
# Install Python and pip
RUN apt-get update && apt-get install -y python3 python3-pip build-essential cmake ninja-build wget && \
apt-get clean && rm -rf /var/lib/apt/lists/*
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb && dpkg -i cuda-keyring_1.1-1_all.deb
RUN apt-get update && apt-get -y install cuda-toolkit-12-6 && apt autoremove -y
# Set CUDA arch for A100 (8.0)
ENV TORCH_CUDA_ARCH_LIST="8.0"
# Expose API port
EXPOSE 8000
#RUN ["python3", "-m", "vllm.entrypoints.openai.api_server"]
ENV PYTHONPATH=/workspace
WORKDIR /workspace
ENV LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH
ENV PATH=/usr/local/cuda-12.6/bin:$PATH
ENV CUDA_HOME=/usr/local/cuda-12.6
# Install llama-cpp-python with CUDA support for GGUF models
RUN CUDACXX=/usr/local/cuda-12.6/bin/nvcc CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all-major" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade
#RUN CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --no-cache-dir
RUN pip cache purge
COPY . .
ENTRYPOINT ["python3", "test.py"]
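To try the same build outside a Dockerfile, the environment variables and pip flags from the install step above can be assembled in a small script. This is only a sketch: `build_install_command` is a hypothetical helper, and it reuses the values from the Dockerfile without adding any new flags.

```python
def build_install_command(cuda_home="/usr/local/cuda-12.6", version=None):
    """Return (env, argv) mirroring the Dockerfile's llama-cpp-python install step."""
    package = "llama-cpp-python" if version is None else f"llama-cpp-python=={version}"
    env = {
        # Point the build at the nvcc that belongs to the installed toolkit.
        "CUDACXX": f"{cuda_home}/bin/nvcc",
        # Same CMake options as in the Dockerfile above.
        "CMAKE_ARGS": "-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all-major",
        "FORCE_CMAKE": "1",
    }
    argv = ["pip", "install", package, "--no-cache-dir", "--force-reinstall", "--upgrade"]
    return env, argv
```

The result can be passed to `subprocess.run(argv, env={**os.environ, **env}, check=True)`; calling `build_install_command("/usr/local/cuda-12.4", "0.3.16")` would pin the version from the original question.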