
ERROR installing v0.3.16 with CUDA enabled on docker

arditobryan opened this issue 4 months ago · 2 comments

# building this image takes roughly 5-8 minutes on top of the base image pull
FROM nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive
# never bake a real token into the image; pass it at runtime instead
ENV HF_TOKEN=<your-hf-token>
ENV TZ=Asia/Hong_Kong

# install linux packages
RUN apt-get update && \
    apt-get install -y sudo nano

# install python and build tools
RUN apt-get install -y python3-pip python3-dev cmake git

# install CUDA env
# note: the -devel base image already ships the CUDA toolkit, so this is redundant
RUN apt-get install -y cuda-toolkit-12-4
# RUN pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
RUN pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
# the torch-cluster wheel index must match the installed torch version (2.6.0 here, not 2.5.1)
RUN pip install torch-cluster -f https://data.pyg.org/whl/torch-2.6.0+cu124.html

# necessary to install llama-cpp-python
RUN apt-get update && \
    apt-get install -y \
    ninja-build

# install llama-cpp-python with CUDA enabled
ENV GGML_CUDA=1
ENV FORCE_CMAKE=1
ENV CMAKE_ARGS=-DGGML_CUDA=on

# The following line overwrites -DGGML_CUDA=on above with a Windows-only CUDA
# toolset path and must not be used on Linux:
# ENV CMAKE_ARGS=-DCMAKE_GENERATOR_TOOLSET="cuda=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4"
#  dpkg -S libcuda.so.1 
# LD_LIBRARY_PATH entries must be directories, not file paths
ENV LD_LIBRARY_PATH=/usr/local/cuda-12.4/compat:$LD_LIBRARY_PATH

# note: --extra-index-url expects a package index, not a direct link to a .whl file;
# PyPI ships only the sdist, so with CMAKE_ARGS set this compiles from source
RUN CMAKE_ARGS="-DGGML_CUDA=on" pip install --user llama-cpp-python==0.3.16 --verbose

Hi, I am trying to install llama-cpp-python with GPU support enabled. It worked for v0.2.77, but I need a more recent version. The only change I had to make for v0.3.16 was using CMAKE_ARGS="-DGGML_CUDA=on" instead of CMAKE_ARGS="-DLLAMA_CUBLAS=on". The build fails with the error pasted below. I searched, and one suggested solution (https://github.com/abetlen/llama-cpp-python/issues/1617) was to set LD_LIBRARY_PATH (I tried pointing it at both libcuda.so and libcuda.so.1), but I still get the same error.

Also, is there an easier way (that perhaps I missed) to install v0.3.16? Thank you

ERROR:
/usr/bin/ld: warning: libcuda.so.1, needed by bin/libggml-cuda.so, not found (try using -rpath or -rpath-link)
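
For anyone hitting the same warning: the linker cannot find the CUDA driver library at build time. A quick way to see what the image actually ships, as a sketch (paths assume the cuda-12.4 base image; adjust to yours):

# list every copy of the driver library/stub in the image
find / -name 'libcuda.so*' 2>/dev/null
# the toolkit also ships a link-time stub under lib64/stubs
ls /usr/local/cuda-12.4/lib64/stubs
# LD_LIBRARY_PATH entries must be directories (e.g. the compat dir), not files
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/compat:$LD_LIBRARY_PATH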

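On the "easier way" question: the project publishes prebuilt CUDA wheels on its own package index, sketched below; whether a cu124 wheel exists for v0.3.16 specifically is an assumption to verify against the repo's README.

pip install llama-cpp-python==0.3.16 \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
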
arditobryan · Aug 29 '25 23:08

ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/compat:$LD_LIBRARY_PATH

This seems to solve the problem; however, when I check whether CUDA is enabled, it returns False:

from llama_cpp.llama_cpp import load_shared_library
import pathlib

lib = load_shared_library('llama', pathlib.Path('/root/.local/lib/python3.10/site-packages/llama_cpp/lib'))
bool(lib.llama_supports_gpu_offload())
>>> OUTPUT: False
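
A second way to check, as a sketch ("model.gguf" is a hypothetical placeholder path): load any small GGUF with offloading requested and read the verbose log, which on a CUDA-enabled build names the detected device and reports how many layers were offloaded. Note that llama_supports_gpu_offload() can also report False simply because libcuda.so.1 is not loadable where the check runs, e.g. during docker build with no driver available.

from llama_cpp import Llama

# on a CUDA-enabled build the verbose log names the CUDA device and reports
# the number of offloaded layers; n_gpu_layers=-1 requests all layers
llm = Llama(model_path="model.gguf", n_gpu_layers=-1, verbose=True)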

arditobryan · Aug 30 '25 00:08

Try this

# Any image is allowed, but this particular one will build significantly faster.
# It is a complete copy of:
#ARG FROM_IMAGE_NAME=nvidia/cuda:12.8.0-devel-ubuntu22.04
ARG FROM_IMAGE_NAME=pytorch/pytorch:2.8.0-cuda12.6-cudnn9-runtime
FROM ${FROM_IMAGE_NAME}

# Install build dependencies and llama-cpp-python with CUDA support
ENV DEBIAN_FRONTEND=noninteractive

# Install Python and pip
RUN apt-get update && apt-get install -y python3 python3-pip build-essential cmake ninja-build wget && \
apt-get clean && rm -rf /var/lib/apt/lists/*

# note: the keyring repo release (ubuntu2404 here) should match the base image's
# Ubuntu release; check with `cat /etc/os-release` inside the image
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb && dpkg -i cuda-keyring_1.1-1_all.deb

RUN apt-get update && apt-get -y install cuda-toolkit-12-6 && apt autoremove -y
# Set CUDA arch for A100 (8.0)
ENV TORCH_CUDA_ARCH_LIST="8.0"
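# (aside: 8.0 is the A100's compute capability; on recent drivers you can query
# your GPU's with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`.
# also note the llama-cpp-python build below uses CMAKE_CUDA_ARCHITECTURES;
# TORCH_CUDA_ARCH_LIST only affects PyTorch extension builds)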

# Expose API port
EXPOSE 8000

#RUN ["python3", "-m", "vllm.entrypoints.openai.api_server"]

ENV PYTHONPATH=/workspace
WORKDIR /workspace

ENV LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH
ENV PATH=/usr/local/cuda-12.6/bin:$PATH
    
ENV CUDA_HOME=/usr/local/cuda-12.6

# Install llama-cpp-python with CUDA support for GGUF models
RUN CUDACXX=/usr/local/cuda-12.6/bin/nvcc \
    CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all-major" \
    FORCE_CMAKE=1 \
    pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade

#RUN CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --no-cache-dir

RUN pip cache purge

COPY . .

ENTRYPOINT ["python3", "test.py"]
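
To try it, a sketch (the tag is arbitrary; --gpus all assumes the NVIDIA Container Toolkit is installed on the host):

docker build -t llama-cpp-cuda .
docker run --rm --gpus all llama-cpp-cuda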

p1x31 · Nov 03 '25 20:11