
Successfully installed with CUDA v12.6 and Visual Studio 2022!!

dw5189 opened this issue 11 months ago · 3 comments

I've just successfully installed it! Here's the information for your reference.

PowerShell:

```powershell
$env:CUDA_TOOLKIT_ROOT_DIR="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6"
$env:CMAKE_GENERATOR_PLATFORM="x64"
$env:FORCE_CMAKE="1"
$env:CMAKE_ARGS="-DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89"
pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade
```
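The `-DCMAKE_CUDA_ARCHITECTURES=89` value matches the RTX 4070 SUPER's compute capability of 8.9 (visible in the `ggml_cuda_init` log further down). For a different GPU, here is a small sketch for looking up your own value; it assumes a driver recent enough that `nvidia-smi` supports the `compute_cap` query field:

```python
# Hypothetical helper: map the GPU's compute capability to the value expected
# by -DCMAKE_CUDA_ARCHITECTURES (e.g. "8.9" -> "89").
import subprocess

cap = subprocess.run(
    ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print("CMAKE_CUDA_ARCHITECTURES =", cap.replace(".", ""))
```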


** Visual Studio 2022 Developer PowerShell v17.10.11 **
Copyright (c) 2022 Microsoft Corporation


```
(base) PS C:\Users\Administrator\source\repos> conda activate CUDA126-py312
(CUDA126-py312) PS C:\Users\Administrator\source\repos> $env:CUDA_TOOLKIT_ROOT_DIR="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6"
(CUDA126-py312) PS C:\Users\Administrator\source\repos> $env:CMAKE_GENERATOR_PLATFORM="x64"
(CUDA126-py312) PS C:\Users\Administrator\source\repos> $env:FORCE_CMAKE="1"
(CUDA126-py312) PS C:\Users\Administrator\source\repos> $env:CMAKE_ARGS="-DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89"
(CUDA126-py312) PS C:\Users\Administrator\source\repos> pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple/, http://mirrors.aliyun.com/pypi/simple/
Collecting llama-cpp-python
  Downloading http://mirrors.aliyun.com/pypi/packages/a6/38/7a47b1fb1d83eaddd86ca8ddaf20f141cbc019faf7b425283d8e5ef710e5/llama_cpp_python-0.3.7.tar.gz (66.7 MB)
     ---------------------------------------- 66.7/66.7 MB 22.7 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
  Downloading http://mirrors.aliyun.com/pypi/packages/26/9f/ad63fc0248c5379346306f8668cda6e2e2e9c95e01216d2b8ffd9ff037d0/typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python)
  Downloading http://mirrors.aliyun.com/pypi/packages/42/6e/55580a538116d16ae7c9aa17d4edd56e83f42126cb1dfe7a684da7925d2c/numpy-2.2.3-cp312-cp312-win_amd64.whl (12.6 MB)
     ---------------------------------------- 12.6/12.6 MB 23.3 MB/s eta 0:00:00
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Downloading http://mirrors.aliyun.com/pypi/packages/3f/27/4570e78fc0bf5ea0ca45eb1de3818a23787af9b390c0b0a0033a1b8236f9/diskcache-5.6.3-py3-none-any.whl (45 kB)
Collecting jinja2>=2.11.3 (from llama-cpp-python)
  Downloading http://mirrors.aliyun.com/pypi/packages/bd/0f/2ba5fbcd631e3e88689309dbe978c5769e883e4b84ebfe7da30b43275c5a/jinja2-3.1.5-py3-none-any.whl (134 kB)
Collecting MarkupSafe>=2.0 (from jinja2>=2.11.3->llama-cpp-python)
  Downloading http://mirrors.aliyun.com/pypi/packages/c1/80/a61f99dc3a936413c3ee4e1eecac96c0da5ed07ad56fd975f1a9da5bc630/MarkupSafe-3.0.2-cp312-cp312-win_amd64.whl (15 kB)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.3.7-cp312-cp312-win_amd64.whl size=93613512 sha256=cd98aae040b2dbcc1f4653370900de27455ef65275d08543da81c53c31138a1a
  Stored in directory: C:\Users\Administrator\AppData\Local\Temp\pip-ephem-wheel-cache-9usio9a1\wheels\ec\61\fc\cee068315610d77f6a99c0032a74e4c8cb21c1d6e281b45bb5
Successfully built llama-cpp-python
Installing collected packages: typing-extensions, numpy, MarkupSafe, diskcache, jinja2, llama-cpp-python
Successfully installed MarkupSafe-3.0.2 diskcache-5.6.3 jinja2-3.1.5 llama-cpp-python-0.3.7 numpy-2.2.3 typing-extensions-4.12.2
(CUDA126-py312) PS C:\Users\Administrator\source\repos>
```
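After the install finishes, a quick way to confirm the wheel was actually built with CUDA is the low-level `llama_supports_gpu_offload` binding, which returns False on a CPU-only build:

```python
# Sanity check: a CUDA-enabled build reports GPU offload support.
import llama_cpp

print("llama-cpp-python:", llama_cpp.__version__)
print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())
```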


Loading a model afterwards shows the CUDA device being used:

```
G:\>conda.bat activate CUDA126-py312
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070 SUPER, compute capability 8.9, VMM: yes
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4070 SUPER) - 11053 MiB free
llama_model_loader: loaded meta data with 29 key-value pairs and 579 tensors from E:\.lmstudio\models\Qwen\Qwen2.5-Coder-14B-Instruct-GGUF\qwen2.5-coder-14b-instruct-q4_k_m.gguf (version GGUF V3 (latest))
```
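For reference, a minimal sketch that reproduces the load above (the path is the one from this log; substitute your own GGUF file):

```python
# Load a GGUF with every layer offloaded to the GPU; with verbose=True the
# ggml_cuda_init / "using device CUDA0" lines shown above are printed.
from llama_cpp import Llama

llm = Llama(
    model_path=r"E:\.lmstudio\models\Qwen\Qwen2.5-Coder-14B-Instruct-GGUF\qwen2.5-coder-14b-instruct-q4_k_m.gguf",
    n_gpu_layers=-1,  # -1 = offload all layers
    verbose=True,
)
print(llm("Q: What is 2 + 2? A:", max_tokens=8)["choices"][0]["text"])
```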

dw5189 · Feb 26 '25

Hey bro, could you please help build a precompiled 0.3.7 wheel for CUDA 12.4.1, Python 3.10, Ubuntu 20.04? I've been at it for days without success. Many, many thanks!

tianxiajianghu · Mar 1 '25

> Hey bro, could you please help build a precompiled 0.3.7 wheel for CUDA 12.4.1, Python 3.10, Ubuntu 20.04? I've been at it for days without success. Many, many thanks!

```bash
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt update

sudo apt install -y \
    python3.10 \
    python3.10-dev \
    python3.10-venv \
    build-essential \
    cmake \
    git \
    wget \
    libssl-dev
```
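Before kicking off the build, it can help to verify the toolchain actually resolved; a small, hypothetical check using only the Python standard library:

```python
# Confirm the compilers and build tools installed above are on PATH.
import shutil

for tool in ("gcc", "g++", "make", "cmake", "git", "wget"):
    print(f"{tool:5s} ->", shutil.which(tool) or "NOT FOUND")
```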

Download and add the NVIDIA repository key

```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
```

Update the package lists

```bash
sudo apt update
```

Install CUDA 12.4

```bash
sudo apt install -y cuda-toolkit-12-4
```

Tail of the install output:

```
..............
done.
done.
Processing triggers for mime-support (3.64ubuntu1) ...
Setting up openjdk-11-jre-headless:amd64 (11.0.26+4-1ubuntu1~20.04) ...
update-alternatives: using /usr/lib/jvm/java-11-openjdk-amd64/bin/java to provide /usr/bin/java (java) in auto mode
update-alternatives: using /usr/lib/jvm/java-11-openjdk-amd64/bin/jjs to provide /usr/bin/jjs (jjs) in auto mode
update-alternatives: using /usr/lib/jvm/java-11-openjdk-amd64/bin/keytool to provide /usr/bin/keytool (keytool) in auto mode
update-alternatives: using /usr/lib/jvm/java-11-openjdk-amd64/bin/rmid to provide /usr/bin/rmid (rmid) in auto mode
update-alternatives: using /usr/lib/jvm/java-11-openjdk-amd64/bin/rmiregistry to provide /usr/bin/rmiregistry (rmiregistry) in auto mode
update-alternatives: using /usr/lib/jvm/java-11-openjdk-amd64/bin/pack200 to provide /usr/bin/pack200 (pack200) in auto mode
update-alternatives: using /usr/lib/jvm/java-11-openjdk-amd64/bin/unpack200 to provide /usr/bin/unpack200 (unpack200) in auto mode
update-alternatives: using /usr/lib/jvm/java-11-openjdk-amd64/lib/jexec to provide /usr/bin/jexec (jexec) in auto mode
Setting up openjdk-11-jre:amd64 (11.0.26+4-1ubuntu1~20.04) ...
Setting up default-jre (2:1.11-72) ...
Setting up cuda-nsight-12-4 (12.4.127-1) ...
Setting up cuda-nvvp-12-4 (12.4.127-1) ...
Setting up cuda-visual-tools-12-4 (12.4.1-1) ...
Setting up cuda-tools-12-4 (12.4.1-1) ...
Setting up cuda-toolkit-12-4 (12.4.1-1) ...
```

```bash
echo 'export PATH=/usr/local/cuda-12.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
```
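After sourcing `~/.bashrc`, `nvcc` should resolve to the CUDA 12.4 install; a quick sketch to verify, run from the same shell you will build in:

```python
# Check that nvcc is reachable and really comes from /usr/local/cuda-12.4.
import shutil
import subprocess

nvcc = shutil.which("nvcc")
print("nvcc:", nvcc or "NOT FOUND")
if nvcc:
    # Prints the "Cuda compilation tools, release 12.4 ..." banner.
    print(subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout)
```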

CMAKE_ARGS="-DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.4/bin/nvcc"
FORCE_CMAKE=1
pip install --verbose llama-cpp-python==0.3.7

Clear the old build cache

```bash
pip cache purge
```

Rebuild and reinstall

```bash
CMAKE_ARGS="-DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.4/bin/nvcc" \
FORCE_CMAKE=1 \
pip install --verbose --no-cache-dir llama-cpp-python==0.3.7
```

```
(py310-cuda12.41) dw5189@DESKTOP-RS0JQBN:~$ CMAKE_ARGS="-DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.4/bin/nvcc" \
FORCE_CMAKE=1 \
pip install --verbose --no-cache-dir llama-cpp-python==0.3.7
Using pip 25.0.1 from /home/dw5189/py310-cuda12.41/lib/python3.10/site-packages/pip (python 3.10)
Collecting llama-cpp-python==0.3.7
  Downloading llama_cpp_python-0.3.7.tar.gz (66.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.7/66.7 MB 12.2 MB/s eta 0:00:00
  Running command pip subprocess to install build dependencies
  Using pip 25.0.1 from /home/dw5189/py310-cuda12.41/lib/python3.10/site-packages/pip (python 3.10)
  Collecting scikit-build-core>=0.9.2 (from scikit-build-core[pyproject]>=0.9.2)
    Obtaining dependency information for scikit-build-core>=0.9.2 from https://files.pythonhosted.org/packages/0a/ba/b37b9802f503894a46ef34aaa5851344cde48b39ab0af5057a6ee4f0d631/scikit_build_core-0.11.0-py3-none-any.whl.metadata
    Downloading scikit_build_core-0.11.0-py3-none-any.whl.metadata (21 kB)
  Collecting exceptiongroup>=1.0 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
    Obtaining dependency information for exceptiongroup>=1.0 from https://files.pythonhosted.org/packages/02/cc/b7e31358aac6ed1ef2bb790a9746ac2c69bcb3c8588b41616914eb106eaf/exceptiongroup-1.2.2-py3-none-any.whl.metadata
    Downloading exceptiongroup-1.2.2-py3-none-any.whl.metadata (6.6 kB)
  Collecting packaging>=21.3 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
    Obtaining dependency information for packaging>=21.3 from https://files.pythonhosted.org/packages/88/ef/eb23f262cca3c0c4eb7ab1933c3b1f03d021f2c48f54763065b6f0e321be/packaging-24.2-py3-none-any.whl.metadata
    Downloading packaging-24.2-py3-none-any.whl.metadata (3.2 kB)
  Collecting pathspec>=0.10.1 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
    Obtaining dependency information for pathspec>=0.10.1 from https://files.pythonhosted.org/packages/cc/20/ff623b09d963f88bfde16306a54e12ee5ea43e9b597108672ff3a408aad6/pathspec-0.12.1-py3-none-any.whl.metadata
    Downloading pathspec-0.12.1-py3-none-any.whl.metadata (21 kB)
  Collecting tomli>=1.2.2 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
    Obtaining dependency information for tomli>=1.2.2 from https://files.pythonhosted.org/packages/6e/c2/61d3e0f47e2b74ef40a68b9e6ad5984f6241a942f7cd3bbfbdbd03861ea9/tomli-2.2.1-py3-none-any.whl.metadata
    Downloading tomli-2.2.1-py3-none-any.whl.metadata (10 kB)
  Downloading scikit_build_core-0.11.0-py3-none-any.whl (179 kB)
  Downloading exceptiongroup-1.2.2-py3-none-any.whl (16 kB)
  Downloading packaging-24.2-py3-none-any.whl (65 kB)
  Downloading pathspec-0.12.1-py3-none-any.whl (31 kB)
  Downloading tomli-2.2.1-py3-none-any.whl (14 kB)
  Installing collected packages: tomli, pathspec, packaging, exceptiongroup, scikit-build-core
  Successfully installed exceptiongroup-1.2.2 packaging-24.2 pathspec-0.12.1 scikit-build-core-0.11.0 tomli-2.2.1
  Installing build dependencies ... done
  Running command Getting requirements to build wheel
  Getting requirements to build wheel ... done
  Running command pip subprocess to install backend dependencies
  Using pip 25.0.1 from /home/dw5189/py310-cuda12.41/lib/python3.10/site-packages/pip (python 3.10)
  Collecting cmake>=3.21
    Obtaining dependency information for cmake>=3.21 from https://files.pythonhosted.org/packages/59/e8/096984b89133681533650b9078c5ed1c5c9b534e869b5487f22d4de1935c/cmake-3.31.6-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
    Downloading cmake-3.31.6-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.3 kB)
  Collecting ninja>=1.5
    Obtaining dependency information for ninja>=1.5 from https://files.pythonhosted.org/packages/6b/35/a8e38d54768e67324e365e2a41162be298f51ec93e6bd4b18d237d7250d8/ninja-1.11.1.3-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata
    Downloading ninja-1.11.1.3-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.3 kB)
  Downloading cmake-3.31.6-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27.8/27.8 MB 8.9 MB/s eta 0:00:00
  Downloading ninja-1.11.1.3-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (422 kB)
  Installing collected packages: ninja, cmake
    changing mode of /tmp/pip-build-env-e74ao7js/normal/bin/ccmake to 775
    changing mode of /tmp/pip-build-env-e74ao7js/normal/bin/cmake to 775
    changing mode of /tmp/pip-build-env-e74ao7js/normal/bin/cpack to 775
    changing mode of /tmp/pip-build-env-e74ao7js/normal/bin/ctest to 775
  Successfully installed cmake-3.31.6 ninja-1.11.1.3
  Installing backend dependencies ... done
  Running command Preparing metadata (pyproject.toml)
  *** scikit-build-core 0.11.0 using CMake 3.31.6 (metadata_wheel)
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python==0.3.7)
  Obtaining dependency information for typing-extensions>=4.5.0 from https://files.pythonhosted.org/packages/26/9f/ad63fc0248c5379346306f8668cda6e2e2e9c95e01216d2b8ffd9ff037d0/typing_extensions-4.12.2-py3-none-any.whl.metadata
  Downloading typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python==0.3.7)
  Obtaining dependency information for numpy>=1.20.0 from https://files.pythonhosted.org/packages/e9/88/3870cfa9bef4dffb3a326507f430e6007eeac258ebeef6b76fc542aef66d/numpy-2.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading numpy-2.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (62 kB)
Collecting diskcache>=5.6.1 (from llama-cpp-python==0.3.7)
  Obtaining dependency information for diskcache>=5.6.1 from https://files.pythonhosted.org/packages/3f/27/4570e78fc0bf5ea0ca45eb1de3818a23787af9b390c0b0a0033a1b8236f9/diskcache-5.6.3-py3-none-any.whl.metadata
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Collecting jinja2>=2.11.3 (from llama-cpp-python==0.3.7)
  Obtaining dependency information for jinja2>=2.11.3 from https://files.pythonhosted.org/packages/bd/0f/2ba5fbcd631e3e88689309dbe978c5769e883e4b84ebfe7da30b43275c5a/jinja2-3.1.5-py3-none-any.whl.metadata
  Downloading jinja2-3.1.5-py3-none-any.whl.metadata (2.6 kB)
Collecting MarkupSafe>=2.0 (from jinja2>=2.11.3->llama-cpp-python==0.3.7)
  Obtaining dependency information for MarkupSafe>=2.0 from https://files.pythonhosted.org/packages/22/35/137da042dfb4720b638d2937c38a9c2df83fe32d20e8c8f3185dbfef05f7/MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
Downloading jinja2-3.1.5-py3-none-any.whl (134 kB)
Downloading numpy-2.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.4/16.4 MB 13.4 MB/s eta 0:00:00
Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Downloading MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20 kB)
Building wheels for collected packages: llama-cpp-python
  Running command Building wheel for llama-cpp-python (pyproject.toml)
  *** scikit-build-core 0.11.0 using CMake 3.31.6 (wheel)
  *** Configuring CMake...
  loading initial cache file /tmp/tmpjlaiut9e/build/CMakeInit.txt
  -- The C compiler identification is GNU 9.4.0
  -- The CXX compiler identification is GNU 9.4.0
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: /usr/bin/x86_64-linux-gnu-gcc - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /usr/bin/x86_64-linux-gnu-g++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Found Git: /usr/bin/git (found version "2.25.1")
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
  -- Found Threads: TRUE
  -- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
  -- CMAKE_SYSTEM_PROCESSOR: x86_64
  -- Including CPU backend
  -- Found OpenMP_C: -fopenmp (found version "4.5")
  -- Found OpenMP_CXX: -fopenmp (found version "4.5")
  -- Found OpenMP: TRUE (found version "4.5")
  -- x86 detected
  -- Adding CPU backend variant ggml-cpu: -march=native
  -- Found CUDAToolkit: /usr/local/cuda-12.4/targets/x86_64-linux/include (found version "12.4.131")
  -- CUDA Toolkit found
  -- Using CUDA architectures: native
  -- The CUDA compiler identification is NVIDIA 12.4.131 with host compiler GNU 9.4.0
  -- Detecting CUDA compiler ABI info
  -- Detecting CUDA compiler ABI info - done
  -- Check for working CUDA compiler: /usr/local/cuda-12.4/bin/nvcc - skipped
  -- Detecting CUDA compile features
  -- Detecting CUDA compile features - done
  -- CUDA host compiler is GNU 9.4.0
  -- Including CUDA backend
  CMake Warning at vendor/llama.cpp/ggml/CMakeLists.txt:285 (message):
    GGML build version fixed at 1 likely due to a shallow clone.

  ...............................

  *** Making wheel...
  *** Created llama_cpp_python-0.3.7-cp310-cp310-linux_x86_64.whl
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.3.7-cp310-cp310-linux_x86_64.whl size=40889866 sha256=503a77c7bdd70e6a553dc5e31b8cc475edb5c95681bc96765aeb6744472a9c85
  Stored in directory: /tmp/pip-ephem-wheel-cache-0u5zrgg2/wheels/5c/8f/58/a39eb13258f3bbf64bb36ed76d31979579a6f175be38de06b7
Successfully built llama-cpp-python
Installing collected packages: typing-extensions, numpy, MarkupSafe, diskcache, jinja2, llama-cpp-python
  changing mode of /home/dw5189/py310-cuda12.41/bin/f2py to 775
  changing mode of /home/dw5189/py310-cuda12.41/bin/numpy-config to 775
Successfully installed MarkupSafe-3.0.2 diskcache-5.6.3 jinja2-3.1.5 llama-cpp-python-0.3.7 numpy-2.2.3 typing-extensions-4.12.2
(py310-cuda12.41) dw5189@DESKTOP-RS0JQBN:~$
```
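Note the `-- Using CUDA architectures: native` line in the log above: without an explicit `CMAKE_CUDA_ARCHITECTURES`, nvcc targets whatever GPU is present at build time. Once installed, the compiled-in features can be inspected; a sketch using the low-level `llama_print_system_info` binding (which returns bytes):

```python
# Print ggml's compile-time feature string to confirm the CUDA backend
# was actually built into this wheel.
import llama_cpp

print(llama_cpp.llama_print_system_info().decode("utf-8"))
```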

dw5189 · Mar 1 '25

> Hey bro, could you please help build a precompiled 0.3.7 wheel for CUDA 12.4.1, Python 3.10, Ubuntu 20.04? I've been at it for days without success. Many, many thanks!

Install build dependencies

```bash
sudo apt update && sudo apt install -y \
    python3.10-dev \
    build-essential \
    cmake \
    git \
    wget \
    libssl-dev \
    libopenblas-dev  # accelerated math library
```

Add the NVIDIA CUDA repository (download and install the signing key)

```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
```

Update the package lists

```bash
sudo apt update
```

Install CUDA 12.4

```bash
sudo apt install -y cuda-toolkit-12-4
```

Clear the old build cache

```bash
pip cache purge
```

Edit .bashrc (or the virtual environment's activation script, if you use one)

```bash
echo 'export PATH=/usr/local/cuda-12.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
```

Rebuild and reinstall

```bash
CMAKE_ARGS="-DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.4/bin/nvcc" \
FORCE_CMAKE=1 \
pip install --verbose --no-cache-dir llama-cpp-python==0.3.7
```
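If the build succeeds, a short end-to-end smoke test (the model path below is a placeholder; any local GGUF file works):

```python
# Smoke test: load a local GGUF with GPU offload and generate a few tokens.
from llama_cpp import Llama

llm = Llama(
    model_path="/path/to/model.gguf",  # placeholder: substitute a real GGUF file
    n_gpu_layers=-1,
    n_ctx=2048,
    verbose=True,  # the log should show ggml_cuda_init and a CUDA0 device
)
out = llm.create_completion("Hello", max_tokens=16)
print(out["choices"][0]["text"])
```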

dw5189 · Mar 1 '25