llama-cpp-python 0.3.8 with CUDA
When will we have a recent version of llama-cpp-python that works with CUDA, installable via pip? It's a real nightmare to get it working any other way. 0.3.4 works with CUDA, but it doesn't support newer models like quantized Qwen 3.
With kind regards
You can try compiling the new code I maintain here: https://github.com/JamePeng/llama-cpp-python, but I have only pre-compiled the Windows and Linux versions based on the recent code.
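If the pre-built wheels there don't match your platform, building the fork from source should follow the same pattern as building upstream from source; this is only a hedged sketch (the git-URL install is an assumption, and it requires the CUDA toolkit plus a working C/C++ compiler):

# Sketch: build the fork from source with CUDA enabled; pip fetches the vendored llama.cpp submodule itself
CMAKE_ARGS="-DGGML_CUDA=on" pip install git+https://github.com/JamePeng/llama-cpp-python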
When will we have a recent version of llama-cpp-python that works with CUDA, installable via pip?
I probably misunderstand your question, but I am using 0.3.9 with CUDA via pip, and I can use Qwen3-related models as well (with I-quants or flash attention, for example; is that what you are referring to?). Sorry if my comment is useless.
I built using...
CMAKE_ARGS="-DGGML_CUDA=on -DLLAVA_BUILD=off -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade
Hi, I think it might come from the fact that the wheels have not been updated to recent versions of llama-cpp-python. You can see that at this link: https://abetlen.github.io/llama-cpp-python/whl/cu122/llama-cpp-python/. The last version available is 0.3.4. Could you add newer wheels @abetlen?
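For reference, that index is what the pre-built-wheel install route points at, so it can only serve a recent version once newer wheels are published there:

# Install from the CUDA 12.2 wheel index instead of building locally
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122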
@m-from-space I have tried your solution and I got this error:
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [47 lines of output]
*** scikit-build-core 0.11.3 using CMake 3.22.1 (wheel)
*** Configuring CMake...
loading initial cache file /tmp/tmpovycplbn/build/CMakeInit.txt
-- The C compiler identification is Clang 14.0.0
-- The CXX compiler identification is Clang 14.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - failed
-- Check for working CXX compiler: /usr/bin/clang++
-- Check for working CXX compiler: /usr/bin/clang++ - broken
CMake Error at /usr/share/cmake-3.22/Modules/CMakeTestCXXCompiler.cmake:62 (message):
The C++ compiler
"/usr/bin/clang++"
is not able to compile a simple test program.
It fails with the following output:
Change Dir: /tmp/tmpovycplbn/build/CMakeFiles/CMakeTmp
Run Build Command(s):ninja cmTC_9090e && [1/2] Building CXX object CMakeFiles/cmTC_9090e.dir/testCXXCompiler.cxx.o
[2/2] Linking CXX executable cmTC_9090e
FAILED: cmTC_9090e
: && /usr/bin/clang++ -pthread CMakeFiles/cmTC_9090e.dir/testCXXCompiler.cxx.o -o cmTC_9090e && :
/usr/bin/ld: cannot find -lstdc++: No such file or directory
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
CMakeLists.txt:3 (project)
-- Configuring incomplete, errors occurred!
See also "/tmp/tmpovycplbn/build/CMakeFiles/CMakeOutput.log".
See also "/tmp/tmpovycplbn/build/CMakeFiles/CMakeError.log".
*** CMake configuration failed
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Failed to build installable wheels for some pyproject.toml based projects (llama-cpp-python)
Man... I've tried for hours to get similar commands to work and never succeeded, so I used JamePeng's wheel to make up for it. I just tried your command and it works fine. I just don't get it: I don't understand these CMAKE-based installations, or how it can work when abetlen hasn't done anything about it.
I'm glad that it worked for you. As far as I remember, it didn't build on my machine when LLAVA_BUILD wasn't turned off, so maybe that was your problem as well.
@m-from-space I have tried your solution and I got this error:
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... error
error: subprocess-exited-with-error
...
The C++ compiler "/usr/bin/clang++" is not able to compile a simple test program.
...
I am no expert on this, but it looks like the C++ compiler is not set up correctly on your system: it fails to compile even a simple test program. I am not using clang++ on my system, but gcc / g++ as the compilers.
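In your log the linker specifically cannot find -lstdc++. A hedged fix on Debian/Ubuntu-style systems (the distro and package name are assumptions based on the paths in your log) is to install the GNU toolchain and point the build at it:

# build-essential provides g++ and the libstdc++ the linker could not find
sudo apt install build-essential
# force gcc/g++ instead of clang for the pip build
CC=gcc CXX=g++ CMAKE_ARGS="-DGGML_CUDA=on -DLLAVA_BUILD=off -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade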
He's probably running on Windows. On Windows you need to install the C++ compiler via vs_BuildTools.exe
https://aka.ms/vs/17/release/vs_BuildTools.exe
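For completeness, a hedged sketch of the Windows route after installing the Build Tools (this assumes the "Desktop development with C++" workload was selected and the commands are run from the "x64 Native Tools Command Prompt" so the MSVC compiler is on PATH):

rem Same build flags as above, in cmd.exe syntax
set CMAKE_ARGS=-DGGML_CUDA=on -DLLAVA_BUILD=off
set FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade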