Can't install the GPU version on Windows despite many attempts.
Issues
I am trying to install the latest version of llama-cpp-python on Windows 11 with an RTX 3090 Ti (24 GB). Months ago I successfully installed llama-cpp-python==0.1.87 (I can't remember exactly) using:
set FORCE_CMAKE=1
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
But when I recently tried to install the latest version using:
set CMAKE_ARGS="-DLLAMA_CUDA=on"
pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
After loading the model, it is still using the CPU with BLAS = 0 (or has BLAS been replaced by some other flag that should be 1 in the new version?):
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 1024
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 258.50 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
Model metadata: {'general.name': 'Meta-Llama-3-8B-Instruct-imatrix', 'general.architecture': 'llama', 'llama.block_count': '32', 'llama.context_length': '8192', 'tokenizer.ggml.eos_token_id': '128001', 'general.file_type': '18', 'llama.attention.head_count_kv': '8', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '14336', 'llama.attention.head_count': '32', 'llama.rope.freq_base': '500000.000000', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.vocab_size': '128256', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.model': 'gpt2', 'general.quantization_version': '2', 'tokenizer.ggml.bos_token_id': '128000', 'tokenizer.chat_template': "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}"}
Using gguf chat template: {% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>
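For context, this is roughly how the model is loaded (a minimal sketch; the model path is a placeholder, and on a working CUDA build n_gpu_layers=-1 should offload every layer):

from llama_cpp import Llama

# On a CUDA-enabled build, verbose=True should print CUDA buffer lines
# in the startup log; here it still reports CPU buffers and BLAS = 0.
llm = Llama(
    model_path="./Meta-Llama-3-8B-Instruct.gguf",  # placeholder path
    n_gpu_layers=-1,   # request offload of all layers to the GPU
    n_ctx=2048,
    n_batch=1024,
    verbose=True,      # print the system info / buffer allocation log
)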
I have also tried the pre-built wheel for CUDA 12.1 (pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121) and it still doesn't work. I added --verbose to see the output:
loading initial cache file C:\Users\Administrator\AppData\Local\Temp\tmpbbvy3nqu\build\CMakeInit.txt
-- Building for: Visual Studio 17 2022
-- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.22631.
-- The C compiler identification is MSVC 19.39.33523.0
-- The CXX compiler identification is MSVC 19.39.33523.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: F:/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: F:/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: F:/Git/cmd/git.exe (found version "2.44.0.windows.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: AMD64
-- CMAKE_GENERATOR_PLATFORM: x64
-- x86 detected
-- Performing Test HAS_AVX_1
-- Performing Test HAS_AVX_1 - Success
-- Performing Test HAS_AVX2_1
-- Performing Test HAS_AVX2_1 - Success
-- Performing Test HAS_FMA_1
-- Performing Test HAS_FMA_1 - Success
-- Performing Test HAS_AVX512_1
-- Performing Test HAS_AVX512_1 - Failed
-- Performing Test HAS_AVX512_2
-- Performing Test HAS_AVX512_2 - Failed
Environment
python=3.12
C++ compiler: Visual Studio 2022 (with the necessary C++ modules)
cmake --version = 3.29.2
nvcc -V = CUDA 12.1 (nvidia-smi reports CUDA 12.3, but I think that is unrelated to this issue)
I have downloaded and installed VS2022, the CUDA toolkit, CMake, and Anaconda, and I am wondering whether some steps are missing. Based on my previous experience, there is no need to git clone this repository and cd into it to build (though I did that on my Mac months ago to convert a .pth file to a .bin file).
My system variables are listed below:
- F:\Anaconda\Scripts
- F:\CMake\bin
- C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
- C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin
Questions
- Are there any steps I am missing to build llama-cpp-python with GPU support?
- How can I tell whether the build has GPU support right after running pip install llama-cpp-python, instead of loading a model to check BLAS = 1? (See the sketch after this list.)
- Do I need to git clone this repository and cd into some directory, or create some file or directory, before running pip install llama-cpp-python?
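For the second question, here is a quick check that does not require loading a model (a sketch assuming a recent llama-cpp-python that exposes these low-level llama.cpp bindings):

import llama_cpp

# True only if the wheel was compiled with GPU offload support (e.g. CUDA)
print(llama_cpp.llama_supports_gpu_offload())
# Prints the same AVX/BLAS flags line that appears when loading a model
print(llama_cpp.llama_print_system_info().decode())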
I checked #1352. Is there a known issue related to Windows 11? I assumed the problem was with my installation steps or my machine. Is there an official explanation, please?
I'm having the same problem but on Linux 20.04, using a Kaggle Notebook; it worked fine until yesterday.
edit: pip install llama-cpp-python==0.2.64 solves the problem.
> I'm having the same problem but on Linux 20.04, using a Kaggle Notebook; it worked fine until yesterday.
> edit: pip install llama-cpp-python==0.2.64 solves the problem.
Still not working. I have tried 0.2.64, 0.2.60, and 0.2.59 many times, and the build log says:
Creating "ggml_shared.dir\Release\ggml_shared.tlog\unsuccessfulbuild" because "AlwaysCreate" was specified.
Touching "ggml_shared.dir\Release\ggml_shared.tlog\unsuccessfulbuild".
CustomBuild:
Building Custom Rule C:/Users/Administrator/AppData/Local/Temp/pip-install-_thkprn2/llama-cpp-python_9fa670d7909f4acfb3ac1882363d1df6/vendor/llama.cpp/CMakeLists.txt
The llama.dll is Win32 and we are on 64-bit Windows 11. If I run the C++ checker program below as Win32, llama.dll loads successfully, but as 64-bit it does not.
#include <iostream>
#include <windows.h>

int main() {
    // Update this path to the actual location of llama.dll
    HINSTANCE hDLL = LoadLibrary(TEXT("C:\\my_path\\llama-cpp-python\\llama_cpp\\llama.dll"));
    if (hDLL == NULL) {
        std::cerr << "ERROR: unable to load DLL" << std::endl;
        return 1;
    }
    std::cout << "DLL loaded successfully" << std::endl;
    FreeLibrary(hDLL);
    return 0;
}
> The llama.dll is Win32 and we are on 64-bit Windows 11; as Win32 the C++ checker loads llama.dll successfully, but as 64-bit it does not.

Well, I think I understand why, but I still don't know how to fix the problem. Can you give more info or steps, please?
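To double-check the DLL's architecture without a C++ project, here is a small sketch that reads the machine field from the PE header (the path below is a placeholder):

import struct

def pe_machine(path):
    with open(path, "rb") as f:
        f.seek(0x3C)                  # e_lfanew: offset of the "PE\0\0" signature
        pe_offset = struct.unpack("<I", f.read(4))[0]
        f.seek(pe_offset + 4)         # skip the 4-byte PE signature
        machine = struct.unpack("<H", f.read(2))[0]
    # 0x014C = x86 (Win32), 0x8664 = x64
    return {0x014C: "x86 (Win32)", 0x8664: "x64"}.get(machine, hex(machine))

print(pe_machine(r"C:\my_path\llama-cpp-python\llama_cpp\llama.dll"))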
The CUDA version I'm using is v12.4 on Windows 10; I think it will also work with Windows 11.
I have tried this from Windows PowerShell and it works for me:
$env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
$env:CUDACXX="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc.exe"
pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python
I'm having the same issue. I have CUDA installed, nvcc works, and CUDA_PATH is set. Doing:
set CMAKE_ARGS=-DLLAMA_CUDA=ON
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
I don't see any errors during installation, yet when I run it I get BLAS = 0:
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
Getting the same result with:
set CMAKE_ARGS=-DLLAMA_CUBLAS=ON
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
> The CUDA version I'm using is v12.4 on Windows 10; I think it will also work with Windows 11. I have tried this from Windows PowerShell and it works for me:
> $env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
> $env:CUDACXX="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc.exe"
> pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python
It seems to be a Windows 11 problem. I made it work on Windows 10 months ago (with llama-cpp-python==0.1.72), but with the latest version on Windows 11 it doesn't work :(
> I'm having the same issue. I have CUDA installed, nvcc works, and CUDA_PATH is set. Doing:
> set CMAKE_ARGS=-DLLAMA_CUDA=ON
> pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
> I don't see any errors during installation, yet when I run it I get BLAS = 0:
> AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
> Getting the same result with:
> set CMAKE_ARGS=-DLLAMA_CUBLAS=ON
> pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
Yep, if you make it work please let me know :) I will keep trying to find a solution as well.
Although it didn't work initially, I was able to download the prebuilt wheel and it works; I am now getting GPU inference. It does seem like there is an issue with my environment in some way.