Examples don't run with CUDA12
I have CUDA 12 installed, and I know it works because I have run custom PyTorch models with it in other projects.
When I run LLamaSharp.Examples with LLamaSharp.Backend.Cpu, everything works fine, but when I switch to LLamaSharp.Backend.Cuda12 it crashes right away with the following error:
```
System.TypeInitializationException
  HResult=0x80131534
  Message=The type initializer for 'LLama.Native.NativeApi' threw an exception.
  Source=LLamaSharp
  StackTrace:
   at LLama.Native.NativeApi.llama_empty_call() in C:\work\Projects\LLamaSharp\LLama\Native\NativeApi.cs:line 27
   at Program.<<Main>$>d__0.MoveNext() in C:\work\Projects\LLamaSharp\LLama.Examples\Program.cs:line 24

This exception was originally thrown at this call stack:
  LLama.Native.NativeApi.NativeApi() in NativeApi.Load.cs

Inner Exception 1:
RuntimeError: The native library cannot be correctly loaded. It could be one of the following reasons:
1. No LLamaSharp backend was installed. Please search LLamaSharp.Backend and install one of them.
2. You are using a device with only CPU but installed the cuda backend. Please install the cpu backend instead.
3. One of the dependencies of the native library is missing. Please use `ldd` on Linux, `dumpbin` on Windows and `otool` on macOS to check that all the dependencies of the native library are satisfied. Generally you can find the libraries under your output folder.
4. Try to compile llama.cpp yourself to generate a libllama library, then use `LLama.Native.NativeLibraryConfig.WithLibrary` to specify it at the very beginning of your code. For more information about compilation, please refer to the LLamaSharp repo on GitHub.
```
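For reference, option 4 would look something like this at the top of `Program.cs` (a minimal sketch, assuming the 0.10-era API where the config is reached through `NativeLibraryConfig.Instance`; the exact signature of `WithLibrary` varies between versions, and the DLL path is a placeholder):

```csharp
using LLama.Native;

// Must run before any other LLamaSharp call: the native library is
// loaded lazily on first use and cannot be swapped afterwards.
NativeLibraryConfig.Instance.WithLibrary(@"C:\path\to\your\llama.dll");
```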
I tried running the project in Debug, with GPU selected in the Configuration Manager, and under both .NET 8 and .NET 6, in every combination, but I always get the same error. I am running the latest version of the NuGet packages, 0.10.0.
You have probably added both the CPU and CUDA backends. That is the reason for the crash. You need to remove one so that only a single backend is installed.
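In other words, the examples project should end up referencing exactly one backend package, something like this (a sketch; the version number is assumed from the thread):

```xml
<ItemGroup>
  <PackageReference Include="LLamaSharp" Version="0.10.0" />
  <!-- Exactly one backend: Cpu OR Cuda12, never both at once. -->
  <PackageReference Include="LLamaSharp.Backend.Cuda12" Version="0.10.0" />
</ItemGroup>
```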
Pretty sure I was testing them one at a time. Just to confirm, I tested it again this morning, making sure the CUDA backend was the only one installed.
To be safe, I also deleted the bin and obj folders of the example project before testing again.
Same problem, unfortunately.
llava_shared.dll is missing from the distribution for CUDA v12. Try downloading it from llama.cpp and putting it manually into the right runtime folder.
I took the file from this llama.cpp release: llama-b2418-bin-win-cublas-cu12.2.0-x64.zip, extracted llava_shared.dll, and put it in LLama.Examples\bin\Debug\net8.0\runtimes\win-x64\native\cuda12.
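(A quick way to confirm the DLL really landed in the folder the loader probes, using plain BCL calls; the relative path simply mirrors the output layout above:)

```csharp
using System;
using System.IO;

// List the native libraries actually present in the CUDA12 runtime
// folder that LLamaSharp probes at startup.
var native = Path.Combine(AppContext.BaseDirectory,
                          "runtimes", "win-x64", "native", "cuda12");
foreach (var dll in Directory.EnumerateFiles(native, "*.dll"))
    Console.WriteLine(Path.GetFileName(dll));
```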
Same problem.
Maybe try the matching version: https://github.com/ggerganov/llama.cpp/tree/d71ac90985854b0905e1abba778e407e17f9f887 The C++ DLLs need to be compatible with each other.
I will include the libraries in the binary artifacts update ASAP.
Even after today's update, this issue persists.
Hi, this can be confirmed as a bug since it persists in v0.11.1. Could you please provide some information to help us find the problem? @KieranFoot @EtienneT
- What is your full CUDA version?
- What are your CPU and GPU devices? (It would be best if you follow this guide to print the CPU information.)
- Are you using x86 or x64? (A quick way to check from code is sketched below.)
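For the last question, the architecture can be printed from the app itself (plain BCL, no LLamaSharp assumptions):

```csharp
using System;
using System.Runtime.InteropServices;

// What the OS is, and what this process actually runs as.
Console.WriteLine($"OS architecture:      {RuntimeInformation.OSArchitecture}");
Console.WriteLine($"Process architecture: {RuntimeInformation.ProcessArchitecture}");
Console.WriteLine($"64-bit process:       {Environment.Is64BitProcess}");
```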
This seems to be fixed for me now in the latest version. Thanks!
@AsakusaRinne Apologies, it isn't made clear in the repo's docs that additional files are needed to use CUDA12. I assumed it would work out of the box the way CUDA11 does.
Possibly the documentation could be improved to reflect this.
@KieranFoot Is it because you installed CUDA12 instead of CUDA11?
@AsakusaRinne I never installed CUDA11 manually, it just worked. So, when I switched the code to use CUDA12, I wrongly assumed it would also work out of the box.
It's weird that the CUDA11 backend could work without CUDA installed. Have you ever installed cuBLAS?
You need to update your display driver. Here is a reference: https://tech.amikelive.com/node-930/cuda-compatibility-of-nvidia-display-gpu-drivers/comment-page-1/
@martindevans If I'm not misunderstanding it, we could ship some cuBLAS files in the same folder as llama.dll to make it possible to run the CUDA backend without having CUDA installed? As shown in the llama.cpp releases, there's an archive named cudart-llama-bin-win-cu11.7.1-x64.zip, which contains cublas64_11.dll, cublasLt64_11.dll and cudart64_110.dll.
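(Whether those runtime pieces resolve on a given machine can be probed from C#; a sketch, assuming the CUDA 12 DLLs follow the same naming pattern as the CUDA 11 ones listed above:)

```csharp
using System;
using System.Runtime.InteropServices;

// Ask the OS loader whether each CUDA runtime DLL can be resolved.
// The *_12 names are assumed by analogy with the cu11 archive above.
foreach (var dep in new[] { "cudart64_12", "cublas64_12", "cublasLt64_12" })
{
    var found = NativeLibrary.TryLoad(dep, out _);
    Console.WriteLine($"{dep}: {(found ? "found" : "MISSING")}");
}
```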
I don't know much about CUDA, but yes I think that would fix it (Onkitova tested it out in https://github.com/SciSharp/LLamaSharp/pull/371)
Last time we discussed this (ref) I think we decided they were too big to include in the main CUDA packages, and that instead we could create another package which the CUDA packages depend on.
Yes, thank you for the clarification. I'll look into this issue. :)