Examples don't run with CUDA12
I have CUDA 12 installed, and I know it works because I have run custom PyTorch models with it in other projects.
When I run LLamaSharp.Examples with LLamaSharp.Backend.Cpu, everything works fine, but when I switch to LLamaSharp.Backend.Cuda12 it crashes right away with the following error:
```
System.TypeInitializationException
  HResult=0x80131534
  Message=The type initializer for 'LLama.Native.NativeApi' threw an exception.
  Source=LLamaSharp
  StackTrace:
   at LLama.Native.NativeApi.llama_empty_call() in C:\work\Projects\LLamaSharp\LLama\Native\NativeApi.cs:line 27
   at Program.<<Main>$>d__0.MoveNext() in C:\work\Projects\LLamaSharp\LLama.Examples\Program.cs:line 24

This exception was originally thrown at this call stack:
  LLama.Native.NativeApi.NativeApi() in NativeApi.Load.cs

Inner Exception 1:
RuntimeError: The native library cannot be correctly loaded. It could be one of the following reasons:
1. No LLamaSharp backend was installed. Please search LLamaSharp.Backend and install one of them.
2. You are using a device with only CPU but installed the cuda backend. Please install the cpu backend instead.
3. One of the dependencies of the native library is missing. Please use `ldd` on Linux, `dumpbin` on Windows and `otool` on macOS to check that all the dependencies of the native library are satisfied. Generally you can find the libraries under your output folder.
4. Try to compile llama.cpp yourself to generate a libllama library, then use `LLama.Native.NativeLibraryConfig.WithLibrary` to specify it at the very beginning of your code. For more information about compilation, please refer to the LLamaSharp repo on GitHub.
```
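For reference, option 4 would look something like this at the top of `Program.cs` (a minimal sketch, assuming the 0.10-era API where the config is reached through `NativeLibraryConfig.Instance`; the exact signature of `WithLibrary` varies between versions, and the DLL path is a placeholder):

```csharp
using LLama.Native;

// Must run before any other LLamaSharp call: the native library is
// loaded lazily on first use and cannot be swapped afterwards.
NativeLibraryConfig.Instance.WithLibrary(@"C:\path\to\your\llama.dll");
```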
I tried running the project in Debug, with GPU selected in the Configuration Manager, and under both .NET 8 and .NET 6, in every combination, but I always get the same error. I am running the latest version of the NuGet packages, 0.10.0.
You have probably added both the CPU and CUDA backends. That is the reason for the crash. You need to remove one so that only a single backend is installed.
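In other words, the examples project should end up referencing exactly one backend package, something like this (a sketch; the version number is assumed from the thread):

```xml
<ItemGroup>
  <PackageReference Include="LLamaSharp" Version="0.10.0" />
  <!-- Exactly one backend: Cpu OR Cuda12, never both at once. -->
  <PackageReference Include="LLamaSharp.Backend.Cuda12" Version="0.10.0" />
</ItemGroup>
```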
Pretty sure I was testing them one at a time. Just to confirm, I tested it again this morning, making sure the CUDA backend was the only one installed.
To be safe, I also deleted the bin and obj folders of the example project before testing again.
Same problem, unfortunately.
llava_shared.dll is missing from the distribution for CUDA v12. Try downloading it from llama.cpp and putting it manually into the right runtime folder.
I took the file from this llama.cpp release: llama-b2418-bin-win-cublas-cu12.2.0-x64.zip, extracted llava_shared.dll, and put it in LLama.Examples\bin\Debug\net8.0\runtimes\win-x64\native\cuda12.
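(A quick way to confirm the DLL really landed in the folder the loader probes, using plain BCL calls; the relative path simply mirrors the output layout above:)

```csharp
using System;
using System.IO;

// List the native libraries actually present in the CUDA12 runtime
// folder that LLamaSharp probes at startup.
var native = Path.Combine(AppContext.BaseDirectory,
                          "runtimes", "win-x64", "native", "cuda12");
foreach (var dll in Directory.EnumerateFiles(native, "*.dll"))
    Console.WriteLine(Path.GetFileName(dll));
```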
Same problem.
Maybe try the matching version: https://github.com/ggerganov/llama.cpp/tree/d71ac90985854b0905e1abba778e407e17f9f887 The C++ DLLs need to be compatible with each other.
I will include the libraries in the binary artifacts update ASAP.
Even after today's update, this issue persists.
Hi, this can be confirmed as a bug since it persists in v0.11.1. Could you please provide some information to help us find the problem? @KieranFoot @EtienneT
- What is your full CUDA version?
- What are your CPU and GPU devices? (It would be best if you follow this guide to print the CPU information.)
- Are you using x86 or x64? (A quick way to check from code is sketched below.)
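For the last question, the architecture can be printed from the app itself (plain BCL, no LLamaSharp assumptions):

```csharp
using System;
using System.Runtime.InteropServices;

// What the OS is, and what this process actually runs as.
Console.WriteLine($"OS architecture:      {RuntimeInformation.OSArchitecture}");
Console.WriteLine($"Process architecture: {RuntimeInformation.ProcessArchitecture}");
Console.WriteLine($"64-bit process:       {Environment.Is64BitProcess}");
```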
This seems to be fixed for me now in the latest version. Thanks!
@AsakusaRinne Apologies, it isn't made clear in the repo's docs that additional files are needed to use CUDA12. I assumed it would work out of the box the way CUDA11 does.
Possibly the documentation could be improved to reflect this.
@KieranFoot Is it because you installed CUDA12 instead of CUDA11?
@AsakusaRinne I never installed CUDA11 manually, it just worked. So, when I switched the code to use CUDA12, I wrongly assumed it would also work out of the box.
It's weird that the CUDA11 backend could work without CUDA installed. Have you ever installed cuBLAS?
You need to update your display driver. Here is a reference: https://tech.amikelive.com/node-930/cuda-compatibility-of-nvidia-display-gpu-drivers/comment-page-1/
@martindevans If I'm not misunderstanding it, we could ship some cuBLAS files in the same folder as llama.dll to make it possible to run the CUDA backend without having CUDA installed? As shown in the llama.cpp releases, there's an archive named cudart-llama-bin-win-cu11.7.1-x64.zip, which contains cublas64_11.dll, cublasLt64_11.dll and cudart64_110.dll.
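(Whether those runtime pieces resolve on a given machine can be probed from C#; a sketch, assuming the CUDA 12 DLLs follow the same naming pattern as the CUDA 11 ones listed above:)

```csharp
using System;
using System.Runtime.InteropServices;

// Ask the OS loader whether each CUDA runtime DLL can be resolved.
// The *_12 names are assumed by analogy with the cu11 archive above.
foreach (var dep in new[] { "cudart64_12", "cublas64_12", "cublasLt64_12" })
{
    var found = NativeLibrary.TryLoad(dep, out _);
    Console.WriteLine($"{dep}: {(found ? "found" : "MISSING")}");
}
```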
I don't know much about CUDA, but yes I think that would fix it (Onkitova tested it out in https://github.com/SciSharp/LLamaSharp/pull/371)
Last time we discussed this (ref) I think we decided they were too big to include in the main CUDA packages, and that instead we could create another package which the CUDA packages depend on.
Yes, thank you for the clarification. I'll look into this issue. :)