KoboldCpp support
Added support for KoboldCpp, which provides an easy way to run LLM models on Windows and Linux hosts from a single executable. It uses llama.cpp under the hood.
I would like to ask: what advantages do we get from KoboldCpp over Ollama or even llama-cpp-python?
llama-cpp-python also supports multiple operating systems as well as GGUF and GGML models.
Thanks in advance!
The main advantage is ease of use, as it does not require any installation. It supports Windows and Linux (Ollama for Windows is a mixed bag when it comes to support). It has hardware acceleration for Nvidia (CUDA) as well as AMD GPUs through Vulkan. Since it's a port of llama.cpp, all of its features are present and can be configured through the GUI.
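For example, starting a server is a single command (a hedged sketch: the model path is a placeholder, and flag names such as `--gpulayers` may vary between versions; the same options are also exposed in the launcher GUI):

```sh
# Download the release binary, point it at a GGUF model, and it serves
# a web UI plus a REST API on the given port. No installation required.
./koboldcpp --model ./mistral-7b.Q4_K_M.gguf --gpulayers 32 --port 5001
```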
Overall it comes down to ease of installation and use; I feel KoboldCpp is better, especially on Windows.
As for llama-cpp-python, it is slower than llama.cpp due to the Python overhead, and it does not provide a CUDA build by default.
Thank you!
Ollama also has hardware acceleration for Nvidia and AMD GPUs. I agree that Ollama for Windows is in its early stages.
As for llama-cpp-python, it supports CUDA and Metal builds; I have been using it for the past few months and the experience has been seamless. You just have to specify the number of GPU layers to enable GPU acceleration, as in the sketch below.
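A minimal sketch of what I mean (the model path is a placeholder; this assumes a GPU-enabled build of llama-cpp-python is installed):

```python
from llama_cpp import Llama

# n_gpu_layers controls how many transformer layers are offloaded to
# the GPU; -1 offloads all of them. On a CPU-only build the parameter
# is simply ignored.
llm = Llama(
    model_path="./models/mistral-7b.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,
)

out = llm("Q: What does GGUF stand for? A:", max_tokens=32)
print(out["choices"][0]["text"])
```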
As for the overhead, it is minimal; that's why it is used in all the major libraries such as LangChain, LlamaIndex, etc.
Thank you!
First of all, I agree that Nvidia GPUs are very well supported throughout thanks to CUDA, but when it comes to AMD GPUs, saying that they are "well supported", or even supported at all, is a bold claim: the ROCm backend is not really an option to develop against unless you own the latest and greatest GPU, not to mention the pain of installing it. So if you own an AMD GPU, Vulkan is your best bet. I would love for someone to prove me otherwise, but that is how things stand for the time being. (PS: sorry for the rant)
Furthermore, it's not only about AMD: since Vulkan is an open API, in contrast to CUDA and Metal, and is mature and well supported, it enables hardware acceleration on Intel GPUs and ARM chips as well.
As for why we should support KoboldCpp: since it's easy to set up, it provides an alternative way to use Devika while Ollama's Windows support matures. Also, since it uses plain HTTP requests for text generation, it will be easier to support other backends that only expose a REST API, instead of waiting for someone to write a client library for them.
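To illustrate, generating text from a running KoboldCpp instance is a single HTTP call (a sketch assuming the default port 5001 and the KoboldAI-style /api/v1/generate endpoint; field names may differ between versions):

```python
import requests

# Ask a running KoboldCpp server for a completion. The payload follows
# the KoboldAI generate schema: "max_length" is the number of tokens
# to generate.
resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={
        "prompt": "Explain what a GGUF file is in one sentence.",
        "max_length": 80,
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```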
Also, it's in the spirit of open source to give users the flexibility to do what they want; more options are always welcome.
Thanks ✌️
This will be reopened; we're considering whether or not KoboldCpp should be integrated. Thanks for the PR! ❤️🙌