paddler
Stateful load balancer custom-tailored for llama.cpp
llama.cpp exposes the `/health` endpoint, which makes it easy to deal with slots. What about other similar solutions?
See: https://github.com/distantmagic/paddler/discussions/16#discussioncomment-10401611 and https://github.com/distantmagic/paddler/issues/6#issuecomment-2297674409. The lack of slot support can be worked around at the agent/observer level, so the information does not have to come from llama.cpp alone.
I noticed that it states that requests can queue when all llama.cpp instances are busy. I was wondering if the queuing is done per llama.cpp server or per slot? I...
It should be possible to always direct requests to a specific slot and distribute them among all the observed servers. Once a request is issued, all the following requests should...
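A minimal sketch of one way to read this idea, not paddler's actual behavior: each session is pinned to one slot on first contact, and new sessions are spread round-robin over all observed slots. The class name `StickySlotRouter`, the `session_id` key, and the `(server, slot_id)` tuples are all hypothetical names introduced for illustration.

```python
class StickySlotRouter:
    """Pins each session to one (server, slot_id) pair (hypothetical sketch)."""

    def __init__(self, slots):
        # All slots reported by the observed servers, e.g. [("server-a", 0), ...].
        self._slots = list(slots)
        if not self._slots:
            raise ValueError("at least one slot is required")
        self._assignments = {}  # session_id -> (server, slot_id)
        self._next = 0          # round-robin cursor for new sessions

    def route(self, session_id):
        """Return the slot for this session, assigning one on first use."""
        if session_id not in self._assignments:
            slot = self._slots[self._next % len(self._slots)]
            self._assignments[session_id] = slot
            self._next += 1
        return self._assignments[session_id]


router = StickySlotRouter([("server-a", 0), ("server-a", 1), ("server-b", 0)])
first = router.route("session-1")
second = router.route("session-2")
repeat = router.route("session-1")  # same slot as the first call
```

Pinning a session to a slot is the usual motivation for this kind of routing, because the slot can then reuse whatever state it already holds for that session. The sketch ignores slot availability and server failures.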
It looks like the balancer is using the default value for the **reverseproxy-port** parameter even though **I configured a different value** on the command line. I started the balancer with this command,...
### Description
Since llama.cpp release [b3898](https://github.com/ggerganov/llama.cpp/releases/tag/b3898), the `/slots` API endpoint is secured with an API key if one is set. This is a documented breaking change: [changelog : llama-server REST...
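As a rough illustration of what this change means for any client that polls `/slots`, the sketch below builds a request that carries the key as a bearer token. This assumes the server accepts the key in an `Authorization: Bearer` header; the helper name `build_slots_request` and the example address are hypothetical and are not part of paddler.

```python
import urllib.request


def build_slots_request(base_url, api_key=None):
    """Build a GET request for the /slots endpoint (hypothetical helper)."""
    url = base_url.rstrip("/") + "/slots"
    request = urllib.request.Request(url, method="GET")
    if api_key:
        # Assumption: the key is sent as a bearer token.
        request.add_header("Authorization", f"Bearer {api_key}")
    return request


# Example address and key are placeholders; nothing is sent over the network here.
request = build_slots_request("http://127.0.0.1:8080", api_key="example-key")
```

Passing the result to `urllib.request.urlopen` would perform the actual call. Without the header, a server configured with a key would be expected to reject the request.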
paddler_slots_idle not working with latest llama.cpp
Say I want to add something to the prompt/query as it passes through paddler. Can that be achieved?
## Description: It would be useful to simply download a binary with your preferred package manager instead of building your own or downloading from [releases](https://github.com/distantmagic/paddler/releases). There should also exist some automated...