paddler

Stateful load balancer custom-tailored for llama.cpp

Results 29 paddler issues

llama.cpp exposes the `/health` endpoint, which makes it easy to deal with slots. What about other similar solutions?

enhancement
good first issue
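As a rough illustration of why a slot-aware `/health` response makes balancing easy, here is a minimal sketch of picking the backend with the most idle slots. The `slots_idle` and `slots_processing` field names are assumptions based on what older llama.cpp builds reported from `/health`; the helper names are hypothetical, and this is not Paddler's actual selection logic.

```python
import json
from typing import Dict, Optional


def idle_slots(health_body: str) -> int:
    """Return the idle slot count found in a /health-style JSON body."""
    payload = json.loads(health_body)
    # "slots_idle" is an assumed field name; treat a missing field as 0.
    return int(payload.get("slots_idle", 0))


def pick_backend(health_by_backend: Dict[str, str]) -> Optional[str]:
    """Pick the backend with the most idle slots, or None if none are free."""
    best = max(
        health_by_backend,
        key=lambda name: idle_slots(health_by_backend[name]),
        default=None,
    )
    if best is None or idle_slots(health_by_backend[best]) == 0:
        return None
    return best
```

A backend that does not report slot counts at all is treated here as having no free slots, which is the gap the question above is asking about.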

See: https://github.com/distantmagic/paddler/discussions/16#discussioncomment-10401611 https://github.com/distantmagic/paddler/issues/6#issuecomment-2297674409 The lack of slot support can be worked around at the agent/observer level so that information does not have to be obtained just through llama.cpp.

enhancement

I noticed that it states that requests can queue when all llama.cpp instances are busy. I was wondering if the queuing is done per llama.cpp server or per slot? I...

question

It should be possible to always direct requests to a specific slot and distribute them among all the observed servers. Once a request is issued, all the following requests should...

enhancement

It looks like the balancer is using the default value for the **reserverproxy-port** parameter even though **I configured a different value** on the command line. I started the balancer with this command,...

### Description Since the llama.cpp release [b3898](https://github.com/ggerganov/llama.cpp/releases/tag/b3898) the `/slots` API endpoint is secured with an API key if it's set. This is a documented breaking change: [changelog : llama-server REST...
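To make the breaking change concrete, here is a minimal sketch of how a client could build an authenticated `/slots` request. It assumes the key is sent as an `Authorization: Bearer` header, and the function name and base URL are hypothetical; it only constructs the request and does not show how Paddler itself handles the key.

```python
import urllib.request
from typing import Optional


def build_slots_request(base_url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for /slots, adding the API key when one is set."""
    url = base_url.rstrip("/") + "/slots"
    request = urllib.request.Request(url, method="GET")
    if api_key:
        # Assumption: the server expects the key as a Bearer token.
        request.add_header("Authorization", f"Bearer {api_key}")
    return request
```

Passing the result to `urllib.request.urlopen` would perform the call; without the header, a server started with an API key is expected to reject the request.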

paddler_slots_idle not working with latest llama.cpp

bug

Say I want to add something to the prompt/query as it passes through paddler. Can that be achieved?

## Description: It would be useful to simply download a binary with your preferred package manager instead of building your own or downloading from [releases](https://github.com/distantmagic/paddler/releases). There should also exist some automated...

documentation
enhancement
windows
rust
macos