llama.cpp supervisor

Open mcharytoniuk opened this issue 1 year ago • 5 comments

[draft]

General idea:

There should be a tool to manage existing llama.cpp instances. Not all llama.cpp parameters can be changed at runtime, so changing them requires a restart. The tool should also bring llama.cpp back up whenever it exits for any reason.

Paddler can potentially download and manage the llama.cpp version that it supports.

For example:

# --supervisor-aggregate-addr: reports llamacpp status to that server
# --supervisor-controller-addr: exposes API to manage a specific llamacpp instance
paddler supervisor \
    --llama-server-path ./llama-server \
    --supervisor-aggregate-addr 127.0.0.1:8085 \
    --supervisor-controller-addr 127.0.0.1:8089

# downloads the latest supported llama.cpp version
paddler download

Then, it should be possible to restart that specific llama.cpp instance through the supervisor-controller-addr API.

Paddler should keep requests on hold while the supervisor restarts llamacpp instances.
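
The "bring it back up when it exits" behavior could be sketched roughly as follows. This is only an illustration in Python, not Paddler's code: `supervise`, `max_restarts`, and `backoff_seconds` are hypothetical names, and the cap on restarts exists only so the sketch terminates.

```python
import subprocess
import sys
import time


def supervise(command, max_restarts=3, backoff_seconds=0.0):
    """Run `command`, relaunching it whenever it exits.

    Returns the list of exit codes observed. `max_restarts` is a
    hypothetical cap so the sketch terminates; a real supervisor
    would keep going until told to stop.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        process = subprocess.Popen(command)
        exit_codes.append(process.wait())  # blocks until the child exits
        time.sleep(backoff_seconds)  # pause so a crashing child is not respawned in a tight loop
    return exit_codes


if __name__ == "__main__":
    # Stand-in child that exits immediately with status 1; in practice
    # this would be something like ["./llama-server", ...].
    child = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(child, max_restarts=2))
```

A real implementation would probably also want to tell a deliberate shutdown apart from a crash, which this sketch does not attempt.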

mcharytoniuk • Dec 20 '24 08:12

So the initial usage flow and rules might be:

1 - The user runs a command to start a new Supervisor instance pointing to an existing llamacpp instance.

2 - The user controls the llamacpp instance through the Supervisor REST API.

3 - Changing the llamacpp instance configuration makes the Supervisor restart the instance with the new options applied. The restarted instance keeps its initial llamacpp address.

4 - The llamacpp address can also be changed through the Supervisor REST API. If the new address breaks Agents that are already running, that is the user's responsibility.

5 - While llamacpp instances restart, the reverse-proxy load balancer must not drop incoming requests to them.
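
Rules 3 and 5 above could be sketched together like this. It is a hedged illustration in Python, not a proposal for the actual API: `SupervisedInstance`, `apply_config`, `handle`, and the `restart` callback are all hypothetical names. The sketch only holds requests that arrive during a restart; it does not deal with requests already in flight, and it leaves out the address change from rule 4.

```python
import threading


class SupervisedInstance:
    """Tracks one llamacpp instance's config and whether it is ready."""

    def __init__(self, address, options):
        self.address = address  # initial llamacpp address, kept across restarts (rule 3)
        self.options = dict(options)
        self._ready = threading.Event()
        self._ready.set()

    def apply_config(self, new_options, restart):
        """Apply new options by restarting; the address is left unchanged."""
        self._ready.clear()  # requests arriving from now on will wait
        try:
            self.options.update(new_options)
            restart(self.address, self.options)  # hypothetical restart callback
        finally:
            self._ready.set()  # release any held requests

    def handle(self, request, timeout=None):
        """Hold the request until the instance is ready, then forward it (rule 5)."""
        if not self._ready.wait(timeout):
            raise TimeoutError("llamacpp instance did not come back in time")
        return f"forwarded {request} to {self.address}"
```

Using an event as a gate means waiting requests are parked rather than rejected, which matches the "must not drop" requirement; the optional timeout is an assumption added so a request cannot wait forever if the restart fails.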

Propfend • Dec 20 '24 12:12

Should Supervisor be optional?

If the basic Paddler ecosystem can work without the supervisor, with just the balancer, llamacpp, and an agent instance, should the supervisor be an optional compile-time feature?

Propfend • Jan 07 '25 14:01

Supervisor aggregate address

Its purpose is not clear to me. What is the point of the supervisor-aggregate-addr argument? Would 8085 be the load balancer's management port? Why would the supervisor report llamacpp status to the load balancer's management server if agents already do so? Do you mean its status as an OS process?

Propfend • Jan 07 '25 14:01

Paddler binaries downloading

Any ideas, suggestions, or further details from the community on the following behavior?

Paddler can potentially download and manage the llama.cpp version that it supports.

paddler download # downloads the latest supported llama.cpp version

Propfend • Jan 07 '25 15:01

Should Supervisor be optional?

If the basic Paddler ecosystem can work without the supervisor, with just the balancer, llamacpp, and an agent instance, should the supervisor be an optional compile-time feature?

Nope, because it doesn't introduce additional build requirements. I made the web GUI optional because it introduced Node as a dependency to build the front-end, and I wanted Paddler to be buildable with just Rust. The supervisor is optional to use, but it does not require Node or anything like that.

mcharytoniuk • Feb 11 '25 14:02

We won't be doing that after all; we ended up compiling llama.cpp into Paddler in 2.0: https://github.com/intentee/paddler/releases/tag/v2.0.0

mcharytoniuk • Aug 08 '25 20:08