llama.cpp supervisor
[draft]
General idea:
There should be a tool to manage existing llama.cpp instances. Not all llama.cpp parameters can be changed at runtime, so changing them requires a restart; an instance should also be brought back up when it exits for any reason.
Paddler can potentially download and manage the llama.cpp version that it supports.
For example:
# --supervisor-aggregate-addr: reports llama.cpp status to that server
# --supervisor-controller-addr: exposes an API to manage a specific llama.cpp instance
paddler supervisor \
    --llama-server-path ./llama-server \
    --supervisor-aggregate-addr 127.0.0.1:8085 \
    --supervisor-controller-addr 127.0.0.1:8089
paddler download # downloads the latest supported llama.cpp version
Then, it should be possible to restart that specific llama.cpp instance through the supervisor-controller-addr API.
Paddler should keep requests on hold while the supervisor restarts llama.cpp instances.
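To make the "keep requests on hold" idea concrete, here is a minimal sketch using only the Rust standard library. It is not Paddler's actual implementation: `RestartGate` and its methods are hypothetical names, and a real balancer would most likely use async primitives rather than blocking threads.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical gate that a balancer could consult before forwarding a request.
pub struct RestartGate {
    restarting: Mutex<bool>,
    resumed: Condvar,
}

impl RestartGate {
    pub fn new() -> Self {
        RestartGate {
            restarting: Mutex::new(false),
            resumed: Condvar::new(),
        }
    }

    /// Called when the supervisor begins restarting the instance.
    pub fn begin_restart(&self) {
        *self.restarting.lock().unwrap() = true;
    }

    /// Called once the instance is back up; wakes every held request.
    pub fn end_restart(&self) {
        *self.restarting.lock().unwrap() = false;
        self.resumed.notify_all();
    }

    /// Blocks while a restart is in progress. Returns `true` if the request
    /// may be forwarded, or `false` if `timeout` elapsed first.
    pub fn wait_until_ready(&self, timeout: Duration) -> bool {
        let guard = self.restarting.lock().unwrap();
        let (guard, _wait_result) = self
            .resumed
            .wait_timeout_while(guard, timeout, |restarting| *restarting)
            .unwrap();
        !*guard
    }
}
```

A request handler would call `wait_until_ready` before forwarding, and return an error to the client only if the timeout elapses. The timeout keeps a failed restart from holding requests forever.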
So the initial usage flow and rules might be:
1 - The user runs a command to start a new Supervisor instance pointing to an existing llama.cpp instance.
2 - The user controls the llama.cpp instance through the Supervisor REST API.
3 - Changing the llama.cpp instance configuration makes the Supervisor restart the instance with the new configuration options applied. The restarted instance keeps its initial llama.cpp address.
4 - The llama.cpp address can also be changed through the Supervisor REST API. If already-running Agents break because of the new llama.cpp address, that is the user's responsibility.
5 - While llama.cpp instances are restarting, the reverse proxy / load balancer must not drop incoming requests to them.
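Rules 3 and 4 above, plus the "bring it back up when it exits" requirement, could be sketched roughly as follows. This is an illustration under assumptions, not Paddler's design: `LlamaConfig` and `Supervisor` are hypothetical types, and the sketch spawns the server process itself rather than attaching to an existing one. It assumes llama-server's `--host` and `--port` flags; any other flags would be passed through `extra_args`.

```rust
use std::io;
use std::process::{Child, Command};

/// Hypothetical description of how one llama.cpp instance should be launched.
#[derive(Clone, Debug, PartialEq)]
pub struct LlamaConfig {
    pub server_path: String,
    pub host: String,
    pub port: u16,
    pub extra_args: Vec<String>,
}

impl LlamaConfig {
    /// Builds the command-line arguments (assumes `--host` / `--port`).
    pub fn to_args(&self) -> Vec<String> {
        let mut args = vec![
            "--host".to_string(),
            self.host.clone(),
            "--port".to_string(),
            self.port.to_string(),
        ];
        args.extend(self.extra_args.iter().cloned());
        args
    }
}

/// Hypothetical supervisor that owns a single child process.
pub struct Supervisor {
    config: LlamaConfig,
    child: Option<Child>,
}

impl Supervisor {
    pub fn new(config: LlamaConfig) -> Self {
        Supervisor { config, child: None }
    }

    pub fn config(&self) -> &LlamaConfig {
        &self.config
    }

    fn start(&mut self) -> io::Result<()> {
        let child = Command::new(&self.config.server_path)
            .args(self.config.to_args())
            .spawn()?;
        self.child = Some(child);
        Ok(())
    }

    fn stop(&mut self) -> io::Result<()> {
        if let Some(mut child) = self.child.take() {
            // Ignore the kill error: the process may already have exited.
            let _ = child.kill();
            child.wait()?;
        }
        Ok(())
    }

    /// Rules 3 and 4: a configuration (or address) change means a restart.
    pub fn apply_config(&mut self, new_config: LlamaConfig) -> io::Result<()> {
        self.stop()?;
        self.config = new_config;
        self.start()
    }

    /// Starts the instance if it is not running or has exited.
    /// Returns `Ok(true)` when a (re)start was needed.
    pub fn ensure_running(&mut self) -> io::Result<bool> {
        let needs_start = match self.child.as_mut() {
            Some(child) => child.try_wait()?.is_some(),
            None => true,
        };
        if needs_start {
            self.start()?;
        }
        Ok(needs_start)
    }
}
```

A supervisor process could call `ensure_running` in a periodic loop and call `apply_config` from its REST handlers. A real implementation would probably also want backoff between restarts and a graceful shutdown signal before killing the process.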
Should Supervisor be optional?
If the basic Paddler ecosystem can work without the supervisor, with just the balancer, llama.cpp, and an agent instance, should the supervisor be an optional compile-time feature?
Supervisor aggregate address
Its purpose is not clear to me. What is the point of the supervisor-aggregate-addr argument? Would 8085 be the load balancer's management server port? Why would the supervisor report llama.cpp status to the load balancer's management server if agents already do so? Do you mean its status as an OS process?
Paddler binaries downloading
Any ideas, suggestions, or further details from the community on the following behavior?
Paddler can potentially download and manage the llama.cpp version that it supports.
paddler download # downloads the latest supported llama.cpp version
Should Supervisor be optional?
If the basic Paddler ecosystem can work without the supervisor, with just the balancer, llama.cpp, and an agent instance, should the supervisor be an optional compile-time feature?
Nope, because it doesn't introduce additional build requirements. I made the web GUI optional because it introduced Node as a dependency for building the front end; I wanted Paddler to be buildable with just Rust. Even though using the supervisor is optional, it does not require Node or anything like that.
We won't be doing that after all; we compiled llama.cpp into Paddler in 2.0: https://github.com/intentee/paddler/releases/tag/v2.0.0