
enable rpc for server

steampunque opened this issue · 0 comments

I made a quick patch to the server to test RPC, running Phi-3 fully offloaded onto a remote GPU via the server, and all seemed OK. Timings:

pp: 258.19 tokens per second
tg: 48.41 tokens per second

Running the same model locally on that GPU (directly on the remote machine, without RPC) gives:

pp: 563.30 tokens per second
tg: 92.00 tokens per second

Possible Implementation


The patches are trivial:

```diff
     printf("  --port PORT               port to listen (default: %d)\n", sparams.port);
+    printf("  --rpc SERVERS             comma separated list of RPC servers\n");
```

```diff
     } else if (arg == "--host") {
         if (++i >= argc) {
             invalid_param = true;
             break;
         }
         sparams.hostname = argv[i];
+    } else if (arg == "--rpc") {
+        if (++i >= argc) {
+            invalid_param = true;
+            break;
+        }
+        params.rpc_servers = argv[i];
```

steampunque · May 15 '24