Maybe looking at the source code of [distributed-llama](https://github.com/b4rtaz/distributed-llama) might help. I use it on multiple nodes (CPU only) and for now it seems to be the fastest solution (by far)...
How did you start the workers?
Hi, do you have enough free RAM on your systems? Dllama doesn't seem to check if the model will fit into RAM.
In order for rpc-server to split the model, you have to set up more than one rpc-server instance, AFAIK.
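A minimal sketch of that idea (the ports, model path, and prompt here are placeholders I chose for illustration, not values from this thread): two rpc-server instances, each on its own port, and llama-cli pointed at both so the model buffers get distributed across the listed endpoints:

```
# start two rpc-server instances (one per endpoint you want in the split)
./rpc-server -p 50052 &
./rpc-server -p 50053 &

# point llama-cli at both endpoints; the model is split across
# every address listed in --rpc
./llama-cli -m ./model.gguf --rpc 127.0.0.1:50052,127.0.0.1:50053 -p "Hello"
```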
I use six nodes; this is the parameter for llama-cli: `--rpc 192.168.0.150:50052,192.168.0.151:50052,192.168.0.152:50052,192.168.0.153:50052,192.168.0.154:50052,192.168.0.155:50052`. It will split a 665 GB model like this:
```
load_tensors: RPC[192.168.0.150:50052] model buffer size = 95096.06 MiB
load_tensors:...
```
On each of my six nodes (CPU-only) I run this: `./rpc-server -p 50052 -H ` Then, on the first node, I run the llama-cli command. That means the first node...
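For reference, a sketch of that per-node setup; the bind address `0.0.0.0`, the model path, and the prompt are my assumptions, not taken from the posts above:

```
# on each of the six worker nodes: expose rpc-server on the LAN
# (0.0.0.0 is an assumed bind address; adjust to your network)
./rpc-server -p 50052 -H 0.0.0.0

# on the first node only: run llama-cli and list every worker in --rpc
./llama-cli -m ./model.gguf \
  --rpc 192.168.0.150:50052,192.168.0.151:50052,192.168.0.152:50052,192.168.0.153:50052,192.168.0.154:50052,192.168.0.155:50052 \
  -p "Hello"
```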
In my case it works like this: when I run the `llama-cli` command, it first loads the model tensors and then sequentially sends parts of the model to...
> However, 192.168.13.12 does not participate in the inference, because its CPU utilization and memory usage are not growing.

This is the llama-cli command from the official documentation: `$ bin/llama-cli -m...`
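As a hedged sketch of that kind of invocation (the model path, prompt, second address, and port are placeholders of mine, not from the docs or the quote): a node only takes part in the inference if its address appears in `--rpc`:

```
# a node participates only if it is listed here; if 192.168.13.12:50052
# is missing from --rpc, its CPU and memory stay idle
bin/llama-cli -m ./model.gguf -p "Hello, my name is" -n 64 \
  --rpc 192.168.13.11:50052,192.168.13.12:50052
```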