David Barton
I think mine is running entirely on the CPU even though my GPU should be capable. `top` was showing 900% CPU and tokens were crawling out. The log shows `Use lib /home/david/software/mlc-llm/dist/lib/vicuna-v1-7b_vulkan_float16.so` vulkaninfo...
I had the same issue, with the error `Error copying blocksync to the remote host!`, but `--splay 2` fixed it. PS: this is a brilliant tool, @ct16k, thanks for sharing!
@ct16k that fix doesn't always work. I have a replication that works 100% when run by hand with 6 workers and 100% fails on the first status update (I run...
Running with multiple workers definitely helps overcome network latency, even when your disks don't increase throughput with multiple workers.
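To illustrate the latency point, here is a toy sketch. It is not blocksync's code: `send_block`, `sync`, the sleep-based latency, and the block counts are all made up for illustration. Each "block" costs only a fixed network wait, so extra workers overlap those waits even though no disk throughput is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.02  # hypothetical per-block round-trip time, in seconds


def send_block(block_id):
    """Stand-in for sending one block: the only cost is waiting on the network."""
    time.sleep(LATENCY)
    return block_id


def sync(block_ids, workers):
    """Send all blocks using `workers` threads; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with block_ids.
        results = list(pool.map(send_block, block_ids))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    blocks = list(range(12))
    _, serial = sync(blocks, workers=1)
    _, parallel = sync(blocks, workers=6)
    print(f"1 worker: {serial:.2f}s, 6 workers: {parallel:.2f}s")
```

With one worker the waits add up one after another; with six they mostly overlap, so the wall-clock time drops even though each individual block is no faster.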
> I have a replication that works 100% when run by hand with 6 workers and 100% fails on the first status update (I run with --interval=60 so it takes...
@theraser I have tried adding debug logging both on the sender and the receiver. The sender ends up failing with (lines may differ as I have added logging): ``` Traceback...
With some further digging, it looks like the sender process is attempting to write after the receiver thinks it is finished: ``` [worker 0] stderr: debug1: Sending environment.^M [worker 0]...
The error seems to be on the very last block: ``` [worker 0] Unexpected error on block 1195 ``` It loops over and writes every other block, and then the...