nccl-fastsocket

What does "NCCL WARN Cannot get incoming CPU" mean?

Open nvlcambier opened this issue 3 years ago • 0 comments

I am trying out the fastsocket NCCL plugin on GCP, on a GCE SLURM cluster built from 2x(8xA100) nodes with gVNICs. I see warnings in the logs, specifically "NCCL WARN Cannot get incoming CPU." and "NCCL WARN Maximum retry reached for accept 3." Do they mean something specific, or can they be safely ignored?

The code runs despite the warnings, although performance with and without the plugin looks very similar.

full-debug2-test-1:4024:4048 [0] net_fastsocket.cc:765 NCCL WARN Cannot get incoming CPU.

full-debug2-test-0:4300:4325 [0] net_fastsocket.cc:785 NCCL WARN Maximum retry reached for accept 3.

full-debug2-test-1:4024:4055 [0] net_fastsocket.cc:674 NCCL WARN Maximum retry reached for connect 3.
full-debug2-test-0:4300:4325 [0] NCCL INFO accept qid: 3, rqid: 3
full-debug2-test-0:4300:4325 [0] NCCL INFO accept incoming cpu: 0
full-debug2-test-0:4300:4325 [0] NCCL INFO NET/FastSocket : Connected after 1000 retries.
full-debug2-test-0:4300:4325 [0] NCCL INFO NET/FastSocket : Accepted data socket 3

full-debug2-test-0:4300:4348 [0] net_fastsocket.cc:652 NCCL WARN Cannot get incoming CPU.
full-debug2-test-1:4024:4055 [0] NCCL INFO connect incoming cpu: 0
full-debug2-test-1:4024:4055 [0] NCCL INFO connect qid: 3, rqid: 3
full-debug2-test-1:4024:4055 [0] NCCL INFO NET/FastSocket : Connected after 1000 retries.
full-debug2-test-1:4024:4055 [0] NCCL INFO NET/FastSocket : Connected data socket 3

full-debug2-test-1:4024:4048 [0] net_fastsocket.cc:765 NCCL WARN Cannot get incoming CPU.
full-debug2-test-1:4024:4055 [0] NCCL INFO NET/FastSocket : Async connect done

full-debug2-test-0:4300:4348 [0] net_fastsocket.cc:652 NCCL WARN Cannot get incoming CPU.

full-debug2-test-1:4024:4048 [0] net_fastsocket.cc:765 NCCL WARN Cannot get incoming CPU.

full-debug2-test-0:4300:4348 [0] net_fastsocket.cc:652 NCCL WARN Cannot get incoming CPU
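To judge how widespread these warnings are across a full run, the log can be tallied by host, source location, and message. The following is a minimal sketch, not part of the plugin: the regular expression is inferred only from the log lines shown above (`host:pid:tid [gpu] file:line NCCL WARN message`), and the function name `tally_warnings` is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpt in this issue (an assumption, not a
# documented NCCL format): host:pid:tid [gpu] file.cc:line NCCL WARN message
WARN_RE = re.compile(
    r"^(?P<host>\S+?):(?P<pid>\d+):(?P<tid>\d+) \[(?P<gpu>\d+)\] "
    r"(?P<src>\S+:\d+) NCCL WARN (?P<msg>.*)$"
)


def tally_warnings(lines):
    """Count NCCL WARN lines, keyed by (host, source location, message)."""
    counts = Counter()
    for line in lines:
        match = WARN_RE.match(line.strip())
        if match:
            # Drop trailing periods so "…CPU." and "…CPU" count as one message.
            message = match["msg"].rstrip(". ")
            counts[(match["host"], match["src"], message)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally_warnings.py < nccl.log
    for (host, src, message), n in tally_warnings(sys.stdin).most_common():
        print(f"{n:6d}  {host}  {src}  {message}")
```

INFO lines do not match the pattern and are skipped, so only the warnings are counted.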

nvlcambier · Feb 14 '23 22:02