ncclCommInitRank error when using 2 GPUs on a single machine
I'm trying to run tf_cnn_benchmarks.py on a POWER9 machine.
Running the benchmark with Horovod on 1 GPU works fine.
When I use 2 GPUs on a single node, it fails with "ncclCommInitRank failed: unhandled cuda error".
Running the same benchmark on 2 GPUs spread across two nodes (one GPU per node) also works fine.
How can I use multiple GPUs on a single node with Horovod, or do I have to use a different distributed training API?
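For context on the single-node case: Horovod's usual convention is one process per GPU, with each process pinned to exactly one GPU by its rank within the node (in Horovod code this is normally `hvd.local_rank()`). If two processes on the same node end up initializing on the same GPU, NCCL setup is one place that can fail, so it may be worth ruling out. The sketch below is a hypothetical, dependency-free illustration of that pinning logic only. It assumes the job is launched with Open MPI's `mpirun`, which exports `OMPI_COMM_WORLD_LOCAL_RANK`; the helper names `local_rank_from_env` and `pin_gpu` are made up for illustration and are not part of Horovod or tf_cnn_benchmarks.

```python
import os


def local_rank_from_env(default: int = 0) -> int:
    """Return this process's rank within its node.

    Assumption: launched via Open MPI (mpirun), which sets
    OMPI_COMM_WORLD_LOCAL_RANK for every process. Falls back to
    `default` when the variable is absent (e.g. a plain single-process run).
    """
    return int(os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", default))


def pin_gpu(local_rank: int, gpus_per_node: int) -> str:
    """Map a local rank one-to-one onto a GPU index (as a string).

    With one process per GPU, local ranks 0..gpus_per_node-1 each get
    a distinct GPU, so no two processes on the node share a device.
    """
    if not 0 <= local_rank < gpus_per_node:
        raise ValueError(
            f"local rank {local_rank} has no GPU on a {gpus_per_node}-GPU node"
        )
    return str(local_rank)


if __name__ == "__main__":
    rank = local_rank_from_env()
    # Must be set before TensorFlow/CUDA initializes in this process,
    # otherwise the process may already have touched other GPUs.
    os.environ["CUDA_VISIBLE_DEVICES"] = pin_gpu(rank, gpus_per_node=2)
    print(f"local rank {rank} -> CUDA_VISIBLE_DEVICES={os.environ['CUDA_VISIBLE_DEVICES']}")
```

Launched as two processes on one node, local rank 0 would see only GPU 0 and local rank 1 only GPU 1. In real Horovod code the rank would come from `hvd.local_rank()` rather than an MPI environment variable.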