Ziqian Wang
Same problem here. Training on a single node with a single trainer works fine, but when **nproc_per_node** > 1, I get the same error.
> `world_size = int(os.environ["WORLD_SIZE"])`
> `mp.spawn(main_worker, args=(world_size, args), nprocs=world_size)`
>
> This is my main function to start distributed training, and when calling "spawn", it will pass an index...
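For context on the "index" mentioned in the quote: `torch.multiprocessing.spawn` calls its target as `fn(i, *args)`, where `i` is the process index (0 to `nprocs - 1`). So `main_worker` must accept that index as its first parameter, ahead of the values passed in `args`. The sketch below illustrates only this calling convention. `spawn_sequential` is a hypothetical in-process stand-in that creates no real processes, and the `{"lr": 0.1}` dict is an assumed placeholder for the real `args`.

```python
from typing import Any, Callable, List, Tuple


def spawn_sequential(fn: Callable[..., Any], args: Tuple[Any, ...] = (), nprocs: int = 1) -> List[Any]:
    """Hypothetical stand-in for torch.multiprocessing.spawn.

    Calls fn once per index in the current process, only to show the
    calling convention fn(i, *args). The real mp.spawn starts nprocs
    separate processes and does not return the workers' results.
    """
    return [fn(i, *args) for i in range(nprocs)]


def main_worker(local_rank: int, world_size: int, args: dict) -> str:
    # The index injected by spawn arrives first; the tuple passed as
    # args=(world_size, args) fills the remaining parameters in order.
    return f"process {local_rank} of {world_size}, lr={args['lr']}"


if __name__ == "__main__":
    world_size = 2  # placeholder for int(os.environ["WORLD_SIZE"])
    for line in spawn_sequential(main_worker, args=(world_size, {"lr": 0.1}), nprocs=world_size):
        print(line)
```

With `world_size = 2`, this prints one line per index, `process 0 of 2, lr=0.1` and then `process 1 of 2, lr=0.1`, mirroring how each spawned worker receives its own index.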