Ziqian Wang


Same problem here. Running on a single node with a single trainer works fine, but when **nproc_per_node** > 1, I get the same error.

> ```python
> world_size = int(os.environ["WORLD_SIZE"])
> mp.spawn(main_worker, args=(world_size, args), nprocs=world_size)
> ```
>
> This is my main function to start distributed training, and when calling "spawn", it will pass an index...
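For context, `torch.multiprocessing.spawn` calls `fn(i, *args)`, where `i` is the process index from 0 to `nprocs - 1`, so `main_worker` must accept that index as its first parameter. One possible explanation for the `nproc_per_node` > 1 failure (an assumption, not confirmed in this thread) is that `torchrun` already starts one process per GPU and exports `LOCAL_RANK` and `WORLD_SIZE`. If each of those processes then calls `mp.spawn` again, it ends up with more workers than expected and duplicate ranks. The sketch below is a minimal illustration assuming a single node. It skips `mp.spawn` when `torchrun` variables are present. The helper `launched_by_torchrun` is hypothetical, and the worker body is a placeholder.

```python
import os


def main_worker(local_rank: int, world_size: int, args) -> None:
    # mp.spawn calls main_worker(i, *args): i is the process index 0..nprocs-1.
    # Placeholder body; real code would init the process group and train here.
    print(f"worker {local_rank} of {world_size}")


def launched_by_torchrun(env=None) -> bool:
    # Hypothetical helper: torchrun exports LOCAL_RANK (among RANK,
    # WORLD_SIZE, ...) for every process it starts.
    env = os.environ if env is None else env
    return "LOCAL_RANK" in env


def main(args=None) -> None:
    if launched_by_torchrun():
        # torchrun already started one process per GPU, so don't spawn again.
        # Assumes a single node, where LOCAL_RANK equals the global rank.
        main_worker(int(os.environ["LOCAL_RANK"]), int(os.environ["WORLD_SIZE"]), args)
    else:
        # Plain `python main.py` launch: let mp.spawn create the workers.
        import torch.multiprocessing as mp

        world_size = int(os.environ["WORLD_SIZE"])
        mp.spawn(main_worker, args=(world_size, args), nprocs=world_size)
```

When using the `mp.spawn` path, call `main()` from inside an `if __name__ == "__main__":` guard. `mp.spawn` uses the `spawn` start method by default, which re-imports the main module in each child.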