disco
Decentralized training on MNIST fails with error
Starting collaborative training on the decentralized MNIST task fails during the first round. Several things are not working as expected:
- It takes a long time before anything happens after clicking 'Train collaboratively', whereas local training starts immediately
- When training with two local clients, the number of participants shown is 16
- When we get to the first communication round, the training fails with a timeout error thrown by the `waitForPeers` function.
Potentially linked to #667