
Decentralized training on MNIST fails with error

Open JulienVig opened this issue 1 year ago • 0 comments

Starting a collaborative training on the decentralized MNIST task fails during the first round. There are a few things not working well:

  • It takes a long time before anything happens after hitting 'Train collaboratively' (whereas local training starts immediately)
  • When training with two local clients, the number of participants shown is 16
  • When we get to the first communication round, the training fails with a timeout error thrown by the waitForPeers function.
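
To make the last symptom concrete, here is a minimal sketch of how a wait-for-peers step with a timeout typically behaves. It is not disco's actual `waitForPeers` code: the names `peerWaitStatus` and `waitForPeersSketch`, their parameters, and the polling approach are all hypothetical. It only illustrates that such a wait would reject whenever the expected number of peers is never reached, for example if the expected count were larger than the number of clients actually connected. That scenario is an assumption for illustration, not a diagnosis of this bug.

```typescript
// Hypothetical sketch, NOT disco's actual waitForPeers implementation.
type PeerWaitStatus = "ready" | "waiting" | "timeout";

// Pure decision step: enough peers connected, still waiting, or out of time.
function peerWaitStatus(
  connected: number,
  expected: number,
  elapsedMs: number,
  timeoutMs: number,
): PeerWaitStatus {
  if (connected >= expected) return "ready";
  return elapsedMs >= timeoutMs ? "timeout" : "waiting";
}

// Polls getPeerCount() until `expected` peers are connected.
// Resolves with the peer count, or rejects once timeoutMs has elapsed.
function waitForPeersSketch(
  getPeerCount: () => number,
  expected: number,
  timeoutMs: number,
  pollMs = 50,
): Promise<number> {
  return new Promise<number>((resolve, reject) => {
    const start = Date.now();
    const poll = (): void => {
      const connected = getPeerCount();
      const status = peerWaitStatus(connected, expected, Date.now() - start, timeoutMs);
      if (status === "ready") {
        resolve(connected);
      } else if (status === "timeout") {
        reject(new Error(`timed out: ${connected}/${expected} peers after ${timeoutMs} ms`));
      } else {
        setTimeout(poll, pollMs); // not enough peers yet, check again later
      }
    };
    poll();
  });
}
```

With two connected clients, `waitForPeersSketch(() => 2, 2, 1000)` would resolve on the first check, while `waitForPeersSketch(() => 2, 16, 1000)` would reject after about one second.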

Potentially linked to #667

JulienVig Jul 03 '24 16:07