
Add an artificial metric of training time

Open arthurPignet opened this issue 4 years ago • 0 comments

With the current implementations of the various mpls, the computation time bears no relation to that of a true federated scenario, for two main reasons:

  • the partners cannot train in parallel.
  • there is no communication between the "central server" and the partners.

I think it is in fact possible to compute an approximation of the federated training time of a scenario, by taking into account the parallelism of the local trainings and the cost of communication.

The time of a global batch would be the maximum, over the partners, of the actual time spent on the sequential local trainings for this global batch:

$t_{\text{global batch } i} = \max_p \; \sum_{\text{local batch } j \,\in\, \text{global batch } i \text{ for partner } p} t_{\text{local batch } j}$
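As a rough illustration of this formula, here is a minimal Python sketch (the names are purely illustrative, not the actual mplc API): each partner trains its local batches sequentially, but partners are assumed to run in parallel, so the wall-clock time of a global batch is the slowest partner's total.

```python
def global_batch_time(local_batch_times_per_partner):
    """Approximate the wall-clock time of one global batch.

    local_batch_times_per_partner: dict mapping a partner id to the list of
    measured times of its local batches within this global batch.
    Partners are assumed to train in parallel, so the global batch costs as
    much as the slowest partner's sequential local trainings.
    """
    return max(
        sum(batch_times)
        for batch_times in local_batch_times_per_partner.values()
    )


# Example: partner "A" is the bottleneck, so the global batch costs 0.9.
print(global_batch_time({"A": [0.3, 0.3, 0.3], "B": [0.2, 0.2]}))  # -> 0.9
```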

The communication time would be an arbitrary "communication time" multiplied by the number of communications, which is algorithm-dependent. For FedAvg, it would be equal to the number of global batches / minibatches in the training, plus the ones needed for the initialization/test (which can maybe be neglected?).
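Putting the two pieces together, a hedged sketch of the full estimate for a FedAvg-like run could look like the following (again, the names and the assumption of one communication round per global batch are mine, and the initialization/test exchanges are neglected as discussed above):

```python
def estimated_federated_time(global_batch_times, communication_time_per_round):
    """Sum of the per-global-batch computation times, plus an arbitrary
    per-round communication cost applied once per global batch (FedAvg-like)."""
    computation = sum(global_batch_times)
    communication = communication_time_per_round * len(global_batch_times)
    return computation + communication
```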

With this model, the amount of data exchanged is not really taken into account; I don't know if that's mandatory.

  • What do you think about this idea/model?

arthurPignet, Apr 20 '21 14:04