
Questions about the test_metrics() functions and the partition between train and test.

Open · wizard1203 opened this issue 3 years ago · 1 comment

Thanks for your awesome library, which includes so many algorithms.

I have a question about the evaluate function in the serverbase.py: https://github.com/TsingZ0/PFL-Non-IID/blob/fd23a2124265fac69c137b313e66e45863487bd5/system/flcore/servers/serverbase.py#L165

It seems that in each round, testing is conducted only on the selected clients. Is this a standard setting in PFL? Shouldn't evaluation be conducted on all clients' test datasets? Testing only on the selected clients may introduce bias into the reported metrics.

I also note that you merge the original train and test datasets together and then partition the merged data to clients. Finally, you split each client's sub-dataset into a train set and a test set. My question is: could this cause some clients' test data to come from the original train dataset instead of the original test dataset? https://github.com/TsingZ0/PFL-Non-IID/blob/fd23a2124265fac69c137b313e66e45863487bd5/dataset/generate_cifar10.py#L56

wizard1203 avatar Sep 12 '22 09:09 wizard1203

For question one (evaluate function): This is currently a platform for research, not for production, so you can modify any of the code as you like. If testing on all clients is better than testing on the selected clients, you can replace `self.selected_clients` with `self.clients` :-). In my opinion, testing on all clients is more reasonable. I will update this code. Thanks!

For question two (test data): The goal of this rearrangement is to generate different train/test ratios for different research goals. For the datasets (MNIST, CIFAR-10, ImageNet, etc.) and the classification tasks considered in this platform, the original test data and train data are independent and identically distributed (IID), so we can mix them together and then split them. If the original test data has specific features compared to the train data, such as the sequential data in recommender systems, then merging the original train and test data is not reasonable :-). This platform currently does not consider such tasks. You are welcome to add them through a PR!
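The merge-then-split idea can be sketched as follows. This is a simplified illustration, not the repository's `generate_cifar10.py`: it pools the original train/test data (valid only under the IID assumption described above), partitions the pool uniformly across clients, and re-splits each client's shard with a configurable train ratio. The function name and signature are hypothetical.

```python
import numpy as np

def merge_and_split(train_x, test_x, num_clients, train_ratio=0.75, seed=0):
    """Pool train+test samples, shard them across clients, then
    re-split each shard into per-client train/test sets."""
    rng = np.random.default_rng(seed)
    # Merge the original train and test pools (assumes they are IID)
    data = np.concatenate([train_x, test_x])
    # Shuffle, then partition the pool into near-equal client shards
    shards = np.array_split(data[rng.permutation(len(data))], num_clients)
    per_client = []
    for shard in shards:
        cut = int(len(shard) * train_ratio)
        per_client.append((shard[:cut], shard[cut:]))  # (train, test)
    return per_client
```

Note that a client's local test set is drawn from the merged pool, so under this scheme it may indeed contain samples from the original train set, which is exactly why the IID assumption matters.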

TsingZ0 avatar Sep 13 '22 09:09 TsingZ0