[Feature Proposal] (New data partition strategy) Extended Dirichlet strategy

Open liyipeng00 opened this issue 2 years ago • 4 comments

Recently, I found a new data partition strategy called the Extended Dirichlet strategy (it is ours :)), which could be added to this repo.

It combines the two common partition strategies (i.e., quantity-based class imbalance and distribution-based class imbalance in Li et al. (2022)) to generate arbitrarily heterogeneous data. The difference is an added step that allocates classes (labels) to clients, fixing the number of classes per client (denoted by $C$), before samples are allocated via a Dirichlet distribution (with concentration parameter $\alpha$).
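The two steps described above can be sketched roughly as follows. This is a minimal illustration, not the ExDirPartition code: the function name `exdir_partition`, its arguments, and the cyclic class-allocation rule in step 1 are all assumptions made for the example, and the actual implementation may allocate classes differently.

```python
import numpy as np


def exdir_partition(labels, num_clients, C, alpha, seed=0):
    """Hypothetical sketch: allocate C classes per client, then split each
    class among its owning clients with Dirichlet-distributed proportions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    num_classes = len(classes)
    if not 1 <= C <= num_classes:
        raise ValueError("C must be between 1 and the number of classes")
    if num_clients * C < num_classes:
        raise ValueError("num_clients * C must cover every class")

    # Step 1 (assumed rule): walk cyclically through a random class order so
    # that each client owns exactly C distinct classes and every class has
    # at least one owner.
    order = rng.permutation(num_classes)
    owns = np.zeros((num_clients, num_classes), dtype=bool)
    for k in range(num_clients):
        for j in range(C):
            owns[k, order[(k * C + j) % num_classes]] = True

    # Step 2: for each class, draw proportions from Dir(alpha) over the
    # clients that own it and cut the shuffled sample indices accordingly.
    client_indices = [[] for _ in range(num_clients)]
    for j, c in enumerate(classes):
        idx = rng.permutation(np.flatnonzero(labels == c))
        owners = np.flatnonzero(owns[:, j])
        if len(owners) == 1:
            props = np.ones(1)  # a single owner receives the whole class
        else:
            props = rng.dirichlet(alpha * np.ones(len(owners)))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for k, part in zip(owners, np.split(idx, cuts)):
            client_indices[k].extend(part.tolist())
    return client_indices
```

In this sketch, a small $C$ limits how many classes a client can see, while a small $\alpha$ makes the split of each class among its owners more uneven. A client may still end up with no samples from a class it owns when its Dirichlet share rounds to zero.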

The implementation is in convergence. You can find more details in Convergence Analysis of Sequential Federated Learning on Heterogeneous Data.

[Figure: 3×3 grid of panels. Rows: $C=2$, $C=5$, $C=10$; columns: $\alpha=0.1$, $\alpha=1.0$, $\alpha=10.0$.]

Li, Q., Diao, Y., Chen, Q., & He, B. (2022, May). Federated learning on non-iid data silos: An experimental study. In 2022 IEEE 38th International Conference on Data Engineering (ICDE) (pp. 965-978). IEEE.

liyipeng00 avatar Nov 03 '23 03:11 liyipeng00

We will check your code. Thank you very much!

AgentDS avatar Nov 03 '23 17:11 AgentDS

Thanks. We are glad to hear from you. The code is ExDirPartition, and you can generate the map with the following command (you will need to change the dataset location).

python partition.py -d mnist -n 10 --partition exdir -C 1 --alpha 1.0 

liyipeng00 avatar Nov 03 '23 23:11 liyipeng00

Interesting work!

AgentDS avatar Nov 04 '23 19:11 AgentDS

Thanks, =^_^=.

liyipeng00 avatar Nov 05 '23 02:11 liyipeng00