ColossalAI
[autoparallel] Draft for mix gather
What's new?

Add a one-step transformation, mix-gather, for the following conversions:

| Src | Dst |
|---|---|
| S0S1 | RR |
| S1S0 | RR |
| S01R | RR |
| RS01 | RR |
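
To illustrate what the four conversions in the table mean, here is a minimal NumPy simulation on a 2D device mesh. It is only a sketch, not the code in this PR: `shard` and `mix_gather` are hypothetical helper names, the device rank is assumed to be `i * n1 + j`, and `S01` is assumed to flatten the mesh with axis 0 as the major axis.

```python
import numpy as np


def shard(tensor, spec, n0, n1):
    """Split a 2D tensor over an (n0, n1) mesh; returns {(i, j): local shard}."""
    shards = {}
    for i in range(n0):
        for j in range(n1):
            rank = i * n1 + j  # assumed flattened rank, axis 0 major
            if spec == "S0S1":  # dim 0 over mesh axis 0, dim 1 over mesh axis 1
                rows = np.array_split(tensor, n0, axis=0)[i]
                shards[(i, j)] = np.array_split(rows, n1, axis=1)[j]
            elif spec == "S1S0":  # dim 0 over mesh axis 1, dim 1 over mesh axis 0
                rows = np.array_split(tensor, n1, axis=0)[j]
                shards[(i, j)] = np.array_split(rows, n0, axis=1)[i]
            elif spec == "S01R":  # dim 0 over both mesh axes, dim 1 replicated
                shards[(i, j)] = np.array_split(tensor, n0 * n1, axis=0)[rank]
            elif spec == "RS01":  # dim 0 replicated, dim 1 over both mesh axes
                shards[(i, j)] = np.array_split(tensor, n0 * n1, axis=1)[rank]
            else:
                raise ValueError(f"unsupported spec: {spec}")
    return shards


def mix_gather(shards, spec, n0, n1):
    """Rebuild the full (RR) tensor from a single gather over all n0 * n1 devices."""
    # Simulated all-gather: every device receives all shards in rank order.
    gathered = [shards[(i, j)] for i in range(n0) for j in range(n1)]
    if spec == "S0S1":
        # Device (i, j) holds block (row i, col j).
        return np.block([[gathered[i * n1 + j] for j in range(n1)] for i in range(n0)])
    if spec == "S1S0":
        # Device (i, j) holds block (row j, col i).
        return np.block([[gathered[i * n1 + j] for i in range(n0)] for j in range(n1)])
    if spec == "S01R":
        return np.concatenate(gathered, axis=0)
    if spec == "RS01":
        return np.concatenate(gathered, axis=1)
    raise ValueError(f"unsupported spec: {spec}")


full = np.arange(12 * 12).reshape(12, 12)
for spec in ("S0S1", "S1S0", "S01R", "RS01"):
    restored = mix_gather(shard(full, spec, 2, 3), spec, 2, 3)
    assert np.array_equal(restored, full)
```

In this sketch the `S0S1` and `S1S0` cases have to reassemble the gathered shards as a 2D grid of blocks instead of concatenating them along one dimension.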
Why do we need this?

To reduce the communication cost. Assume $\beta_1 > \beta_0$, and let $M$ be the communication size.

Cost for S0S1=>S0R=>RR:

$$\frac{M}{n_0 n_1}\times\frac{n_1-1}{n_1}\times\beta_1 + \frac{M}{n_1}\times\frac{n_0-1}{n_0}\times\beta_0$$

Cost for S0S1=>RR:

$$\frac{M}{n_0 n_1}\times\frac{n_0 n_1-1}{n_0 n_1}\times\beta_1$$
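
The two cost expressions can be compared numerically. The sketch below transcribes them as written; the function names are hypothetical, and reading $n_0$, $n_1$ as the sizes of the two device-mesh axes and $\beta_0$, $\beta_1$ as their per-unit communication costs is an assumption, as are the sample values.

```python
def two_step_cost(M, n0, n1, beta0, beta1):
    """Cost of S0S1 => S0R => RR, transcribed from the formula above."""
    step1 = M / (n0 * n1) * (n1 - 1) / n1 * beta1  # S0S1 => S0R
    step2 = M / n1 * (n0 - 1) / n0 * beta0         # S0R => RR
    return step1 + step2


def mix_gather_cost(M, n0, n1, beta0, beta1):
    """Cost of the one-step S0S1 => RR, transcribed from the formula above."""
    n = n0 * n1
    return M / n * (n - 1) / n * beta1  # beta0 does not appear in this formula


# Illustrative values only (assumed): a 2 x 2 mesh with beta1 > beta0.
print(two_step_cost(1024, 2, 2, beta0=1.0, beta1=2.0))    # 512.0
print(mix_gather_cost(1024, 2, 2, beta0=1.0, beta1=2.0))  # 384.0
```

With these sample values the one-step gather is cheaper. The comparison depends on $n_0$, $n_1$ and the ratio $\beta_1/\beta_0$, so it is worth evaluating both expressions for the actual mesh.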
Pitfalls

Peak memory increases by the size of one tensor for S0S1=>RR and S1S0=>RR.
Could you please push the unit test file for this feature?