Matej Sirovatka

26 comments

Managed to get a minimal repro of composable TP+FSDP2 working, though it requires nightly torch 🚀
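
A minimal sketch of what that composition typically looks like on recent PyTorch nightlies, launched under `torchrun`; the toy MLP, mesh sizes, and layer plan are illustrative, not the actual repro (on older nightlies `fully_shard` lives under `torch.distributed._composable.fsdp` instead):

```python
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import fully_shard  # FSDP2
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)


class MLP(nn.Module):  # toy stand-in for a real transformer block
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.up = nn.Linear(dim, 4 * dim, bias=False)
        self.down = nn.Linear(4 * dim, dim, bias=False)

    def forward(self, x):
        return self.down(torch.relu(self.up(x)))


def build_model(dp_size: int, tp_size: int) -> nn.Module:
    # 2D mesh: outer dim for FSDP2 sharding, inner dim for tensor parallelism.
    mesh = init_device_mesh("cuda", (dp_size, tp_size), mesh_dim_names=("dp", "tp"))
    model = MLP().cuda()
    # Apply TP first on the "tp" sub-mesh...
    parallelize_module(
        model, mesh["tp"], {"up": ColwiseParallel(), "down": RowwiseParallel()}
    )
    # ...then shard the resulting DTensor parameters with FSDP2 on the "dp" sub-mesh.
    fully_shard(model, mesh=mesh["dp"])
    return model
```

E.g. `torchrun --nproc-per-node=4` with `dp_size=2, tp_size=2` gives a 2×2 mesh on a single node.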

@kmehant we have internally decided to keep as much logic as possible in transformers, so this is postponed until that is ~resolved. You can follow it [here](https://github.com/huggingface/transformers/pull/37877)

This might get a bit messy; I started playing with it and had some success. As of [2a13375](https://github.com/huggingface/accelerate/pull/3498/commits/2a13375c577c309fa1ca0f4f37bc2e76033e5261) we have a working FSDP2+TP example, gonna try to clean this up a bit...
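
For context, a hedged sketch of the accelerate-side entry point this builds on, assuming an accelerate build with FSDP2 support; `fsdp_version=2` selects the FSDP2 (`fully_shard`) path instead of the FSDP1 wrapper, and the toy model/optimizer are illustrative:

```python
import torch
from accelerate import Accelerator
from accelerate.utils import FullyShardedDataParallelPlugin

# fsdp_version=2 routes prepare() through torch's fully_shard (FSDP2);
# exact plugin kwargs may differ across accelerate versions.
fsdp_plugin = FullyShardedDataParallelPlugin(fsdp_version=2)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

model = torch.nn.Linear(1024, 1024)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer = accelerator.prepare(model, optimizer)
```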

Also superseded by #3682

In [accelerate](https://github.com/huggingface/accelerate) we have integrations with both AO and TE, and AO should soon work with FSDP2. Is anyone tackling the TE integration? I would be limited to...
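
For reference, the existing TE path in accelerate is driven through the FP8 recipe kwargs; a hedged sketch of that entry point as it stood at the time (whether this composes with FSDP2 is exactly the open question above):

```python
from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs

# FP8 mixed precision via the Transformer Engine backend; MS-AMP is the
# other backend accelerate exposes through the same handler.
accelerator = Accelerator(
    mixed_precision="fp8",
    kwargs_handlers=[FP8RecipeKwargs(backend="TE")],
)
```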

@stas00 I think given my limited availability recently and how long it'll be before I can get to doing it in DeepSpeed, you can probably just integrate it with DeepSpeed as...

Will research this more; if anyone has insights on what could/should be implemented, or details on how, cc me.

Maybe a preliminary step would be to support, for example, mixtral/nllb_moe from Hugging Face, so the integration is ready when the layers are done?

@yundai424 Haven't seen one either, gonna try patching either Mixtral or NLLB with our kernels and profiling it; will decide what to do after that, I guess. Implementing dMoE...
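
A hedged sketch of the kind of patching/profiling meant here, using the real `MixtralSparseMoeBlock` class from transformers; the timing wrapper is illustrative (a baseline to compare a fused-kernel replacement against), not the actual benchmark harness:

```python
import time

import torch
from transformers.models.mixtral.modeling_mixtral import MixtralSparseMoeBlock

# Keep a handle to the stock forward so the patch can delegate to it.
_orig_forward = MixtralSparseMoeBlock.forward


def timed_forward(self, hidden_states):
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    out = _orig_forward(self, hidden_states)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    # Stash the latency on the module so a profiling loop can read it back.
    self.last_forward_ms = (time.perf_counter() - start) * 1e3
    return out


# Monkey-patch every MoE block in any Mixtral model constructed afterwards;
# a candidate kernel would be swapped in the same way.
MixtralSparseMoeBlock.forward = timed_forward
```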

@pramodith I totally agree with starting with the MLP, but I'm currently surprisingly swamped with school, so I won't have time to collaborate on this. So feel free to take...