Matej Sirovatka
Managed to get a minimal repro of composable TP+FSDP2 working, though it requires nightly torch 🚀
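For reference, composing the two in raw PyTorch looks roughly like the sketch below. This is not the actual repro, just an illustrative toy module; the mesh sizes and layer names are made up, and the `fully_shard` import path differs between torch versions (it lived under `torch.distributed._composable.fsdp` on older nightlies).

```python
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import fully_shard  # older nightlies: torch.distributed._composable.fsdp
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)


class ToyMLP(nn.Module):
    """Illustrative stand-in for a transformer MLP block."""

    def __init__(self, dim: int = 1024):
        super().__init__()
        self.up = nn.Linear(dim, 4 * dim, bias=False)
        self.down = nn.Linear(4 * dim, dim, bias=False)

    def forward(self, x):
        return self.down(torch.relu(self.up(x)))


def build_parallel_model(dp_size: int, tp_size: int) -> nn.Module:
    # Assumes the process group was already initialized (e.g. via torchrun).
    # 2D mesh: outer dim for FSDP2 data-parallel sharding, inner dim for TP.
    mesh = init_device_mesh("cuda", (dp_size, tp_size), mesh_dim_names=("dp", "tp"))

    model = ToyMLP().cuda()

    # 1) Apply TP on the "tp" sub-mesh: shard `up` column-wise, `down` row-wise.
    parallelize_module(
        model,
        mesh["tp"],
        {"up": ColwiseParallel(), "down": RowwiseParallel()},
    )

    # 2) Apply FSDP2 on the "dp" sub-mesh, on top of the TP-sharded parameters.
    fully_shard(model, mesh=mesh["dp"])
    return model
```

The ordering is the important part: TP is applied on the inner mesh dimension first, then FSDP2 shards the resulting DTensor parameters across the outer (data-parallel) dimension.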
@kmehant we have internally decided to have as much logic as possible in transformers, so this is postponed until that is ~resolved. You can watch that [here](https://github.com/huggingface/transformers/pull/37877)
This might get a bit messy, but I started playing around and had some success. At [2a13375](https://github.com/huggingface/accelerate/pull/3498/commits/2a13375c577c309fa1ca0f4f37bc2e76033e5261) we have a working FSDP2+TP example, gonna try to clean this up a bit...
Also superseded by #3682
In [accelerate](https://github.com/huggingface/accelerate) we have integration with both AO and TE, where AO should soon work with FSDP2. Is there anyone tackling the integration of TE? I would be limited to...
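For context, standalone TE FP8 usage (outside of accelerate) looks roughly like the sketch below; the recipe values and shapes are illustrative, and FP8 execution requires supported hardware (Hopper or newer).

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Illustrative FP8 recipe: hybrid E4M3/E5M2 formats with delayed scaling.
fp8_recipe = DelayedScaling(
    fp8_format=Format.HYBRID, amax_history_len=16, amax_compute_algo="max"
)

# TE drop-in linear layer; params kept in bf16, GEMMs run in FP8 inside the autocast.
layer = te.Linear(1024, 1024, bias=True, params_dtype=torch.bfloat16).cuda()
x = torch.randn(16, 1024, device="cuda", dtype=torch.bfloat16)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(x)
```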
@stas00 I think given my limited availability recently, and how long it will take me to get to doing it in DeepSpeed, you can probably just integrate it with DeepSpeed as...
Will do some more research on this; if anyone has any insights on what could/should be implemented, or details on how, cc me.
Maybe a preliminary step would be to support e.g. mixtral/nllb_moe from huggingface, so the integration is ready when the layers are done?
@yundai424 Haven't seen one either; gonna try patching either Mixtral or NLLB with our kernels and profiling it, and will decide what to do after that I guess. Implementing dMoE...
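To make "patching the MLP and profiling it" concrete: a naive reference top-k routed MoE MLP (roughly the Mixtral-style block; names, shapes, and activation are illustrative) that a kernel-backed version would be compared against could look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NaiveMoEMLP(nn.Module):
    """Naive top-k routed MoE MLP with a Python loop over experts.
    Useful only as a correctness/performance baseline for fused kernels."""

    def __init__(self, dim: int, hidden_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.SiLU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.shape[-1])                  # (num_tokens, dim)
        weights = F.softmax(self.router(tokens), dim=-1)     # (num_tokens, num_experts)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)  # (num_tokens, top_k)
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)   # renormalize routing weights

        out = torch.zeros_like(tokens)
        for expert_id, expert in enumerate(self.experts):
            # Tokens whose top-k choices include this expert.
            token_ids, slot = (topk_idx == expert_id).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += topk_w[token_ids, slot, None] * expert(tokens[token_ids])
        return out.reshape_as(x)
```

The per-expert Python loop is exactly the part a fused/grouped-GEMM dMoE kernel would replace, so it gives a clear baseline for profiling.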
@pramodith I totally agree with starting with the MLP; however, I'm currently surprisingly swamped with school, so I won't have time to collaborate on this. So feel free to take...