Add support for MegaBlocks MoEs
These changes add support for using MegaBlocks dMoE and MoE layers in Megatron. MegaBlocks is exposed through an adapter that isolates the megablocks package dependency, so the package does not need to be installed by users who are not training MoEs.
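As a rough illustration of the adapter approach, a minimal sketch is shown below; the module path, helper names, and the `from_megatron` bridge are assumptions for illustration, not necessarily what this PR implements:

```python
# Illustrative adapter (e.g. megatron/model/megablocks_adapter.py); the module
# path and helper names are assumptions, not necessarily the ones in this PR.


def _megablocks():
    """Import megablocks lazily so non-MoE runs never need the package."""
    try:
        import megablocks.layers.arguments
        import megablocks.layers.dmoe
        import megablocks.layers.moe
    except ImportError as e:
        raise ImportError(
            "MoE layers require the megablocks package "
            "(e.g. `pip install megablocks`)."
        ) from e
    return megablocks


def dmoe(args, init_method, output_layer_init_method):
    """Build a MegaBlocks dMoE layer from Megatron args (sketch)."""
    mb = _megablocks()
    # from_megatron is assumed here as the bridge between Megatron args and
    # the MegaBlocks Arguments object.
    mb_args = mb.layers.arguments.from_megatron(args)
    mb_args.init_method = init_method
    mb_args.output_layer_init_method = output_layer_init_method
    return mb.layers.dmoe.dMoE(mb_args)
```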
Description of changes:
- Add wrappers for the MegaBlocks layers in megatron/model/transformer.py (see the MLP-selection sketch after this list)
- Add load-balancing loss support in pretrain_gpt.py (see the loss sketch after this list)
- Add MoE arguments in megatron/arguments.py (see the argument sketch after this list)
- Document MoE support in README.md
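For the wrappers in transformer.py, the core idea is that the layer's dense MLP is replaced by a MegaBlocks MoE/dMoE when experts are enabled. A minimal sketch, assuming a `--moe-num-experts` argument and the hypothetical adapter above (names are illustrative, not the exact diff):

```python
# Sketch of how a transformer layer could pick its MLP; simplified, without
# the checkpointing and parallelism handling of the real transformer.py.
from megatron import get_args
from megatron.model import megablocks_adapter  # hypothetical adapter module
from megatron.model.transformer import ParallelMLP


def build_mlp(init_method, output_layer_init_method):
    """Return the dense ParallelMLP, or a MegaBlocks dMoE when experts are set."""
    args = get_args()
    if getattr(args, "moe_num_experts", None) in (None, 0, 1):
        return ParallelMLP(init_method, output_layer_init_method)
    # The adapter defers the megablocks import, keeping the dependency optional.
    return megablocks_adapter.dmoe(args, init_method, output_layer_init_method)
```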
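The load-balancing loss support in pretrain_gpt.py amounts to adding the routers' auxiliary loss to the language-model loss on each step. A hedged sketch; the megablocks functions referenced here (`from_megatron`, `batched_load_balancing_loss`, `clear_load_balancing_loss`) reflect my reading of its API and may differ from the actual diff:

```python
# Hypothetical helper for pretrain_gpt.py's loss function; names are assumptions.
from megatron import get_args


def add_moe_loss(lm_loss):
    """Add the accumulated MoE load-balancing loss to the language-model loss."""
    args = get_args()
    if getattr(args, "moe_num_experts", None) in (None, 0, 1):
        return lm_loss

    from megablocks.layers import arguments, moe  # optional dependency
    # Routers accumulate their per-layer losses during the forward pass;
    # sum them here and clear the buffer before the next iteration.
    mb_args = arguments.from_megatron(args)
    load_balancing_loss = moe.batched_load_balancing_loss(mb_args)
    moe.clear_load_balancing_loss()
    return lm_loss + load_balancing_loss
```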
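The MoE arguments are plain argparse additions in megatron/arguments.py. The flag names and defaults below are illustrative guesses, not necessarily the exact ones added by this PR:

```python
# Illustrative MoE argument group for megatron/arguments.py
# (flag names and defaults are assumptions).
def _add_moe_args(parser):
    group = parser.add_argument_group(title='mixture of experts')
    group.add_argument('--moe-num-experts', type=int, default=None,
                       help='Number of experts per MoE layer; unset keeps dense MLPs.')
    group.add_argument('--moe-capacity-factor', type=int, default=0,
                       help='Expert capacity factor; 0 selects the dropless dMoE path.')
    group.add_argument('--moe-top-k', type=int, default=1,
                       help='Number of experts each token is routed to.')
    group.add_argument('--moe-loss-weight', type=float, default=0.1,
                       help='Scale applied to the load-balancing loss.')
    return parser
```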
Note that this pull request does not include the Megatron changes needed to support expert model parallelism, pipeline parallelism, and tensor model parallelism for MoEs.
LGTM. @jaredcasper can you please take a final look?
Marking as stale. No activity in 60 days. Remove stale label or comment or this will be closed in 7 days.
Commenting so that this doesn't automatically get closed :)
if dmoe is merged, team megatron will win the nobel prize i guess