
Add support for MegaBlocks MoEs

Open tgale96 opened this issue 3 years ago • 11 comments

These changes add support for using MegaBlocks dMoE and MoE layers in Megatron. MegaBlocks is exposed through an adapter that isolates the megablocks package dependency, so users who are not training MoEs do not need to install it.
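For readers unfamiliar with the adapter pattern described above, here is a minimal sketch of how an optional dependency can be deferred until it is needed. The helper names `load_optional_dependency` and `LazyModule` are hypothetical and are not taken from this pull request; the sketch only illustrates the general technique of importing a package lazily and failing with a clear message.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional_dependency(package: str, feature: str) -> ModuleType:
    """Import `package` on demand, with a clear error if it is missing.

    Hypothetical helper for illustration; not the adapter from this PR.
    """
    try:
        return importlib.import_module(package)
    except ImportError as exc:
        raise ImportError(
            f"'{package}' is required for {feature} but is not installed."
        ) from exc


class LazyModule:
    """Proxy that imports the wrapped package on first attribute access."""

    def __init__(self, package: str, feature: str) -> None:
        self._package = package
        self._feature = feature
        self._module: Optional[ModuleType] = None

    def __getattr__(self, name: str):
        # Only reached for attributes not set in __init__, i.e. lookups
        # that should be forwarded to the wrapped package.
        if self._module is None:
            self._module = load_optional_dependency(self._package, self._feature)
        return getattr(self._module, name)


# Creating the proxy does not import anything, so code paths that never
# touch MoE layers never need the package to be installed.
megablocks = LazyModule("megablocks", "MoE layers")
```

With this shape, the import error only surfaces when an MoE layer is actually requested, which matches the goal of keeping the dependency optional.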

Description of changes:

  • Add wrappers for MegaBlocks layers in megatron/model/transformer.py
  • Add load balancing loss support in pretrain_gpt.py
  • Add MoE arguments in megatron/arguments.py
  • Document MoE support in README.md
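As background for the load balancing loss item above, the following is a framework-free sketch of one common auxiliary loss for MoE routers, the formulation popularized by Switch Transformer. It is an assumption that this resembles what the pull request uses; the actual loss in MegaBlocks and in pretrain_gpt.py may differ. The function name and arguments are hypothetical.

```python
from typing import Sequence


def load_balancing_loss(
    router_probs: Sequence[Sequence[float]],
    expert_assignments: Sequence[int],
) -> float:
    """Switch-Transformer-style auxiliary loss (illustrative only).

    router_probs[t][i]    : router probability of expert i for token t
    expert_assignments[t] : index of the expert token t was routed to (top-1)

    Returns num_experts * sum_i(f_i * P_i), where f_i is the fraction of
    tokens routed to expert i and P_i is the mean router probability
    assigned to expert i.
    """
    num_tokens = len(router_probs)
    if num_tokens == 0:
        raise ValueError("router_probs must contain at least one token")
    if len(expert_assignments) != num_tokens:
        raise ValueError("one expert assignment is required per token")

    num_experts = len(router_probs[0])
    token_fraction = [0.0] * num_experts  # f_i
    mean_prob = [0.0] * num_experts       # P_i

    for probs, expert in zip(router_probs, expert_assignments):
        token_fraction[expert] += 1.0 / num_tokens
        for i, p in enumerate(probs):
            mean_prob[i] += p / num_tokens

    return num_experts * sum(f * p for f, p in zip(token_fraction, mean_prob))


# Perfectly balanced routing over two experts.
balanced = load_balancing_loss([[0.5, 0.5]] * 4, [0, 1, 0, 1])

# All tokens collapse onto expert 0.
collapsed = load_balancing_loss([[1.0, 0.0]] * 4, [0, 0, 0, 0])
```

Under this formulation the loss is smallest when tokens and router probability are spread evenly across experts, and grows as routing collapses onto a few experts. In training it would typically be scaled by a small coefficient and added to the language modeling loss.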

Note that this pull request does not include the Megatron changes needed to support expert model parallelism, pipeline parallelism, or tensor model parallelism for MoEs.

tgale96 avatar Feb 22 '23 02:02 tgale96

LGTM. @jaredcasper can you please take a final look?

kvareddy avatar Apr 27 '23 16:04 kvareddy

Marking as stale. No activity in 60 days. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jul 10 '23 18:07 github-actions[bot]

Commenting so that this doesn't automatically get closed :)

tgale96 avatar Jul 10 '23 18:07 tgale96

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Sep 09 '23 18:09 github-actions[bot]

commenting

jonhilgart22 avatar Sep 15 '23 04:09 jonhilgart22

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Nov 15 '23 18:11 github-actions[bot]

commenting

xrsrke avatar Nov 15 '23 21:11 xrsrke

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Jan 15 '24 18:01 github-actions[bot]

commenting

tylaar avatar Jun 17 '24 07:06 tylaar

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Aug 16 '24 18:08 github-actions[bot]

if dmoe is merged, team megatron will win the nobel prize i guess

SeunghyunSEO avatar Oct 11 '24 07:10 SeunghyunSEO

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Dec 10 '24 18:12 github-actions[bot]