Add support for MegaBlocks MoEs
These changes add support for using MegaBlocks dMoE and MoE layers in Megatron. MegaBlocks is exposed through an adapter that isolates the megablocks package dependency, so the package does not need to be installed by users who are not training MoEs.
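As a rough illustration of the adapter approach, a minimal sketch is shown below; the module path, helper names, and the `from_megatron` bridge are assumptions for illustration, not necessarily what this PR implements:

```python
# Illustrative adapter (e.g. megatron/model/megablocks_adapter.py); the module
# path and helper names are assumptions, not necessarily the ones in this PR.


def _megablocks():
    """Import megablocks lazily so non-MoE runs never need the package."""
    try:
        import megablocks.layers.arguments
        import megablocks.layers.dmoe
        import megablocks.layers.moe
    except ImportError as e:
        raise ImportError(
            "MoE layers require the megablocks package "
            "(e.g. `pip install megablocks`)."
        ) from e
    return megablocks


def dmoe(args, init_method, output_layer_init_method):
    """Build a MegaBlocks dMoE layer from Megatron args (sketch)."""
    mb = _megablocks()
    # from_megatron is assumed here as the bridge between Megatron args and
    # the MegaBlocks Arguments object.
    mb_args = mb.layers.arguments.from_megatron(args)
    mb_args.init_method = init_method
    mb_args.output_layer_init_method = output_layer_init_method
    return mb.layers.dmoe.dMoE(mb_args)
```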
Description of changes:
- Add wrappers for the MegaBlocks layers in megatron/model/transformer.py (see the MLP-selection sketch after this list)
- Add load-balancing loss support in pretrain_gpt.py (see the loss sketch after this list)
- Add MoE arguments in megatron/arguments.py (see the argument sketch after this list)
- Document MoE support in README.md
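For the wrappers in transformer.py, the core idea is that the layer's dense MLP is replaced by a MegaBlocks MoE/dMoE when experts are enabled. A minimal sketch, assuming a `--moe-num-experts` argument and the hypothetical adapter above (names are illustrative, not the exact diff):

```python
# Sketch of how a transformer layer could pick its MLP; simplified, without
# the checkpointing and parallelism handling of the real transformer.py.
from megatron import get_args
from megatron.model import megablocks_adapter  # hypothetical adapter module
from megatron.model.transformer import ParallelMLP


def build_mlp(init_method, output_layer_init_method):
    """Return the dense ParallelMLP, or a MegaBlocks dMoE when experts are set."""
    args = get_args()
    if getattr(args, "moe_num_experts", None) in (None, 0, 1):
        return ParallelMLP(init_method, output_layer_init_method)
    # The adapter defers the megablocks import, keeping the dependency optional.
    return megablocks_adapter.dmoe(args, init_method, output_layer_init_method)
```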
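The load-balancing loss support in pretrain_gpt.py amounts to adding the routers' auxiliary loss to the language-model loss on each step. A hedged sketch; the megablocks functions referenced here (`from_megatron`, `batched_load_balancing_loss`, `clear_load_balancing_loss`) reflect my reading of its API and may differ from the actual diff:

```python
# Hypothetical helper for pretrain_gpt.py's loss function; names are assumptions.
from megatron import get_args


def add_moe_loss(lm_loss):
    """Add the accumulated MoE load-balancing loss to the language-model loss."""
    args = get_args()
    if getattr(args, "moe_num_experts", None) in (None, 0, 1):
        return lm_loss

    from megablocks.layers import arguments, moe  # optional dependency
    # Routers accumulate their per-layer losses during the forward pass;
    # sum them here and clear the buffer before the next iteration.
    mb_args = arguments.from_megatron(args)
    load_balancing_loss = moe.batched_load_balancing_loss(mb_args)
    moe.clear_load_balancing_loss()
    return lm_loss + load_balancing_loss
```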
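The MoE arguments are plain argparse additions in megatron/arguments.py. The flag names and defaults below are illustrative guesses, not necessarily the exact ones added by this PR:

```python
# Illustrative MoE argument group for megatron/arguments.py
# (flag names and defaults are assumptions).
def _add_moe_args(parser):
    group = parser.add_argument_group(title='mixture of experts')
    group.add_argument('--moe-num-experts', type=int, default=None,
                       help='Number of experts per MoE layer; unset keeps dense MLPs.')
    group.add_argument('--moe-capacity-factor', type=int, default=0,
                       help='Expert capacity factor; 0 selects the dropless dMoE path.')
    group.add_argument('--moe-top-k', type=int, default=1,
                       help='Number of experts each token is routed to.')
    group.add_argument('--moe-loss-weight', type=float, default=0.1,
                       help='Scale applied to the load-balancing loss.')
    return parser
```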
Note that this pull request does not include the Megatron changes needed to support expert model parallelism, pipeline parallelism, and tensor model parallelism for MoEs.
LGTM. @jaredcasper can you please take a final look?
Marking as stale. No activity in 60 days. Remove stale label or comment or this will be closed in 7 days.
Commenting so that this doesn't automatically get closed :)
if dmoe is merged, team megatron will win the nobel prize i guess