[BUG] `finish_embedding_wgrad_compute` appears after grad all-reduce

Open: QPHutu opened this issue 1 year ago • 1 comment

Describe the bug

In `megatron/core/pipeline_parallel/schedules.py`, shouldn't `finish_embedding_wgrad_compute` be called before `enable_grad_sync` and `grad_sync_func`?

Expected behavior

Gradient all-reduce should happen after gradient computations.
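To make the ordering concern concrete, here is a minimal toy sketch. It is not Megatron-LM code: `finalize_step`, `run_demo`, and the two callbacks are hypothetical stand-ins for the deferred embedding weight-gradient step and the gradient synchronization step. It only shows that if the sync ran first, it would see a gradient that is still missing the deferred contribution.

```python
# Toy illustration only. Every name here is a hypothetical stand-in,
# not an actual Megatron-LM function or signature.
from typing import Callable, Dict, List


def finalize_step(
    finish_wgrad: Callable[[], None],
    grad_sync: Callable[[], None],
) -> None:
    """Finish the deferred weight-gradient work, then synchronize gradients."""
    finish_wgrad()  # local gradients are complete only after this call
    grad_sync()     # the all-reduce stand-in now sees the final values


def run_demo() -> List[str]:
    events: List[str] = []
    grad: Dict[str, float] = {"embedding": 0.0}

    def finish_wgrad() -> None:
        # Pretend this adds the deferred embedding wgrad contribution.
        grad["embedding"] += 1.0
        events.append("wgrad")

    def grad_sync() -> None:
        # Stand-in for an all-reduce: record the value that would be sent.
        events.append(f"sync:{grad['embedding']}")

    finalize_step(finish_wgrad, grad_sync)
    return events


if __name__ == "__main__":
    print(run_demo())
```

Swapping the two calls inside `finalize_step` would record `sync:0.0` before `wgrad`, which is the situation this issue is asking about.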

QPHutu avatar Aug 16 '24 06:08 QPHutu

@sanandaraj5597 Can you share some comments? Thank you!

elliottnv avatar Aug 21 '24 18:08 elliottnv

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Oct 20 '24 18:10 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] avatar Aug 01 '25 02:08 github-actions[bot]