[BUG] `finish_embedding_wgrad_compute` appears after grad all-reduce

Open QPHutu opened this issue 1 year ago • 1 comment

Describe the bug

In `megatron/core/pipeline_parallel/schedules.py`, `finish_embedding_wgrad_compute` appears after `enable_grad_sync` and `grad_sync_func`. Shouldn't it appear before them?

Expected behavior

Gradient all-reduce should happen after gradient computations.
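
To make the ordering concern concrete, here is a toy sketch in plain Python. It is not Megatron-LM code: `ToyRank`, `finish_deferred_wgrad`, `all_reduce_mean`, and `run` are hypothetical names invented for illustration, and the "all-reduce" is just an average over a list. It only assumes that some weight-gradient work is postponed to the end of the step, as the function name `finish_embedding_wgrad_compute` suggests.

```python
# Toy illustration only: none of these names are Megatron-LM APIs.
from typing import List


class ToyRank:
    """One data-parallel rank holding a single scalar weight gradient."""

    def __init__(self, partial_grad: float, deferred_grad: float) -> None:
        self.grad = partial_grad        # gradient accumulated so far
        self._deferred = deferred_grad  # wgrad work postponed to the end of the step

    def finish_deferred_wgrad(self) -> None:
        """Fold the postponed weight-gradient contribution into grad."""
        self.grad += self._deferred
        self._deferred = 0.0


def all_reduce_mean(ranks: List[ToyRank]) -> None:
    """Stand-in for a gradient all-reduce: average grad across ranks."""
    mean = sum(r.grad for r in ranks) / len(ranks)
    for r in ranks:
        r.grad = mean


def run(sync_first: bool) -> List[float]:
    """Return each rank's final gradient for the chosen ordering."""
    ranks = [ToyRank(1.0, 2.0), ToyRank(3.0, 4.0)]
    if sync_first:
        # Ordering questioned in this issue: sync, then finish wgrad.
        all_reduce_mean(ranks)
        for r in ranks:
            r.finish_deferred_wgrad()
    else:
        # Expected ordering: finish wgrad, then sync.
        for r in ranks:
            r.finish_deferred_wgrad()
        all_reduce_mean(ranks)
    return [r.grad for r in ranks]


if __name__ == "__main__":
    print("finish wgrad, then sync:", run(sync_first=False))
    print("sync, then finish wgrad:", run(sync_first=True))
```

In this toy setup, finishing the deferred work first leaves every rank with the same averaged gradient, while syncing first leaves the ranks with different gradients because the late contribution is never reduced. Whether the real schedule behaves this way depends on how `grad_sync_func` and the deferred embedding wgrad interact, which is what this issue is asking about.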

Aug 16 '24 06:08 QPHutu

@sanandaraj5597 Can you share some comments? Thank you!

Aug 21 '24 18:08 elliottnv

Marking as stale. No activity in 60 days.

Oct 20 '24 18:10 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

Aug 01 '25 02:08 github-actions[bot]