Megatron-LM
[BUG] `finish_embedding_wgrad_compute` appears after grad all-reduce
Describe the bug
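In `megatron/core/pipeline_parallel/schedules.py`, `finish_embedding_wgrad_compute` is called after `enable_grad_sync` and `grad_sync_func`. Shouldn't it be called before them, so that the embedding weight gradient is fully computed before it is all-reduced?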
Expected behavior
Gradient all-reduce should happen after gradient computations.
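To illustrate why the ordering might matter, here is a toy, single-process sketch. It is not Megatron-LM code: `all_reduce` and `finish_deferred_wgrad` are hypothetical stand-ins, and each list entry plays the role of one data-parallel rank. It only shows that a contribution added after the all-reduce is never summed across ranks.

```python
def all_reduce(grads):
    """Toy all-reduce: every rank ends up with the sum over all ranks."""
    total = sum(grads)
    return [total for _ in grads]


def finish_deferred_wgrad(grads, deferred):
    """Hypothetical stand-in for a deferred weight-gradient computation:
    adds each rank's pending contribution to its local gradient."""
    return [g + d for g, d in zip(grads, deferred)]


# Assumed toy values: one entry per rank.
partial = [1.0, 2.0]    # gradient accumulated so far on each rank
deferred = [0.5, 0.25]  # weight-gradient contribution still pending

# Compute first, then all-reduce: every rank sees the full sum.
compute_then_reduce = all_reduce(finish_deferred_wgrad(partial, deferred))

# All-reduce first, then compute: the pending part stays rank-local.
reduce_then_compute = finish_deferred_wgrad(all_reduce(partial), deferred)

print(compute_then_reduce)
print(reduce_then_compute)
```

In this sketch the first ordering leaves both ranks with the same gradient, while the second leaves them with different values. Whether the real schedule behaves this way depends on how `finish_embedding_wgrad_compute` interacts with the gradient sync, which is what this issue asks about.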
@sanandaraj5597 Can you share some comments? Thank you!
Marking as stale. No activity in 60 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.