Qi Penghui

Results 3 issues of Qi Penghui

**Describe the bug** In `megatron/core/pipeline_parallel/schedules.py`, shouldn't `finish_embedding_wgrad_compute` be called before `enable_grad_sync` and `grad_sync_func`?

**Expected behavior** The gradient all-reduce should happen only after all gradient computations have finished.
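To illustrate why the ordering matters, here is a minimal toy sketch in plain Python. It is not Megatron-LM code: the two simulated ranks, the `all_reduce_mean` helper, and the `run_step` function are hypothetical names invented for this example. It only assumes that some weight gradient is accumulated late (as a deferred embedding weight gradient would be) and shows that synchronizing before that accumulation leaves the ranks with different gradients.

```python
# Toy model: two data-parallel ranks, each holding a local gradient.
# All names are hypothetical; this does not call any Megatron-LM API.

def all_reduce_mean(per_rank_grads, key):
    """Average one gradient across all simulated ranks, in place."""
    mean = sum(g[key] for g in per_rank_grads) / len(per_rank_grads)
    for g in per_rank_grads:
        g[key] = mean

def run_step(sync_before_deferred_wgrad):
    grads = [{"embedding": 0.0}, {"embedding": 0.0}]
    deferred = [1.0, 3.0]  # late (deferred) weight gradient on each rank

    def finish_deferred_wgrad():
        # Accumulate the deferred weight gradient into the local buffer.
        for g, d in zip(grads, deferred):
            g["embedding"] += d

    if sync_before_deferred_wgrad:
        all_reduce_mean(grads, "embedding")  # reduces zeros only
        finish_deferred_wgrad()              # added after sync: never reduced
    else:
        finish_deferred_wgrad()              # compute everything first
        all_reduce_mean(grads, "embedding")  # then synchronize
    return [g["embedding"] for g in grads]

wrong_order = run_step(sync_before_deferred_wgrad=True)
right_order = run_step(sync_before_deferred_wgrad=False)
print(wrong_order)  # ranks disagree: [1.0, 3.0]
print(right_order)  # ranks agree:    [2.0, 2.0]
```

In the first ordering each rank keeps its own unsynchronized value, which is the kind of divergence the expected behavior above is meant to rule out.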

Is there any plan to fix this optimization bias in [the PPO loss](https://github.com/OpenRLHF/OpenRLHF/blob/cad8193a453a29de1154fdcaa62bf6c1cecc83e0/openrlhf/models/loss.py#L76)? Paper link: https://github.com/sail-sg/understand-r1-zero/blob/main/understand-r1-zero.pdf
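As a rough illustration, assuming the bias in question is the response-length bias that the linked paper attributes to averaging each response's token losses over its own length, the sketch below compares the effective weight of a single token under two normalizations. The function names and the `max_len` constant are hypothetical, and this does not reproduce the OpenRLHF implementation; whether the linked line computes the loss this way is not verified here.

```python
# Effective per-token weight in a batch loss under two normalizations.
# Hypothetical helper names; not taken from OpenRLHF.

def token_weights_per_response_mean(lengths):
    """Each response is averaged over its own length, then over responses."""
    n = len(lengths)
    return [1.0 / (n * length) for length in lengths]

def token_weights_constant_norm(lengths, max_len):
    """Each response's token losses are summed and divided by one constant."""
    n = len(lengths)
    return [1.0 / (n * max_len) for _ in lengths]

lengths = [2, 8]  # one short and one long response
per_response = token_weights_per_response_mean(lengths)
constant = token_weights_constant_norm(lengths, max_len=8)
print(per_response)  # [0.25, 0.0625]
print(constant)      # [0.0625, 0.0625]
```

Under per-response averaging, a token in the short response carries four times the weight of a token in the long one, so response length changes how strongly each token is optimized. With a constant normalizer every token has the same weight.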