
Fix expert grad scaling problem with ZeRO optimizer

Open wyooyw opened this issue 1 year ago • 3 comments

Fixes #6545

Work in this PR:

  • expert gradient averaging: divide by dp_world_size instead of edp_world_size (a rough sketch of the idea follows this list)
  • unit test: verify that models trained with different dp/ep configurations produce the same expert gradients
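For illustration only, here is a minimal sketch of the scaling change, not DeepSpeed's actual code: the helper name `average_expert_grads`, the `edp_group` handle, and the `dp_world_size` argument are assumptions made for this example.

```python
# Hypothetical sketch of the gradient-scaling fix; not taken from DeepSpeed's sources.
import torch
import torch.distributed as dist

def average_expert_grads(expert_params, edp_group, dp_world_size):
    """Sum expert gradients across the expert-data-parallel (edp) group,
    then scale them so they are averaged over the global batch."""
    for p in expert_params:
        if p.grad is None:
            continue
        # Expert weights are replicated only inside the edp group, so the
        # all-reduce runs over that group.
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM, group=edp_group)
        # Old behaviour (the bug): divide by the edp group size, which makes
        # expert gradients depend on the expert-parallel degree.
        # p.grad.div_(dist.get_world_size(group=edp_group))
        # Behaviour after this PR's change: divide by the full data-parallel
        # world size, so the same global batch yields the same expert gradient
        # regardless of how many expert-parallel ranks are used.
        p.grad.div_(dp_world_size)
```

With this scaling, the expert gradient for a fixed global batch should not change with the ep size, which is what the added unit test checks.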

wyooyw · Sep 17 '24 15:09

@microsoft-github-policy-service agree

wyooyw · Sep 17 '24 15:09

@wyooyw It seems that you should also delete or comment out https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage_1_and_2.py#L1072 when you delete https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage_1_and_2.py#L1079

ranzhejiang · Sep 18 '24 01:09

> @wyooyw It seems that you should also delete or comment out https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage_1_and_2.py#L1072 when you delete https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage_1_and_2.py#L1079

Thank you for your suggestion. This redundant line of code has been removed.

wyooyw · Sep 18 '24 02:09