re-introduce: stage3: efficient compute of scaled_global_grad_norm
This reverts the previous revert of this feature: https://github.com/nelyahu/DeepSpeed/commit/bc48371c5e1fb8fd70fc79285e66201dbb65679b. In addition, it includes a bug fix for offload mode.
Hi @lekurile, can you please run ds-chat coverage on this PR? I reproduced the issue that was reported in https://github.com/nelyahu/DeepSpeed/commit/bc48371c5e1fb8fd70fc79285e66201dbb65679b and fixed it. I would like to get pre-commit validation on this test suite. CC: @tjruwase
Hi @nelyahu, thank you for the PR. I've kicked off a test run here: https://github.com/microsoft/DeepSpeed/actions/runs/8927396023
Thanks @lekurile , seems like it passed, can you confirm?
Yep, looks like it passed. I've approved the PR and am running all checks.