[BUG] How to manipulate the gradient during training
Describe the bug In some scenarios it is necessary to customize the gradients of parameters during training, for example by modifying the gradients of a subset of parameters. How can this be done in DeepSpeed (especially under ZeRO-3)?
I notice that safe_set_full_fp32_param and safe_set_full_optimizer_state are supported. Could you support safe_set_full_grad as well?
Expected behavior The gradients can be modified by an arbitrary custom scheme at each training iteration.
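For illustration, here is a minimal sketch of the pattern being requested, assuming a standard DeepSpeed engine training loop. `safe_get_full_grad` does exist in `deepspeed.utils` and gathers a parameter's partitioned gradient under ZeRO-3; the write-back call `safe_set_full_grad` is the hypothetical API this issue asks for, and `scale_selected_grads` / `max_norm` are made-up names for the example:

```python
from deepspeed.utils import safe_get_full_grad  # reads the full grad under ZeRO-3


def scale_selected_grads(model_engine, max_norm=1.0):
    """Illustrative custom scheme: inspect each parameter's full gradient.

    Under ZeRO-3, param.grad cannot be used directly because gradients are
    partitioned across ranks, so the safe_* helpers gather the full tensor.
    """
    for name, param in model_engine.module.named_parameters():
        full_grad = safe_get_full_grad(param)  # gathered fp32 gradient, or None
        if full_grad is None:
            continue
        norm = full_grad.norm()
        # This is the missing piece the issue requests: a matching setter so
        # the modified gradient can be written back, e.g. (hypothetical API):
        # safe_set_full_grad(param, full_grad * (max_norm / (norm + 1e-6)))


# Typical call site, between backward and step:
# model_engine.backward(loss)
# scale_selected_grads(model_engine, max_norm=1.0)
# model_engine.step()
```

Without such a setter, the gathered gradient can be inspected but any modification is lost, since the underlying partitioned gradient shards are what the optimizer actually consumes.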