DeepSpeed
BF16 optimizer: Improve device utilization via instant grad update
Enabled hook-based gradient accumulation in the BF16 optimizer, which updates each fp32 gradient as soon as the corresponding parameter's gradient becomes available during the backward pass.
This improves device utilization on some backends by parallelizing the underlying workload across hardware engines.
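For illustration, here is a minimal PyTorch sketch of the general hook-based accumulation pattern, not DeepSpeed's actual implementation: the helper name `attach_fp32_accumulation_hooks` and the `fp32_grads` buffers are hypothetical, and it assumes PyTorch 2.1+ for `register_post_accumulate_grad_hook`.

```python
import torch

def attach_fp32_accumulation_hooks(model: torch.nn.Module):
    """Hypothetical sketch: accumulate each bf16 gradient into an fp32
    buffer as soon as autograd produces it, instead of in a single
    pass over all parameters after backward() returns."""
    fp32_grads = {}
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        fp32_grads[name] = torch.zeros_like(param, dtype=torch.float32)

        def make_hook(key):
            def hook(p):
                # Fires per parameter during backward: upcast and
                # accumulate immediately, then drop the bf16 grad so
                # this work overlaps with backward of earlier layers.
                fp32_grads[key].add_(p.grad.float())
                p.grad = None
            return hook

        param.register_post_accumulate_grad_hook(make_hook(name))
    return fp32_grads
```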
To enable the feature (disabled by default), set the new "accumulate_grads_via_hooks" flag under the "bf16" section of the DeepSpeed config.json. Example:
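```json
"bf16": {
    "enabled": true,
    "accumulate_grads_via_hooks": true
}
```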