DeepSpeed
BF16 optimizer: Improve device utilization via instant grad update
Enabled hook-based gradient accumulation in the BF16 optimizer, which updates each fp32 gradient as soon as the corresponding parameter's gradient becomes available during the backward pass.
This improves device utilization on some backends by parallelizing the underlying workload across hardware engines.
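For illustration, here is a minimal PyTorch sketch of the general hook-based accumulation pattern, not DeepSpeed's actual implementation: the helper name `attach_fp32_accumulation_hooks` and the `fp32_grads` buffers are hypothetical, and it assumes PyTorch 2.1+ for `register_post_accumulate_grad_hook`.

```python
import torch

def attach_fp32_accumulation_hooks(model: torch.nn.Module):
    """Hypothetical sketch: accumulate each bf16 gradient into an fp32
    buffer as soon as autograd produces it, instead of in a single
    pass over all parameters after backward() returns."""
    fp32_grads = {}
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        fp32_grads[name] = torch.zeros_like(param, dtype=torch.float32)

        def make_hook(key):
            def hook(p):
                # Fires per parameter during backward: upcast and
                # accumulate immediately, then drop the bf16 grad so
                # this work overlaps with backward of earlier layers.
                fp32_grads[key].add_(p.grad.float())
                p.grad = None
            return hook

        param.register_post_accumulate_grad_hook(make_hook(name))
    return fp32_grads
```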
To enable the feature (disabled by default), set the new "accumulate_grads_via_hooks" flag under the "bf16" section of the DeepSpeed config.json. Example:
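```json
"bf16": {
    "enabled": true,
    "accumulate_grads_via_hooks": true
}
```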