Max Kovalenko
Introduce a use_secondary_tensor boolean variable to shorten notation and improve readability.
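As a rough illustration of this kind of refactor (a hypothetical sketch, not the actual DeepSpeed diff: the FakeParam class and select_shard helper are made up, only the ds_tensor / ds_secondary_tensor attribute names mirror ZeRO):

```python
from dataclasses import dataclass
from typing import Optional
import torch

@dataclass
class FakeParam:
    # Stand-in for a ZeRO parameter; everything except the attribute names
    # ds_tensor / ds_secondary_tensor is purely illustrative.
    ds_tensor: torch.Tensor
    ds_secondary_tensor: Optional[torch.Tensor] = None

def select_shard(param: FakeParam, forward_pass: bool) -> torch.Tensor:
    # Naming the compound condition once replaces several inline repetitions
    # of it, which is the readability improvement the entry refers to.
    use_secondary_tensor = param.ds_secondary_tensor is not None and not forward_pass
    return param.ds_secondary_tensor if use_secondary_tensor else param.ds_tensor

print(select_shard(FakeParam(ds_tensor=torch.zeros(4), ds_secondary_tensor=torch.ones(2)),
                   forward_pass=False))
```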
Enabled gradient accumulation in the bf16 optimizer so that fp32 gradients are updated as soon as each gradient becomes available. This improves device utilization on some back-ends by parallelizing the underlying workload across hardware engines...
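A minimal sketch of the idea, assuming a persistent fp32 accumulator per bf16 parameter (the fp32_grad_acc dict and hook wiring below are illustrative, not the bf16 optimizer implementation):

```python
import torch

model = torch.nn.Linear(8, 8).to(torch.bfloat16)
# One fp32 accumulator per bf16 parameter (assumed layout for this sketch).
fp32_grad_acc = {p: torch.zeros_like(p, dtype=torch.float32) for p in model.parameters()}

def make_hook(param):
    def hook(grad):
        # Fires as soon as this parameter's bf16 gradient is produced, so the
        # fp32 accumulation can start without waiting for the whole backward.
        fp32_grad_acc[param].add_(grad.float())
    return hook

for p in model.parameters():
    p.register_hook(make_hook(p))

for _ in range(4):  # four micro-batches of gradient accumulation
    model(torch.randn(2, 8, dtype=torch.bfloat16)).float().sum().backward()
    model.zero_grad(set_to_none=True)  # the fp32 accumulators hold the running sum
```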
The new "timers" section describes configuration for different timers. Specifically, in the "throughput" section, it is possible to disable the throughput timer (enabled by default). This allows to avoid the...
* Use all_reduce instead of all_gather to fetch module parameters (see the sketch after this list). This improves performance by reducing the overhead of concatenation and slicing, which are no longer required.
* Instead, all tensors...
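Schematically, the trick could look like the following standalone sketch (not the DeepSpeed code; the function name and single-process demo are assumptions): each rank writes its shard into its own slice of a zero-filled full-size buffer, and a single all_reduce(SUM) reconstructs the parameter without any concatenation or slicing step.

```python
import os
import torch
import torch.distributed as dist

def fetch_param_via_all_reduce(shard: torch.Tensor, full_numel: int) -> torch.Tensor:
    # Each rank owns a contiguous shard of the flattened parameter. Summing
    # zero-padded buffers across ranks reconstructs the full tensor in place,
    # with no torch.cat or slicing as an all_gather-based fetch would need.
    rank = dist.get_rank()
    full = torch.zeros(full_numel, dtype=shard.dtype)
    offset = rank * shard.numel()
    full[offset:offset + shard.numel()] = shard
    dist.all_reduce(full, op=dist.ReduceOp.SUM)
    return full

if __name__ == "__main__":
    # Single-process demo; in real use this runs under a multi-rank launcher.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)
    print(fetch_param_via_all_reduce(torch.arange(4.0), full_numel=4))
```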
**Describe the bug**
The pre-backward and post-backward hook mechanism works by attaching a custom autograd function to tensors that are either inputs to the module (for [post-backward](https://github.com/microsoft/DeepSpeed/blob/3dd7ccff8103be60c31d963dd2278d43abb68fd1/deepspeed/runtime/zero/parameter_offload.py#L387)) or outputs...
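A hypothetical, stripped-down version of that mechanism (the PreBackwardTrigger class and the print callback are illustrative, not the DeepSpeed hooks): an identity autograd Function is wrapped around a module's output so that its backward() fires when the autograd engine reaches that point in the graph.

```python
import torch

class PreBackwardTrigger(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, callback):
        ctx.callback = callback
        return x.clone()  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        ctx.callback()            # e.g. re-gather partitioned parameters
        return grad_output, None  # gradient is passed through unchanged

module = torch.nn.Linear(4, 4)

def forward_hook(mod, inputs, output):
    # Wrapping the module *output* means the trigger fires just before the
    # backward pass re-enters this module (a "pre-backward" hook); wrapping
    # the inputs instead would fire after the module's backward completes.
    return PreBackwardTrigger.apply(output, lambda: print("pre-backward for", mod))

module.register_forward_hook(forward_hook)
module(torch.randn(2, 4)).sum().backward()
```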
Compiled Autograd is an extension to torch.compile that enhances the autograd engine by capturing a larger backward computation graph at runtime. This allows more comprehensive optimization of the backward...
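As a rough sketch of how it is enabled in a training step (the torch._dynamo.config.compiled_autograd flag is a private, version-dependent PyTorch interface, so treat the exact knob as an assumption and check the docs for your release):

```python
import torch

# Ask dynamo to also capture and compile the backward graph at runtime
# (assumed flag; see the Compiled Autograd tutorial for your PyTorch version).
torch._dynamo.config.compiled_autograd = True

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))

@torch.compile
def train_step(x):
    loss = model(x).sum()
    # With Compiled Autograd enabled, this backward is captured as a larger
    # graph and compiled, instead of being executed eagerly node by node.
    loss.backward()
    return loss

train_step(torch.randn(4, 8))
```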