DeepSpeed
[Question] Why are overlap and contiguous grads "meaningless in stage 1" and ignored?
https://github.com/microsoft/DeepSpeed/blob/80f94c10c552ec79473775adb8902b210656ed76/deepspeed/runtime/engine.py#L1384
I'm wondering why we can't use overlap_comm in ZeRO stage 1 to further reduce latency. I'd appreciate any reply.
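For reference, the setup in question looks roughly like the config below: ZeRO stage 1 with `overlap_comm` and `contiguous_gradients` both requested in the `zero_optimization` section. The batch size is an illustrative value. `effective_zero_flags` is a hypothetical helper, not DeepSpeed code. It is a simplified model of what the linked engine.py comment describes: both options are dropped when `stage` is 1. It also assumes missing keys default to `False`, which may not match DeepSpeed's real defaults.

```python
# Hypothetical example config. The keys follow DeepSpeed's "zero_optimization"
# config section; the values are illustrative only.
ds_config = {
    "train_batch_size": 32,
    "zero_optimization": {
        "stage": 1,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}


def effective_zero_flags(config):
    """Hypothetical helper: a simplified model of the behavior described by the
    linked engine.py comment, not DeepSpeed's actual implementation.

    Returns the (overlap_comm, contiguous_gradients) pair that would take effect.
    Missing keys default to False here for simplicity (an assumption).
    """
    zero = config.get("zero_optimization", {})
    stage = zero.get("stage", 0)
    overlap_comm = zero.get("overlap_comm", False)
    contiguous_gradients = zero.get("contiguous_gradients", False)
    if stage == 1:
        # Per the comment, both options are ignored in stage 1.
        overlap_comm = False
        contiguous_gradients = False
    return overlap_comm, contiguous_gradients


print(effective_zero_flags(ds_config))  # (False, False)
```

So even though the config asks for overlapping communication, under this reading stage 1 silently runs without it. That is the behavior the question is about.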