DeepSpeed
Make parameter status shared by all PartitionedParameterCoordinator instances
There can be multiple PartitionedParameterCoordinator instances, yet each one currently tracks parameter status on its own. Say we have PartitionedParameterCoordinator A and B. When A puts some parameters in flight, B is not aware of it, so when B later tries to use those parameters it simply errors out. This PR addresses the issue by sharing the __InflightParamRegistry among all PartitionedParameterCoordinator instances.
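The failure mode and the fix can be illustrated with a toy model. This is only a sketch and not DeepSpeed's actual code: `InflightParamRegistry`, `ToyCoordinator`, `fetch`, `use`, and `run` are hypothetical names invented for illustration, and the registry is modeled as a plain dict keyed by a parameter id.

```python
class InflightParamRegistry(dict):
    """Hypothetical stand-in: maps a parameter id to the coordinator that fetched it."""


class ToyCoordinator:
    """Hypothetical, heavily simplified coordinator (not the DeepSpeed class)."""

    def __init__(self, name, registry):
        self.name = name
        self.inflight = registry  # may or may not be shared with other coordinators

    def fetch(self, param_id):
        # Record that this parameter is now in flight.
        self.inflight[param_id] = self.name

    def use(self, param_id):
        # A coordinator can only consume parameters it knows are in flight.
        if param_id not in self.inflight:
            raise RuntimeError(f"{self.name}: {param_id} is not in flight")
        return self.inflight.pop(param_id)


def run(shared):
    """Let A fetch a parameter and B use it, with or without a shared registry."""
    registry = InflightParamRegistry()
    a = ToyCoordinator("A", registry)
    b = ToyCoordinator("B", registry if shared else InflightParamRegistry())
    a.fetch("w0")
    try:
        b.use("w0")
        return "ok"
    except RuntimeError:
        return "error"
```

In this toy model, `run(shared=False)` returns `"error"` because B's private registry never sees A's fetch, while `run(shared=True)` returns `"ok"` because both coordinators consult the same registry object.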
This PR fixes https://github.com/microsoft/DeepSpeed/issues/3068 and will hopefully fix https://github.com/microsoft/DeepSpeed/issues/3156 as well 😉