Make parameter status shared by all PartitionedParameterCoordinator instances

Open HeyangQin opened this issue 2 years ago • 0 comments

There can be multiple PartitionedParameterCoordinator instances, yet each one currently tracks parameter status on its own. Suppose we have PartitionedParameterCoordinator A and B. When A puts some parameters in flight, B is not aware of it, so when B tries to use those parameters it errors out. This PR addresses the issue by sharing the __InflightParamRegistry among all PartitionedParameterCoordinator instances.
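The failure mode and the fix can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not DeepSpeed's actual code: `InflightRegistry`, `Coordinator`, `fetch`, and `use` are hypothetical names, and a string stands in for the real communication handle.

```python
from typing import Dict, Optional


class InflightRegistry:
    """Hypothetical stand-in: maps a parameter id to its pending fetch handle."""

    def __init__(self) -> None:
        self._handles: Dict[int, str] = {}

    def mark_inflight(self, param_id: int, handle: str) -> None:
        self._handles[param_id] = handle

    def is_inflight(self, param_id: int) -> bool:
        return param_id in self._handles

    def wait(self, param_id: int) -> str:
        # Remove and return the handle, as if the fetch had completed.
        return self._handles.pop(param_id)


class Coordinator:
    """Hypothetical coordinator; uses a private registry unless one is passed in."""

    def __init__(self, name: str, registry: Optional[InflightRegistry] = None) -> None:
        self.name = name
        self.registry = registry if registry is not None else InflightRegistry()

    def fetch(self, param_id: int) -> None:
        # Start a fetch only if no coordinator sharing this registry already did.
        if not self.registry.is_inflight(param_id):
            self.registry.mark_inflight(param_id, f"handle-from-{self.name}")

    def use(self, param_id: int) -> str:
        if not self.registry.is_inflight(param_id):
            raise RuntimeError(f"{self.name}: parameter {param_id} is not in flight")
        return self.registry.wait(param_id)


# Shared registry: B can see and consume the fetch that A started.
shared = InflightRegistry()
a, b = Coordinator("A", shared), Coordinator("B", shared)
a.fetch(7)
handle = b.use(7)

# Private registries: B has no record of A's fetch, so b_private.use(7) would raise.
a_private, b_private = Coordinator("A"), Coordinator("B")
a_private.fetch(7)
```

With private registries, the second coordinator has no record of the pending fetch, which mirrors the error described above. Passing one registry to every coordinator removes that blind spot.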

This PR fixes https://github.com/microsoft/DeepSpeed/issues/3068 and will hopefully fix https://github.com/microsoft/DeepSpeed/issues/3156 as well 😉

HeyangQin, Apr 25 '23 23:04