Shaoshu Yang

Results 3 comments of Shaoshu Yang

Hi taoisu. I have exactly the same issue. Have you solved it yet? Can you share your solutions please?

Hi, I tried to work out a usable EMA module for ZeRO Stage 3. See below:

```python
import torch.nn as nn
from deepspeed.runtime.zero import GatheredParameters


class DSEma(nn.Module):
    def __init__(self, model, decay=0.9999, use_num_updates=True):
        super().__init__()
        if...
```

@czczup Hi cz. It does run EMA for me in ZeRO Stage 3, but I didn't check whether it behaves exactly the same as LitEMA.
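For anyone who wants to check the behaviour against LitEMA, here is a minimal sketch of the per-parameter update rule in plain Python, using floats in place of tensors. It is not the `DSEma` code from the comment above. The warm-up formula is an assumption modelled on the one commonly seen in LitEma-style implementations, the names `effective_decay` and `ema_step` are hypothetical, and the ZeRO Stage 3 parameter gathering (`GatheredParameters`) is left out entirely.

```python
def effective_decay(decay, num_updates):
    """Assumed warm-up schedule: small decay early on, capped at `decay`."""
    return min(decay, (1 + num_updates) / (10 + num_updates))


def ema_step(shadow, params, decay, num_updates=None):
    """Move each shadow value toward the live value; return the update count.

    `shadow` and `params` are dicts mapping parameter names to floats.
    Passing num_updates=None disables the warm-up (hypothetical convention).
    """
    if num_updates is not None:
        num_updates += 1
        decay = effective_decay(decay, num_updates)
    for name, value in params.items():
        # Equivalent to: shadow = decay * shadow + (1 - decay) * value
        shadow[name] -= (1.0 - decay) * (shadow[name] - value)
    return num_updates


# Toy run: the live parameter stays at 1.0, the shadow starts at 0.0.
shadow = {"w": 0.0}
count = 0
for _ in range(3):
    count = ema_step(shadow, {"w": 1.0}, decay=0.9999, num_updates=count)
```

One way to compare two EMA implementations would be to feed both the same sequence of parameter values and check that the shadow values agree within floating-point tolerance after every step.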