Shaoshu Yang
Hi taoisu. I have exactly the same issue. Have you solved it yet? Could you please share your solution?
Hi, I tried to put together a usable EMA module that works with ZeRO Stage 3. See below:

```python
import torch.nn as nn
from deepspeed.runtime.zero import GatheredParameters

class DSEma(nn.Module):
    def __init__(self, model, decay=0.9999, use_num_updates=True):
        super().__init__()
        if...
```
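The snippet above is cut off, so here is a minimal sketch of the general idea. It is not the DSEma code from the post. Under ZeRO Stage 3 each rank holds only a shard of every parameter, so the EMA reads the weights inside a `GatheredParameters` block. With `modifier_rank=None` this is a read-only gather. When writing the averaged weights back, `modifier_rank=0` is used so the edits are re-partitioned on exit. `EmaSketch`, `_gather`, `shadow` and `copy_to` are hypothetical names. The `min(decay, (1 + n) / (10 + n))` warmup is an assumption, modeled on the LitEma-style schedule that the `use_num_updates` flag suggests. The import falls back to a no-op context when DeepSpeed is not installed, which is also an assumption made so the sketch runs anywhere.

```python
import contextlib

import torch
from torch import nn

try:
    # DeepSpeed context manager that temporarily gathers ZeRO-3 partitioned params.
    from deepspeed.runtime.zero import GatheredParameters
except ImportError:  # assumption: no DeepSpeed -> plain (unpartitioned) params
    GatheredParameters = None


def _gather(params, modifier_rank=None):
    """Hypothetical helper: gather params if DeepSpeed is available, else no-op."""
    if GatheredParameters is None:
        return contextlib.nullcontext()
    # modifier_rank=None: read-only gather; an int rank: edits are re-partitioned on exit.
    return GatheredParameters(params, modifier_rank=modifier_rank)


class EmaSketch:
    """Hypothetical EMA helper keeping full (unpartitioned) fp32 shadow copies."""

    def __init__(self, model, decay=0.9999, use_num_updates=True):
        if not 0.0 <= decay <= 1.0:
            raise ValueError("decay must be between 0 and 1")
        self.decay = decay
        self.num_updates = 0 if use_num_updates else -1  # -1 disables warmup
        named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
        self.shadow = {}
        # Gather first: outside the block a ZeRO-3 param is only a local shard.
        with _gather([p for _, p in named]):
            for name, p in named:
                self.shadow[name] = p.detach().clone().float()

    @torch.no_grad()
    def update(self, model):
        decay = self.decay
        if self.num_updates >= 0:
            self.num_updates += 1
            # Assumed warmup: effective decay starts small and approaches self.decay.
            decay = min(self.decay, (1 + self.num_updates) / (10 + self.num_updates))
        named = [(n, p) for n, p in model.named_parameters() if n in self.shadow]
        with _gather([p for _, p in named]):
            for name, p in named:
                s = self.shadow[name]
                # shadow <- shadow - (1 - decay) * (shadow - param)
                s.sub_((1.0 - decay) * (s - p.detach().float()))

    @torch.no_grad()
    def copy_to(self, model):
        named = [(n, p) for n, p in model.named_parameters() if n in self.shadow]
        # Every rank holds an identical shadow; rank 0's writes are re-partitioned.
        with _gather([p for _, p in named], modifier_rank=0):
            for name, p in named:
                p.copy_(self.shadow[name].to(p.dtype))
```

Keeping the shadow weights unpartitioned costs a full fp32 copy of the model on every rank. That is the simplest option, but it may be too much memory for very large models.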
@czczup Hi cz. It does run EMA for me under ZeRO Stage 3, but I haven't checked whether it behaves exactly the same as LitEMA.