Kite0011

3 comments by Kite0011

> > > @szhengac You are correct, LAMB and LARS implementations that are not aware of ZeRO will not work correctly with ZeRO. This is not a fundamental limitation of...

> > > > @szhengac You are correct, LAMB and LARS implementations that are not aware of ZeRO will not work correctly with ZeRO. This is not a fundamental limitation...

Hi @lucidrains! Do you mean I can just apply GAU to a cross-attention model such as T5? I found that GAU works very well on a BERT model.