Quan Sun

Results 7 issues of Quan Sun

Signed-off-by: quansun Add training code for the 1.1B EVA-CLIP (https://arxiv.org/abs/2211.07636)

Signed-off-by: quansun Add new features:
- layer decay with value depending on layer
- support for different lr for text and image
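As a rough illustration of what these two features usually mean, the sketch below computes a per-parameter learning rate from a tower-specific base rate and a depth-dependent decay factor. It is not the code from this PR: the parameter naming scheme (`visual.`, `text.`, `.blocks.<i>.`, `.embed`), the helper names `layer_id` and `param_lrs`, and the assumption that both towers have the same depth are all hypothetical, chosen only to keep the example self-contained.

```python
import re


def layer_id(name: str, num_layers: int) -> int:
    """Map a parameter name to a depth index (hypothetical naming scheme).

    0 = embeddings, 1..num_layers = transformer blocks,
    num_layers + 1 = everything after the last block (e.g. projection head).
    """
    match = re.search(r"\.blocks\.(\d+)\.", name)
    if match:
        return int(match.group(1)) + 1
    if ".embed" in name:
        return 0
    return num_layers + 1


def param_lrs(names, visual_lr, text_lr, num_layers, decay):
    """Return {parameter name: learning rate}.

    The base rate depends on the tower (image vs. text); it is then scaled by
    decay ** (distance from the top), so earlier layers get smaller rates.
    """
    lrs = {}
    for name in names:
        base = visual_lr if name.startswith("visual.") else text_lr
        depth = layer_id(name, num_layers)
        lrs[name] = base * decay ** (num_layers + 1 - depth)
    return lrs


if __name__ == "__main__":
    names = [
        "visual.embed.weight",
        "visual.blocks.0.attn.weight",
        "visual.blocks.1.attn.weight",
        "text.blocks.1.mlp.weight",
        "text.proj",
    ]
    for name, lr in param_lrs(names, 1.0, 0.1, num_layers=2, decay=0.5).items():
        print(f"{name}: {lr}")
```

In a real training script, the resulting rates would typically be turned into optimizer parameter groups, one group per distinct learning rate.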

Signed-off-by: Sun Quan Add the 1.1B EVA-CLIP (https://arxiv.org/abs/2211.07636)

Signed-off-by: Sun Quan Add DeepSpeed ZeRO stage 1 to the training code

Hi there, when using ZeRO-3 (zero3) and zero.Init in a distillation scenario, a memory leak was observed: the maximum allocated memory increases with each iteration. However,...

bug
training