DeepSpeed
[BUG] DeepSpeed ZeRO-3 student+teacher memory leak
Hi there,
When using ZeRO-3 with zero.Init in a distillation (student + teacher) scenario, I observed that a memory leak can occur: the maximum allocated memory increases with each iteration. There is no memory leak when zero.Init is disabled.
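A minimal sketch of how the per-iteration growth described above could be checked from recorded peak-memory samples. The helpers `peak_memory_deltas` and `looks_like_leak` are hypothetical and not part of DeepSpeed. The sketch assumes the samples come from `torch.cuda.max_memory_allocated()` read once per iteration (optionally after `torch.cuda.reset_peak_memory_stats()`), and it treats the first iteration as warm-up. That is a heuristic assumption, not a definitive leak test.

```python
from typing import List, Sequence

# Hypothetical helpers (not DeepSpeed API). Samples are assumed to be collected
# once per training iteration, e.g. with PyTorch on CUDA:
#   torch.cuda.reset_peak_memory_stats()
#   ... student/teacher forward, backward, optimizer step ...
#   peaks.append(torch.cuda.max_memory_allocated())


def peak_memory_deltas(peaks: Sequence[int]) -> List[int]:
    """Change in peak allocated memory between consecutive iterations."""
    return [curr - prev for prev, curr in zip(peaks, peaks[1:])]


def looks_like_leak(peaks: Sequence[int], min_growth_bytes: int = 1) -> bool:
    """Heuristic: flag a leak if the peak grows by at least min_growth_bytes
    on every iteration after the first (warm-up) one.

    Requires at least three samples so that at least one growth step remains
    after skipping the warm-up step.
    """
    if len(peaks) < 3:
        return False
    deltas = peak_memory_deltas(peaks)[1:]  # skip the warm-up step
    return all(d >= min_growth_bytes for d in deltas)


if __name__ == "__main__":
    steady = [1000, 1200, 1200, 1200]   # peak settles after warm-up
    growing = [1000, 1200, 1300, 1400]  # peak keeps rising every iteration
    print(looks_like_leak(steady))   # False
    print(looks_like_leak(growing))  # True
```

Skipping the first delta avoids flagging one-time allocations (for example optimizer state created on the first step); the `min_growth_bytes` threshold can be raised to ignore small allocator noise.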
With zero.Init enabled:
