direct-preference-optimization
Using Mistral 7B with transformers v4.38.1 on the MATH dataset, and facing memory leaks
With both trainers, the basic one and the FSDP one, I see the same underlying pattern of GPU memory not being freed: allocated memory keeps increasing step by step while GPU utilization remains roughly constant.
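For reference, this is roughly how I am measuring the growth, a minimal sketch using PyTorch's built-in CUDA memory stats (the helper name and its placement in the loop are just illustrative):

```python
import gc
import torch

def log_gpu_memory(step: int) -> None:
    """Print allocated vs. reserved GPU memory so per-step growth is visible."""
    allocated = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"step {step}: allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")

# Called after each optimizer step inside the training loop:
#     log_gpu_memory(step)
#
# Forcing a collection rules out Python-side reference cycles as the cause:
#     gc.collect()
#     torch.cuda.empty_cache()
```

Even with `gc.collect()` and `torch.cuda.empty_cache()` in the loop, the allocated figure keeps climbing.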
Does anyone have suggestions as to what might have gone wrong?