direct-preference-optimization
Using Mistral 7B with transformers v4.38.1 on the MATH dataset, and facing memory leaks
With both trainers, the basic one and the FSDP one, I see the same underlying pattern of GPU memory not being freed: allocated memory keeps increasing step by step while GPU utilization remains roughly constant.
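For reference, this is roughly how I am measuring the growth, a minimal sketch using PyTorch's built-in CUDA memory stats (the helper name and its placement in the loop are just illustrative):

```python
import gc
import torch

def log_gpu_memory(step: int) -> None:
    """Print allocated vs. reserved GPU memory so per-step growth is visible."""
    allocated = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"step {step}: allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")

# Called after each optimizer step inside the training loop:
#     log_gpu_memory(step)
#
# Forcing a collection rules out Python-side reference cycles as the cause:
#     gc.collect()
#     torch.cuda.empty_cache()
```

Even with `gc.collect()` and `torch.cuda.empty_cache()` in the loop, the allocated figure keeps climbing.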
Does anyone have suggestions as to what might have gone wrong?