
[MiniLLM] Using the Qwen2-72B model as the teacher model for MiniLLM training results in out of memory

Open shhn1 opened this issue 1 year ago • 1 comments

I use Qwen2-72B as the teacher model and Qwen2.5-32B as the student model. Training runs on 8×80GB A100 GPUs.

When I load the Qwen2-72B model, I find that the teacher model is not split across the GPUs; instead, the complete Qwen2-72B model is loaded onto every GPU, which results in OOM.

When I test loading the model on its own, Qwen2-72B can be split and loaded across multiple GPUs, so I don't understand why this happens during training. For reference, the standalone load test looks roughly like the sketch below.
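(A minimal sketch of such a standalone load test, assuming Hugging Face Transformers with Accelerate installed; the checkpoint id and dtype here are assumptions, not taken from the MiniLLM training code.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-72B"  # assumed checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_name)
teacher = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # half precision so the weights fit in 8x80GB
    device_map="auto",           # Accelerate shards the layers across visible GPUs
)
teacher.eval()  # the teacher is only used for inference during distillation
```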

Have you tried MiniLLM experiments with two larger models? I see that the largest teacher model in the paper is only 13B.

shhn1 avatar Nov 12 '24 09:11 shhn1

You can try model (tensor) parallelism for large teacher and student models. First, you need to change the model parallel size as described here. Then, you can follow this script to run the Qwen models.
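(Conceptually, model (tensor) parallelism shards each weight matrix across GPUs so that no single device holds the full model. The toy, single-process sketch below only illustrates that idea with a column-parallel linear layer; sizes and names are made up, and the repo's actual scripts implement this with distributed GPU communication.)

```python
import torch

hidden, out_dim, mp_size = 1024, 4096, 4  # toy sizes; mp_size = model parallel size

full_weight = torch.randn(out_dim, hidden)
x = torch.randn(2, hidden)  # a dummy batch of hidden states

# Column-parallel split: each shard owns out_dim / mp_size rows of the weight.
shards = torch.chunk(full_weight, mp_size, dim=0)

# Each "rank" computes its slice of the output with only its shard in memory.
partial_outputs = [x @ w.t() for w in shards]

# Gathering the partial outputs reproduces the full layer's result.
y_parallel = torch.cat(partial_outputs, dim=-1)
y_full = x @ full_weight.t()
assert torch.allclose(y_parallel, y_full, atol=1e-4)
print("per-shard params:", shards[0].numel(), "vs full:", full_weight.numel())
```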

t1101675 avatar Nov 23 '24 23:11 t1101675