Results 5 comments of gulucaptain

I also ran into this problem when using the blip2_qformer model. I changed the parameter 'vit_precision' to 'fp32', and that made my code work. I hope it helps you.
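For reference, in LAVIS-style YAML configs the ViT precision is usually a model-level field. A minimal sketch (the exact key names and surrounding fields are assumptions — check your own config file):

```yaml
model:
  arch: blip2            # placeholder; use whatever arch your run specifies
  vit_precision: "fp32"  # default is often "fp16"; fp32 avoids half-precision overflow in the ViT
```

Running the ViT in fp32 costs more memory but sidesteps fp16 overflow in the vision backbone.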

Please downgrade your torch version.

You can try using DeepSpeed ZeRO-2 instead of ZeRO-3. This may solve the problem.
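The switch is just the ZeRO stage in the DeepSpeed config JSON; a minimal sketch (all other fields omitted):

```json
{
  "zero_optimization": {
    "stage": 2
  }
}
```

Stage 2 shards optimizer states and gradients but keeps a full copy of the parameters on each rank, so it avoids the parameter-gathering machinery of stage 3 that sometimes trips up model code.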

Setting mixed_precision=fp16 leads to the loss becoming NaN. Do you also run into this problem?
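To illustrate why fp16 training can produce a NaN loss: fp16 tops out around 65504, so large intermediate activations overflow to inf, and inf arithmetic then yields NaN. A small NumPy demonstration (not the actual training code, just the numeric mechanism):

```python
import numpy as np

# fp16's largest finite value is ~65504; exp(12) ~ 162755 overflows to inf
act = np.exp(np.float16(12.0))
print(act)  # inf

# inf - inf is nan, which then propagates through the loss
loss = act - act
print(loss)  # nan

# the same computation in fp32 stays finite
act32 = np.exp(np.float32(12.0))
print(np.isfinite(act32))  # True
```

This is why forcing sensitive submodules (like the ViT above) to fp32, or using loss scaling, keeps the loss finite.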

Maybe you can try reducing the learning rate, or increasing the amount of training data and the batch size.
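The learning-rate part of that advice can be seen on a toy problem: plain gradient descent on f(x) = x² diverges when the step size is too large and converges when it is modest. A self-contained sketch (toy function, not the actual model):

```python
def descend(lr, steps=50, x0=5.0):
    """Gradient descent on f(x) = x^2 (gradient 2x), returning the final x."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # update: x <- x - lr * f'(x)
    return x

# lr too high: the multiplier |1 - 2*lr| exceeds 1 and the iterates blow up
print(abs(descend(lr=1.5)) > 1e6)   # True

# modest lr: the iterates shrink toward the minimum at 0
print(abs(descend(lr=0.1)) < 1e-3)  # True
```

The same intuition carries over to training: if the loss oscillates or explodes, a smaller learning rate (or a larger effective batch, which smooths the gradient estimate) is the first thing to try.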