Shibo Hao
My results with bert-base are similar to yours.
Have you solved the problem?
> I had similar generations for multi-GPU runs. Setting random seeds made them coherent for me.

Thank you so much! That solved the bug for me!
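For anyone landing here later: the comment above does not say exactly how the seeds were set, so the following is only a minimal sketch of one common way to do it. The helper name `set_seed` and the seed value are my own placeholders, and the NumPy/PyTorch parts assume those libraries are what your run uses. The key point for multi-GPU runs is to call it with the same seed in every process before building the model or generating.

```python
import random


def set_seed(seed: int) -> None:
    """Seed the RNGs available in this process (hypothetical helper)."""
    # Python's built-in RNG
    random.seed(seed)

    # NumPy, if it is installed
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    # PyTorch, if it is installed; manual_seed_all covers every visible GPU
    # and is silently ignored when CUDA is not available
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


# Call this at the start of every worker process, with the same value on each rank.
set_seed(42)
```

If the ranks still diverge after this, the cause is probably something other than seeding, so treat this as a first thing to rule out and not as a guaranteed fix.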
Hi, thanks for sharing your experiment results! We are also running SFT with sp and gradient accumulation, and the trained model shows some strange behaviors. Have you figured out the...