XiaoLong

Results: 1 issue by XiaoLong

I tried again to reproduce your gemma2B reward model training and found that the reward model architecture fine-tuned with internlm2 had an output head of size 1. I downloaded your...