XiaoLong
I tried to reproduce your Gemma-2B reward model training again and found that the reward model architecture fine-tuned with InternLM2 has an output head of size 1. I downloaded your...