xiaotian917

@Haoxiang-Wang Hi, I hope you don't mind me asking for your advice. I'm currently training a multi-objective reward model using the qwen-1.8B-sft model, and I'm using two datasets: helpSteer...