xiaotian917

Results: 1 issue of xiaotian917

I'm currently working on reproducing the training of NVIDIA's multi-objective architecture reward model, and I have some questions about the training details of ARMO-RM. I'm using Mean Squared Error (MSE) as...