About the reward score

Open KN1GHT9 opened this issue 1 year ago • 1 comment

In your paper, the annotations appear to be star ratings. How are these converted into specific floating-point scores when training the RM?

Mar 04 '24 14:03 KN1GHT9

Please refer to Section 2.2 (RM Training) of the paper for details.

Mar 05 '24 02:03 xujz18