ImageReward

About the reward score

Open · KN1GHT9 opened this issue 1 year ago · 1 comment

In your paper, the annotations appear to be star ratings. How are these ratings converted into specific floating-point scores when training the reward model (RM)?

KN1GHT9 · Mar 04 '24 14:03

[image]

You can refer to Section 2.2 (RM Training) of the paper for details.

xujz18 · Mar 05 '24 02:03
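Section 2.2 of the paper is the authoritative answer. The following is only a minimal sketch of one common way ordinal labels become scalar scores, and it rests on an assumption: that the RM is trained with an InstructGPT-style pairwise ranking loss, -log σ(f(T, x_i) - f(T, x_j)) for every pair where image x_i is ranked above x_j for prompt T, instead of regressing onto star values directly. The sketch also assumes each prompt's images have already been put in best-first order, and it leaves out how ratings are turned into that order and how ties are handled. To keep the example self-contained, each image's score is a free parameter rather than the output of a network f(T, x). The names `pairwise_ranking_loss`, `fit_scores`, and the `img_*` ids are hypothetical.

```python
import math
from itertools import combinations


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pairwise_ranking_loss(scores: dict[str, float], ranking: list[str]) -> float:
    """Mean of -log(sigmoid(s_better - s_worse)) over all pairs for one prompt.

    `ranking` lists image ids best-first, so every (better, worse) pair
    produced by combinations() respects the annotators' order.
    """
    pairs = list(combinations(ranking, 2))
    # -log(sigmoid(d)) == softplus(-d)
    return sum(softplus(-(scores[b] - scores[w])) for b, w in pairs) / len(pairs)


def fit_scores(ranking: list[str], steps: int = 200, lr: float = 0.5) -> dict[str, float]:
    """Learn one free scalar score per image by gradient descent on the loss.

    Hypothetical stand-in for a network f(T, x): each image's score is its own parameter.
    """
    scores = {i: 0.0 for i in ranking}
    pairs = list(combinations(ranking, 2))
    for _ in range(steps):
        grads = {i: 0.0 for i in ranking}
        for b, w in pairs:
            # Derivative of softplus(-d) w.r.t. d is -sigmoid(-d), with d = s_b - s_w.
            g = sigmoid(-(scores[b] - scores[w])) / len(pairs)
            grads[b] -= g  # negative gradient: the descent step raises the preferred score
            grads[w] += g  # positive gradient: the descent step lowers the other score
        for i in ranking:
            scores[i] -= lr * grads[i]
    return scores


ranking = ["img_a", "img_b", "img_c", "img_d"]  # hypothetical ids, best first
learned = fit_scores(ranking)
print({k: round(v, 3) for k, v in learned.items()})
print(pairwise_ranking_loss(learned, ranking))
```

Because only score differences enter this loss, adding the same constant to every score leaves it unchanged. Under this assumption, the learned floats encode relative preference, not a calibrated copy of the star values.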