ImageReward
About the reward score
In your paper, the annotations appear to be star ratings. How are these ratings converted into specific floating-point scores when training the reward model (RM)?