ImageReward
The formulation of the reward loss function
https://github.com/THUDM/ImageReward/blob/849f068bea38105775a72d39809de7f382340167/ImageReward/ReFL.py#L755 Referring to line 755 in ReFL.py: why is the reward loss formulated as relu(2 - r)?
This maps the reward to a loss: relu(2 - r) decreases as the reward r increases, and a value of 2 acts as an upper bound. The ReLU clamps the loss to zero once the reward reaches 2, so ReFL applies no further gradient updates to images with an ImageReward score of 2 or more.
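For intuition, here is a minimal sketch of that mapping in PyTorch; the `rewards` tensor is an illustrative stand-in for a batch of ImageReward scores, not the repo's actual variable:

```python
import torch
import torch.nn.functional as F

# Hypothetical batch of ImageReward scores for generated images
rewards = torch.tensor([-1.0, 0.5, 2.0, 3.0], requires_grad=True)

# loss = relu(2 - r): positive while r < 2, exactly zero once r >= 2,
# so images scoring >= 2 contribute no gradient
loss = F.relu(2.0 - rewards).mean()
loss.backward()

print(loss.item())   # mean of [3.0, 1.5, 0.0, 0.0] = 1.125
print(rewards.grad)  # [-0.25, -0.25, 0.0, 0.0] — zero gradient for r >= 2
```

The gradient is a constant -1 (scaled by the mean) wherever r < 2 and exactly 0 elsewhere, which is what cuts off further optimization for high-reward images.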
Why 2 specifically? Is it a heuristic? 👀