The formulation of the reward-loss function

Open he-nantian opened this issue 1 year ago • 1 comments

https://github.com/THUDM/ImageReward/blob/849f068bea38105775a72d39809de7f382340167/ImageReward/ReFL.py#L755 Referring to line 755 in ReFL.py: why is the reward-loss function formulated as relu(2-r)?

Jun 10 '24 20:06 he-nantian

This maps the reward to a loss. ReLU is a common activation function, and 2 serves as an upper bound on the reward: once an image's ImageReward score reaches 2 or more, relu(2-r) is zero, so ReFL applies no further gradient updates for that image.
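
To make the clipping behaviour concrete, here is a minimal sketch in plain Python. It is not the code from ReFL.py: the function names `reward_to_loss` and `loss_grad_wrt_reward` and the parameter `upper` are hypothetical, chosen only for illustration, and the gradient at exactly r = 2 is assumed to be 0.

```python
def reward_to_loss(r: float, upper: float = 2.0) -> float:
    """Hinge-style loss relu(upper - r): positive below the bound, zero at or above it."""
    return max(0.0, upper - r)


def loss_grad_wrt_reward(r: float, upper: float = 2.0) -> float:
    """Derivative of relu(upper - r) with respect to r.

    Below the bound the loss falls linearly as r rises (slope -1).
    At or above the bound the loss is flat, so no gradient flows.
    The value at r == upper is a convention (assumed 0 here).
    """
    return -1.0 if r < upper else 0.0


if __name__ == "__main__":
    for r in (-1.0, 0.5, 2.0, 3.0):
        print(f"r={r:+.1f}  loss={reward_to_loss(r):.1f}  dloss/dr={loss_grad_wrt_reward(r):+.1f}")
```

For rewards below 2 the loss is 2 - r and the gradient pushes the reward up; for rewards of 2 or more both the loss and the gradient are zero, so such samples no longer contribute to the update.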

Aug 01 '24 07:08 xujz18

Why 2? Is it a heuristic? 👀

Sep 02 '24 13:09 hctian713