
ReFL Training Performance

Open LemonTwoL opened this issue 1 year ago • 2 comments

The performance of my ReFL model after training on refl_data.json (https://github.com/THUDM/ImageReward/blob/main/data/refl_data.json) is significantly worse than the original SD1.4 model before ReFL fine-tuning. The results are far from satisfactory, and I'm not sure what might be causing this issue.

Training settings:

- GPUs: 2 × A100
- `--train_batch_size`: 8
- `--gradient_accumulation_steps`: 4
- `--num_train_epochs`: 100
- `--learning_rate`: 1e-5

Given seed 100 and the prompt "a coffee mug made of cardboard":

- Result of the original SD1.4 (before ReFL fine-tuning): [image]
- Result of the ReFL-trained model: [image]
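For reference, the SD1.4 baseline above can be generated with a snippet along these lines (the checkpoint ID and default sampler settings are assumptions, not taken from the ReFL training code):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the base SD1.4 checkpoint (assumed to be CompVis/stable-diffusion-v1-4).
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Fix the seed so the comparison with the ReFL-trained model is reproducible.
generator = torch.Generator(device="cuda").manual_seed(100)
image = pipe("a coffee mug made of cardboard", generator=generator).images[0]
image.save("mug_sd14.png")
```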

Could you please explain this phenomenon?

— LemonTwoL, Sep 01 '24


In practice, to avoid rapid overfitting and stabilize the fine-tuning, we re-weight ReFL loss and regularize with pre-training loss.

Our released code only demonstrates the ReFL loss; you will need to add the pre-training loss according to your own settings.
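For illustration only, combining a re-weighted ReFL term with a pre-training regularization term could look like the sketch below. The function name, the weights, and the reward-to-loss mapping are assumptions rather than the repo's exact implementation:

```python
import torch.nn.functional as F

def combined_loss(noise_pred, noise_target, rewards,
                  refl_weight=1e-3, pretrain_weight=1.0, reward_offset=2.0):
    # Pre-training (regularization) term: the standard diffusion
    # noise-prediction MSE computed on pre-training data.
    pretrain_loss = F.mse_loss(noise_pred.float(), noise_target.float())

    # Re-weighted ReFL term: penalize low ImageReward scores of images
    # decoded from the fine-tuning branch. The ReLU/offset shaping is one
    # plausible choice; the paper's exact form may differ.
    refl_loss = F.relu(reward_offset - rewards).mean()

    return pretrain_weight * pretrain_loss + refl_weight * refl_loss
```

The relative weights control how strongly the reward signal can pull the model away from the pre-trained distribution, which is what prevents the rapid overfitting described above.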

— xujz18, Sep 01 '24

> In practice, to avoid rapid overfitting and stabilize the fine-tuning, we re-weight ReFL loss and regularize with pre-training loss.

> Our released code only demonstrates the ReFL loss; you will need to add the pre-training loss according to your own settings.

Thanks for your reply.

In this answer (https://github.com/THUDM/ImageReward/issues/24#issuecomment-1597110335), you mentioned that 'it is simpler to use ReFL alone directly and to achieve decent results.' According to your statement, using only the ReFL loss should yield reasonably good results, but I am unable to achieve that. It seems that the loss has already converged.

[image: training loss curve]

Additionally, the paper mentions: 'the pre-training dataset is from a 625k subset of LAION-5B [50] selected by aesthetic score.' I wonder if you plan to release this part of the dataset.

Thanks again.

— LemonTwoL, Sep 01 '24