
ReFL Training Performance

Open LemonTwoL opened this issue 1 year ago • 2 comments

The performance of my ReFL model after training on refl_data.json (https://github.com/THUDM/ImageReward/blob/main/data/refl_data.json) is significantly worse than the original SD1.4 model before ReFL fine-tuning. The results are far from satisfactory, and I'm not sure what might be causing this issue.

Training settings:

- GPUs: 2 × A100
- `--train_batch_size`: 8
- `--gradient_accumulation_steps`: 4
- `--num_train_epochs`: 100
- `--learning_rate`: 1e-5

Given seed 100 and the prompt "a coffee mug made of cardboard":

- Result of the original SD1.4 (before ReFL fine-tuning): [image]
- Result of the ReFL-trained model: [image]
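For reference, the SD1.4 baseline above can be generated with a snippet along these lines (the checkpoint ID and default sampler settings are assumptions, not taken from the ReFL training code):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the base SD1.4 checkpoint (assumed to be CompVis/stable-diffusion-v1-4).
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Fix the seed so the comparison with the ReFL-trained model is reproducible.
generator = torch.Generator(device="cuda").manual_seed(100)
image = pipe("a coffee mug made of cardboard", generator=generator).images[0]
image.save("mug_sd14.png")
```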

Could you please explain this phenomenon?

— LemonTwoL, Sep 01 '24


In practice, to avoid rapid overfitting and stabilize the fine-tuning, we re-weight ReFL loss and regularize with pre-training loss.

Our released code only demonstrates the ReFL loss; you will need to add the pre-training loss according to your own settings.
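For illustration only, combining a re-weighted ReFL term with a pre-training regularization term could look like the sketch below. The function name, the weights, and the reward-to-loss mapping are assumptions rather than the repo's exact implementation:

```python
import torch.nn.functional as F

def combined_loss(noise_pred, noise_target, rewards,
                  refl_weight=1e-3, pretrain_weight=1.0, reward_offset=2.0):
    # Pre-training (regularization) term: the standard diffusion
    # noise-prediction MSE computed on pre-training data.
    pretrain_loss = F.mse_loss(noise_pred.float(), noise_target.float())

    # Re-weighted ReFL term: penalize low ImageReward scores of images
    # decoded from the fine-tuning branch. The ReLU/offset shaping is one
    # plausible choice; the paper's exact form may differ.
    refl_loss = F.relu(reward_offset - rewards).mean()

    return pretrain_weight * pretrain_loss + refl_weight * refl_loss
```

The relative weights control how strongly the reward signal can pull the model away from the pre-trained distribution, which is what prevents the rapid overfitting described above.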

— xujz18, Sep 01 '24

> In practice, to avoid rapid overfitting and stabilize the fine-tuning, we re-weight ReFL loss and regularize with pre-training loss.

> Our released code only demonstrates the ReFL loss; you will need to add the pre-training loss according to your own settings.

Thanks for your reply.

In this answer (https://github.com/THUDM/ImageReward/issues/24#issuecomment-1597110335), you mentioned that 'it is simpler to use ReFL alone directly and to achieve decent results.' According to your statement, using only the ReFL loss should yield reasonably good results, but I am unable to achieve that. It seems that the loss has already converged.

[image: training loss curve]

Additionally, the paper mentions: 'the pre-training dataset is from a 625k subset of LAION-5B [50] selected by aesthetic score.' I wonder if you plan to release this part of the dataset.

Thanks again.

— LemonTwoL, Sep 01 '24