
Implement BoN for training and eval

Open Dahoas opened this issue 2 years ago • 5 comments

Dahoas avatar Jul 18 '23 12:07 Dahoas

Looks slick!

(One thing I've noticed is that while the score increases much faster on the training set as num_return_sequences increases, this doesn't necessarily yield a better score on the test set. Do you perhaps have an example or parameter setting where it does?)

[Two screenshots, dated 2023-08-11]
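For context, the best-of-N (BoN) idea under discussion can be sketched as follows. This is a minimal illustration, not the trlx implementation: `best_of_n`, `sample_fn`, and `reward_fn` are hypothetical names, and `n` plays the role that num_return_sequences plays in the comment above. Because the function keeps the maximum over `n` candidates, the score of the kept sample on the training prompts can only go up as `n` grows, which by itself says nothing about held-out performance.

```python
from typing import Callable, List, Tuple


def best_of_n(
    prompt: str,
    sample_fn: Callable[[str, int], List[str]],  # hypothetical: returns n completions
    reward_fn: Callable[[str, str], float],      # hypothetical: scores one completion
    n: int,
) -> Tuple[str, float]:
    """Draw n completions for a prompt and keep the highest-scoring one."""
    candidates = sample_fn(prompt, n)
    scores = [reward_fn(prompt, c) for c in candidates]
    # Index of the best candidate; ties resolve to the first maximum.
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


if __name__ == "__main__":
    # Toy stand-ins for a sampler and a reward model (assumptions, not real APIs).
    def toy_sample(prompt: str, n: int) -> List[str]:
        return [prompt + "!" * i for i in range(n)]

    def toy_reward(prompt: str, completion: str) -> float:
        return float(len(completion))

    print(best_of_n("hi", toy_sample, toy_reward, n=4))
```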

Good point. The benefit of BoN training seems to be problem-dependent. I've seen the most benefit during training on problems where the model has a low pass@1 score.
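One way to see why a low pass@1 leaves more room for BoN is the following back-of-the-envelope sketch. It assumes a binary pass/fail reward and independent samples, neither of which is stated in this thread; under those assumptions, the chance that at least one of N samples passes is 1 - (1 - p)^N, where p is pass@1. The helper names are hypothetical.

```python
def pass_at_n(p: float, n: int) -> float:
    """Probability that at least one of n independent samples passes,
    given a per-sample pass probability p (i.e. pass@1)."""
    return 1.0 - (1.0 - p) ** n


def bon_gain(p: float, n: int) -> float:
    """Absolute improvement of best-of-n over a single sample."""
    return pass_at_n(p, n) - p


if __name__ == "__main__":
    for p in (0.05, 0.5, 0.9):
        print(f"pass@1={p:.2f}  pass@8={pass_at_n(p, 8):.3f}  gain={bon_gain(p, 8):.3f}")
```

With a pass@1 already near 1, a single sample usually succeeds, so the extra candidates add little; with a low pass@1, several draws are often needed before any candidate passes at all.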

Dahoas avatar Aug 21 '23 11:08 Dahoas

@maxreciprocate If you're happy with this do you want to merge today?

Dahoas avatar Aug 28 '23 10:08 Dahoas

@Dahoas There are some run differences when using the default config without BoN sampling, most notably for the randomwalks case: https://wandb.ai/sorry/trlx-references/reports/BoN-v-main--Vmlldzo1MjkwMzA5 Probably some minor implementation detail, have to recheck

maxreciprocate avatar Sep 01 '23 15:09 maxreciprocate

> @Dahoas There are some run differences when using the default config without BoN sampling, most notably for the randomwalks case: https://wandb.ai/sorry/trlx-references/reports/BoN-v-main--Vmlldzo1MjkwMzA5 Probably some minor implementation detail, have to recheck

Let me look into why.

Dahoas avatar Sep 04 '23 16:09 Dahoas

@Dahoas I'm not sure that's the issue, though. See: https://wandb.ai/sorry/trlx/reports/Difference-due-to-the-change-in-base_trainer-decode--Vmlldzo1MzE2OTg4 (plus some non-determinism)

maxreciprocate avatar Sep 05 '23 15:09 maxreciprocate