Implement BoN for training and eval
Looks slick!
(One thing I've noticed is that while the score increases much faster on the training set as `num_return_sequences` increases, this doesn't necessarily yield a better score on the test set. Do you perhaps have an example or parameter setting where it does?)
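
(For anyone skimming the thread, here is a minimal sketch of what BoN sampling with `num_return_sequences` amounts to. It assumes an HF-style `model`/`tokenizer` and a `reward_fn` returning one scalar per string; the names are illustrative, not this PR's actual API:)

```python
import torch

def best_of_n(model, tokenizer, prompt, reward_fn, num_return_sequences=4):
    # Sample N candidate continuations for the same prompt.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            do_sample=True,
            num_return_sequences=num_return_sequences,
            max_new_tokens=64,
        )
    candidates = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # Score each candidate and keep the highest-reward one.
    rewards = reward_fn(candidates)
    best = max(range(len(candidates)), key=lambda i: rewards[i])
    return candidates[best], rewards[best]
```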
Good point, the benefit of BoN training seems to be problem-dependent. I've seen the most benefit when training on problems where the model has a low pass@1 score.
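
(For reference, pass@k here is the standard unbiased estimator from Chen et al., 2021; a small self-contained sketch, with illustrative numbers:)

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimator of the probability that at least one of k
    # samples passes, given c correct out of n total samples.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. with 16 samples and 2 correct:
# pass_at_k(16, 2, 1) == 0.125, pass_at_k(16, 2, 4) ≈ 0.45
```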
@maxreciprocate If you're happy with this, do you want to merge today?
@Dahoas There are some run differences when using the default config without BoN sampling, most notably for the randomwalks case: https://wandb.ai/sorry/trlx-references/reports/BoN-v-main--Vmlldzo1MjkwMzA5 Probably some minor implementation detail, have to recheck
Let me look into why.
@Dahoas Not sure that's the issue, however; see: https://wandb.ai/sorry/trlx/reports/Difference-due-to-the-change-in-base_trainer-decode--Vmlldzo1MzE2OTg4 (+ some non-determinism)
