Implement BoN for training and eval
Looks slick!
(One thing I've noticed is that while the score increases much faster on the training set as `num_return_sequences` increases, this doesn't necessarily yield a better score on the test set. Do you perhaps have an example or parameter setting where it does?)
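
(For anyone skimming the thread, here is a minimal sketch of what BoN sampling with `num_return_sequences` amounts to. It assumes an HF-style `model`/`tokenizer` and a `reward_fn` returning one scalar per string; the names are illustrative, not this PR's actual API:)

```python
import torch

def best_of_n(model, tokenizer, prompt, reward_fn, num_return_sequences=4):
    # Sample N candidate continuations for the same prompt.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            do_sample=True,
            num_return_sequences=num_return_sequences,
            max_new_tokens=64,
        )
    candidates = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # Score each candidate and keep the highest-reward one.
    rewards = reward_fn(candidates)
    best = max(range(len(candidates)), key=lambda i: rewards[i])
    return candidates[best], rewards[best]
```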
Good point, the benefit of BoN training seems to be problem-dependent. I've seen the most benefit when training on problems where the model has a low pass@1 score.
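
(For reference, pass@k here is the standard unbiased estimator from Chen et al., 2021; a small self-contained sketch, with illustrative numbers:)

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimator of the probability that at least one of k
    # samples passes, given c correct out of n total samples.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. with 16 samples and 2 correct:
# pass_at_k(16, 2, 1) == 0.125, pass_at_k(16, 2, 4) ≈ 0.45
```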
@maxreciprocate If you're happy with this, do you want to merge today?
@Dahoas There are some run differences when using the default config without BoN sampling, most notably for the randomwalks case: https://wandb.ai/sorry/trlx-references/reports/BoN-v-main--Vmlldzo1MjkwMzA5 Probably some minor implementation detail, have to recheck
Let me look into why.
@Dahoas Not sure that's the issue, however; see: https://wandb.ai/sorry/trlx/reports/Difference-due-to-the-change-in-base_trainer-decode--Vmlldzo1MzE2OTg4 (+ some non-determinism)
