chrisliu298
When I use `dpo` as the forget loss, I encounter the following error:

```python
Traceback (most recent call last):
  File "/root/tofu/forget.py", line 187, in main
    trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1780,...
```
I noticed that, for default (sequence classification) models with a chat template defined in the tokenizer, `scripts/run_rm.py` formats each conversation with `tokenizer.apply_chat_template` (via the function [`prepare_dialogue_from_tokenizer`](https://github.com/allenai/reward-bench/blob/bc72fb2a573fc31c614eef3405d354b398977b02/rewardbench/utils.py#L515)) and then uses the text...
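For context, here is a minimal sketch of that flow as I understand it (this is not the actual `run_rm.py` code; the model name and conversation below are placeholders):

```python
# Sketch: render a conversation with the tokenizer's chat template, then
# score the resulting text with a sequence-classification reward model.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "some/reward-model"  # placeholder, not a real checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]

# Render the conversation as plain text via the tokenizer's chat template,
# roughly what prepare_dialogue_from_tokenizer does for default models.
text = tokenizer.apply_chat_template(messages, tokenize=False)

# Tokenize the rendered text and compute the scalar reward.
inputs = tokenizer(text, return_tensors="pt")
reward = model(**inputs).logits[0].item()
print(reward)
```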
I just noticed that the v2 evaluation code was merged a few months ago, and I'm simply curious if RewardBench v2 will be released soon. :)