RLHF-Reward-Modeling

Recipes for training reward models for RLHF.

26 issues, sorted by recently updated

Using len(names) instead of the hardcoded 13 allows running part of the evaluation benchmark at a time; for machines that don't have that much GPU memory, this could be helpful.
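A minimal sketch of the idea (the `names` values and `evaluate` callback here are hypothetical, not the repo's actual identifiers): iterating over `len(names)` lets you trim the subset list to whatever fits in memory, whereas a hardcoded 13 forces the full benchmark.

```python
# Hypothetical subset names; in practice this list could be trimmed
# to only the benchmark subsets that fit on the available GPU.
names = ["alpacaeval-easy", "mt-bench-hard"]

def run_benchmark(names, evaluate):
    """Evaluate each named subset; len(names) adapts to a trimmed list."""
    results = {}
    for i in range(len(names)):  # was: range(13)
        results[names[i]] = evaluate(names[i])
    return results
```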

Some environments don't support bfloat16 that well, so this adds a new argument that works similarly to the bf16 parameter in gemma_rm.py.
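A sketch of what such an opt-in flag might look like (this is an assumption about the shape of the change, not the actual patch): a boolean CLI argument that selects the load dtype, falling back to float32 where bfloat16 support is poor.

```python
import argparse

# Hypothetical flag analogous to the bf16 parameter in gemma_rm.py.
parser = argparse.ArgumentParser()
parser.add_argument("--bf16", action="store_true",
                    help="Load the model in bfloat16 (requires hardware support).")
args = parser.parse_args(["--bf16"])

dtype = "bfloat16" if args.bf16 else "float32"
# model = AutoModel.from_pretrained(name, torch_dtype=dtype)  # illustrative
```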

Hi, this is great work! I'd like to know if there is a plan to release the training code to reproduce the model.

I'm currently working on reproducing the training of NVIDIA's multi-objective reward model architecture. I have some questions about the training details of ARMO-RM. I'm using Mean Squared Error (MSE) as...

I tried to reproduce your gemma2B reward model training and found that the reward model architecture fine-tuned from internlm2 has an output head of dimension 1. I downloaded your...

Hi there, I got a "Token pattern not found in the list" error when I tried out the model under torch.no_grad(). Would you take a look at this, please?...

How can I fine-tune the ARMO model with a custom dataset that contains only paired preference data, without multi-objective reward scores? :)
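With only paired preferences and no per-objective scores, one common option (a general technique, not necessarily the repo's answer) is to train a single scalar head with the Bradley-Terry objective, minimizing the negative log-sigmoid of the reward margin:

```python
import math

def bt_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    The loss shrinks as the chosen response's reward exceeds the
    rejected one's, so only a preference ordering is needed.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At margin 0 the loss is log 2, and it decreases monotonically as the model scores the chosen response higher.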

Hi, I have replicated the training and evaluation for the pair_rm model, but I haven't achieved the results reported in Table 2 of the paper. The best results I obtained...

Hello, and thanks for your work! When running bradley-terry-rm/llama3_rm.py, the final saved model does not have an lm head, since the script uses an AutoModelForSequenceClassification model rather than a CausalLM....
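A toy illustration of the structural point (the classes here are stand-ins, not transformers code): a sequence-classification reward model attaches a scalar score head to the backbone instead of an lm head, so the saved checkpoint scores text but cannot generate it.

```python
class Backbone:
    """Stand-in for a transformer backbone's last hidden state."""
    hidden_size = 8

    def forward(self, tokens):
        return [0.5] * self.hidden_size  # dummy hidden state

class RewardModel:
    """AutoModelForSequenceClassification-style: backbone + scalar head.

    There is no lm_head mapping hidden states to vocabulary logits,
    only a num_labels=1 projection to a single reward value.
    """
    def __init__(self):
        self.backbone = Backbone()
        self.score = [0.1] * Backbone.hidden_size  # scalar head weights

    def forward(self, tokens):
        h = self.backbone.forward(tokens)
        return sum(w * x for w, x in zip(self.score, h))  # one reward scalar
```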

Nice work! Starred already. Sorry for asking: why replace the bos_token with an empty string?

```python
sample['positive'] = tokenizer.apply_chat_template(
    sample['chosen'], tokenize=False, add_generation_prompt=False).replace(tokenizer.bos_token, "")
sample['negative'] = tokenizer.apply_chat_template(
    sample['rejected'], tokenize=False, add_generation_prompt=False).replace(tokenizer.bos_token, "")
```
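A plain-string illustration of the likely reason (an assumption, not a confirmed answer from the maintainers): chat templates for many models already prepend the BOS token, and tokenizing that string again with special tokens enabled would prepend a second one, so stripping it keeps exactly one BOS.

```python
BOS = "<bos>"  # placeholder token string; the actual token varies by model

# apply_chat_template(..., tokenize=False) often emits BOS itself:
templated = BOS + "<start_of_turn>user\nhi<end_of_turn>"
cleaned = templated.replace(BOS, "")

def tokenize(text):
    """Stand-in for tokenizer(text) with add_special_tokens=True."""
    return BOS + text

assert tokenize(cleaned).count(BOS) == 1    # stripped first: single BOS
assert tokenize(templated).count(BOS) == 2  # without the replace: duplicated
```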