RLHF-Reward-Modeling

Recipes for training reward models for RLHF.

26 issues, sorted by recently updated

Using len(names) instead of the hardcoded 13 allows running part of the evaluation benchmark at a time; for machines that don't have that much GPU memory, this could be helpful.
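A minimal sketch of the idea (the `names` values and `evaluate` callback here are hypothetical, not the repo's actual identifiers): iterating over `len(names)` lets you trim the subset list to whatever fits in memory, whereas a hardcoded 13 forces the full benchmark.

```python
# Hypothetical subset names; in practice this list could be trimmed
# to only the benchmark subsets that fit on the available GPU.
names = ["alpacaeval-easy", "mt-bench-hard"]

def run_benchmark(names, evaluate):
    """Evaluate each named subset; len(names) adapts to a trimmed list."""
    results = {}
    for i in range(len(names)):  # was: range(13)
        results[names[i]] = evaluate(names[i])
    return results
```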

Some environments don't support bfloat16 that well, so this adds a new argument that works similarly to the bf16 parameter in gemma_rm.py.
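A sketch of what such an opt-in flag might look like (this is an assumption about the shape of the change, not the actual patch): a boolean CLI argument that selects the load dtype, falling back to float32 where bfloat16 support is poor.

```python
import argparse

# Hypothetical flag analogous to the bf16 parameter in gemma_rm.py.
parser = argparse.ArgumentParser()
parser.add_argument("--bf16", action="store_true",
                    help="Load the model in bfloat16 (requires hardware support).")
args = parser.parse_args(["--bf16"])

dtype = "bfloat16" if args.bf16 else "float32"
# model = AutoModel.from_pretrained(name, torch_dtype=dtype)  # illustrative
```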

Hi, this is great work! I'd like to know if there is a plan to release the training code to reproduce the model.

I'm currently working on reproducing the training of NVIDIA's multi-objective reward model architecture. I have some questions about the training details of ARMO-RM. I'm using Mean Squared Error (MSE) as...

I tried to reproduce your gemma2B reward model training and found that the reward model architecture fine-tuned from internlm2 has an output head of dimension 1. I downloaded your...

Hi there, I got a "Token pattern not found in the list" error when I tried out the model under torch.no_grad(). Would you take a look at this, please?...

How can I fine-tune the ARMO model with a custom dataset that contains only paired preference data, without multi-objective reward scores? :)
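With only paired preferences and no per-objective scores, one common option (a general technique, not necessarily the repo's answer) is to train a single scalar head with the Bradley-Terry objective, minimizing the negative log-sigmoid of the reward margin:

```python
import math

def bt_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    The loss shrinks as the chosen response's reward exceeds the
    rejected one's, so only a preference ordering is needed.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At margin 0 the loss is log 2, and it decreases monotonically as the model scores the chosen response higher.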

Hi, I have replicated the training and evaluation for the pair_rm model, but I haven't achieved the results reported in Table 2 of the paper. The best results I obtained...

Hello, and thanks for your work! When running bradley-terry-rm/llama3_rm.py, the final saved model does not have an lm head, since the script uses an AutoModelForSequenceClassification model rather than a CausalLM....
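A toy illustration of the structural point (the classes here are stand-ins, not transformers code): a sequence-classification reward model attaches a scalar score head to the backbone instead of an lm head, so the saved checkpoint scores text but cannot generate it.

```python
class Backbone:
    """Stand-in for a transformer backbone's last hidden state."""
    hidden_size = 8

    def forward(self, tokens):
        return [0.5] * self.hidden_size  # dummy hidden state

class RewardModel:
    """AutoModelForSequenceClassification-style: backbone + scalar head.

    There is no lm_head mapping hidden states to vocabulary logits,
    only a num_labels=1 projection to a single reward value.
    """
    def __init__(self):
        self.backbone = Backbone()
        self.score = [0.1] * Backbone.hidden_size  # scalar head weights

    def forward(self, tokens):
        h = self.backbone.forward(tokens)
        return sum(w * x for w, x in zip(self.score, h))  # one reward scalar
```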

Nice work! Starred already. Sorry for asking: why replace the bos_token with an empty string?

```python
sample['positive'] = tokenizer.apply_chat_template(
    sample['chosen'], tokenize=False, add_generation_prompt=False).replace(tokenizer.bos_token, "")
sample['negative'] = tokenizer.apply_chat_template(
    sample['rejected'], tokenize=False, add_generation_prompt=False).replace(tokenizer.bos_token, "")
```
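A plain-string illustration of the likely reason (an assumption, not a confirmed answer from the maintainers): chat templates for many models already prepend the BOS token, and tokenizing that string again with special tokens enabled would prepend a second one, so stripping it keeps exactly one BOS.

```python
BOS = "<bos>"  # placeholder token string; the actual token varies by model

# apply_chat_template(..., tokenize=False) often emits BOS itself:
templated = BOS + "<start_of_turn>user\nhi<end_of_turn>"
cleaned = templated.replace(BOS, "")

def tokenize(text):
    """Stand-in for tokenizer(text) with add_special_tokens=True."""
    return BOS + text

assert tokenize(cleaned).count(BOS) == 1    # stripped first: single BOS
assert tokenize(templated).count(BOS) == 2  # without the replace: duplicated
```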