RLHF-Reward-Modeling

Question about chat templates

Open trueRosun opened this issue 1 year ago • 6 comments

Nice work! Starred already. Sorry for asking, but why replace the bos_token with an empty string?

sample['positive'] = tokenizer.apply_chat_template(
    sample['chosen'], tokenize=False, add_generation_prompt=False
).replace(tokenizer.bos_token, "")
sample['negative'] = tokenizer.apply_chat_template(
    sample['rejected'], tokenize=False, add_generation_prompt=False
).replace(tokenizer.bos_token, "")

trueRosun avatar Jun 13 '24 13:06 trueRosun

Because when we serve the Bradley-Terry RM with a pipeline, the pipeline automatically adds a bos_token when tokenizing.

For the pairwise preference model, it is because we trained the model without a bos_token (this was indeed an issue with llama3 at that time). But the influence of the bos token is generally mild.
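To make the double-bos problem concrete, here is a toy simulation (not the real transformers API; `fake_apply_chat_template` and `fake_pipeline_preprocess` are hypothetical stand-ins) of what happens when the template output is fed to a pipeline that adds its own bos token:

```python
BOS = "<s>"  # stand-in for the model's bos token string

def fake_apply_chat_template(messages):
    # Llama-style chat templates typically prepend the bos token
    # to the rendered text.
    return BOS + "".join(f"[{m['role']}] {m['content']} " for m in messages)

def fake_pipeline_preprocess(text):
    # Serving pipelines tokenize with special tokens enabled,
    # which prepends bos again.
    return BOS + text

msgs = [{"role": "user", "content": "hello"}]
templated = fake_apply_chat_template(msgs)

kept = fake_pipeline_preprocess(templated)                     # bos not stripped
stripped = fake_pipeline_preprocess(templated.replace(BOS, ""))  # bos stripped

print(kept.count(BOS), stripped.count(BOS))  # 2 1
```

Stripping the bos token after `apply_chat_template` therefore leaves exactly one bos at serving time instead of two.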

WeiXiongUST avatar Jun 13 '24 13:06 WeiXiongUST

thank you for answering!

I will further check the outputs after tokenization.

trueRosun avatar Jun 13 '24 13:06 trueRosun

> Because when we serve the Bradley-Terry RM with a pipeline, the pipeline automatically adds a bos_token when tokenizing.

I don't fully understand... if the inference-time pipeline adds the bos_token automatically, doesn't that mean we should train with the bos token?

hunterlang avatar Jun 28 '24 02:06 hunterlang

> Because when we serve the Bradley-Terry RM with a pipeline, the pipeline automatically adds a bos_token when tokenizing.
>
> I don't fully understand... if the inference-time pipeline adds the bos_token automatically, doesn't that mean we should train with the bos token?

Yes, you are correct. Unfortunately, when we trained the model there was a bug in the llama3 tokenizer, so the model was trained WITHOUT a bos token.

We have tested with and without the bos token; it can lead to a ~1% difference in RewardBench accuracy. You may modify the tokenizer so it does not add a bos token automatically to fix the issue, I guess...
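For what it's worth, transformers' llama tokenizers expose an `add_bos_token` flag for this purpose. The toy class below is only a sketch that mimics that flag's effect (`ToyTokenizer` is hypothetical, not the real tokenizer class):

```python
class ToyTokenizer:
    """Toy stand-in mimicking the add_bos_token flag on llama tokenizers."""

    bos_token = "<s>"

    def __init__(self, add_bos_token=True):
        self.add_bos_token = add_bos_token

    def __call__(self, text):
        # Prepend bos only when the flag is set, as the real tokenizer does.
        prefix = self.bos_token if self.add_bos_token else ""
        return prefix + text

default_tok = ToyTokenizer()                      # pipeline default: bos added
patched_tok = ToyTokenizer(add_bos_token=False)   # matches bos-free training

print(default_tok("hello"))   # <s>hello
print(patched_tok("hello"))   # hello
```

With the real library, the analogous change would be setting `add_bos_token = False` on the tokenizer (or tokenizing with `add_special_tokens=False`) so serving matches the bos-free training.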

WeiXiongUST avatar Jun 28 '24 02:06 WeiXiongUST

Thanks for the reply! Just to clarify:

If I remove those .replace(tokenizer.bos_token, "") calls, then training should match inference, because the inference pipeline adds BOS automatically?

If I modify the tokenizer, then the inference pipeline will match the off-the-shelf models you already released, which were trained without BOS?

hunterlang avatar Jun 28 '24 05:06 hunterlang

We get a bos token when we tokenize with apply_chat_template. Then, inside the pipeline, we get another bos token.

If you strip the bos token with .replace(tokenizer.bos_token, ""), you still get the one bos token added inside the pipeline. If you do not strip it, you will get two bos tokens.

If we modify the tokenizer so it does not add a bos token, then we never get a bos token, which matches the training (no bos token).
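The three configurations above can be tallied in a short sketch (a toy model of the behavior described in this thread, not the real pipeline code):

```python
BOS = "<s>"  # stand-in bos token string

def bos_count_at_serving(text, pipeline_adds_bos=True, strip_template_bos=True):
    """Count bos tokens the model sees under each serving configuration."""
    templated = BOS + text                 # apply_chat_template prepends bos
    if strip_template_bos:
        templated = templated.replace(BOS, "")   # the .replace(...) call
    if pipeline_adds_bos:
        templated = BOS + templated        # pipeline tokenization prepends bos
    return templated.count(BOS)

print(bos_count_at_serving("x"))                                        # 1: strip + pipeline bos
print(bos_count_at_serving("x", strip_template_bos=False))              # 2: no strip + pipeline bos
print(bos_count_at_serving("x", pipeline_adds_bos=False))               # 0: matches bos-free training
```

Only the last configuration (tokenizer modified so the pipeline adds nothing) reproduces the zero-bos setup the released models were trained with.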

WeiXiongUST avatar Jun 28 '24 06:06 WeiXiongUST