
How do you implement SLiC on the pair_pm model?

Open t-sifanwu opened this issue 1 year ago • 1 comments

Hi, thanks for uploading the code for pair_pm! The blog suggests that you use SLiC for the pair_pm models, but I can't find any code for the SLiC method in the pair_pm directory.

t-sifanwu, Jun 27 '24 23:06

Hi, thanks for your interest in our project!

We mention the SLiC paper because pair-wise preference model training was first proposed in that paper. We do not do RLHF in this project. If you are interested in the subsequent RLHF stage, you may check this project: https://github.com/RLHFlow/Online-RLHF

WeiXiongUST, Jun 28 '24 01:06
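
For readers unfamiliar with the term, the pair-wise training objective mentioned in the reply can be sketched roughly as below. This is a generic illustration, not code from this repository: it assumes the model yields one scalar score per candidate response (for example, the logit of an "A" or "B" token), and the names `preference_prob` and `pairwise_nll` are hypothetical.

```python
import math


def preference_prob(score_a: float, score_b: float) -> float:
    """Probability that response A is preferred over response B.

    Assumption: this equals a two-way softmax over the two scores,
    i.e. sigmoid(score_a - score_b).
    """
    diff = score_a - score_b
    # Numerically stable sigmoid: avoid exp() of a large positive number.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def pairwise_nll(score_a: float, score_b: float, a_is_preferred: bool) -> float:
    """Negative log-likelihood of the human preference label for one pair."""
    p_a = preference_prob(score_a, score_b)
    p_label = p_a if a_is_preferred else 1.0 - p_a
    # Clamp to avoid log(0) for extreme score gaps.
    return -math.log(max(p_label, 1e-12))


if __name__ == "__main__":
    # Hypothetical scores for two responses to the same prompt.
    print(preference_prob(1.5, 0.2))
    print(pairwise_nll(1.5, 0.2, a_is_preferred=True))
```

Training a preference model in this style only requires labeled comparison pairs; it does not involve any policy optimization, which is consistent with the reply that the RLHF stage lives in a separate project.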