RLHF training code

Open jaideep11061982 opened this issue 2 years ago • 3 comments

@mcmonkey4eva @twmmason @masaishi @lxe Do you plan to provide some assistance on how to use RLHF to fine tune Vicuna models. Its fairly a new topic introduced in pubic domain , would be great if you can give some code around it to fit it into regular LLM models

Apr 29 '23 16:04 jaideep11061982

I can't speak for what Stability AI did, but Huggingface has a blog / tutorial on how to do RLHF using their trl library. Tutorial Link

Apr 29 '23 16:04 M-Chimiste

@M-Chimiste THank you. Atleast could you point me to the rlhf datasets which may be publically available or some sample of it.

Apr 30 '23 04:04 jaideep11061982

Per their model card you can see the different datasets: model card here.

You should be able to look for each of these on the huggingface datasets repositories. Here is an example for the open assistant dataset. You'll just need to search around a bit but they should be all there.

Apr 30 '23 11:04 M-Chimiste