StableLM icon indicating copy to clipboard operation
StableLM copied to clipboard

RLHF training code

Open jaideep11061982 opened this issue 2 years ago • 3 comments

@mcmonkey4eva @twmmason @masaishi @lxe Do you plan to provide some assistance on how to use RLHF to fine tune Vicuna models. Its fairly a new topic introduced in pubic domain , would be great if you can give some code around it to fit it into regular LLM models

jaideep11061982 avatar Apr 29 '23 16:04 jaideep11061982

I can't speak for what Stability AI did, but Huggingface has a blog / tutorial on how to do RLHF using their trl library. Tutorial Link

M-Chimiste avatar Apr 29 '23 16:04 M-Chimiste

@M-Chimiste THank you. Atleast could you point me to the rlhf datasets which may be publically available or some sample of it.

jaideep11061982 avatar Apr 30 '23 04:04 jaideep11061982

Per their model card you can see the different datasets: model card here.

You should be able to look for each of these on the huggingface datasets repositories. Here is an example for the open assistant dataset. You'll just need to search around a bit but they should be all there.

M-Chimiste avatar Apr 30 '23 11:04 M-Chimiste