RLHF training code
@mcmonkey4eva @twmmason @masaishi @lxe Do you plan to provide some guidance on how to use RLHF to fine-tune Vicuna models? It's a fairly new topic in the public domain, so it would be great if you could share some code showing how to apply it to regular LLMs.
I can't speak for what Stability AI did, but Hugging Face has a blog post / tutorial on how to do RLHF with their trl library. Tutorial Link
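For anyone who wants to see the core math before working through the tutorial, here is a minimal, library-free sketch of two pieces that RLHF pipelines commonly use. This is not the trl API or the tutorial's code. The first function is a pairwise (Bradley-Terry style) loss for training a reward model on "chosen vs. rejected" responses. The second is a reward shaped with a KL-style penalty that keeps the fine-tuned policy close to a reference model during the RL (e.g. PPO) step. The function names and the `beta=0.1` default are assumptions for illustration only.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)): small when the reward model
    scores the human-preferred response higher than the rejected one."""
    margin = r_chosen - r_rejected
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def kl_shaped_reward(rm_score: float,
                     logprobs_policy: list[float],
                     logprobs_ref: list[float],
                     beta: float = 0.1) -> float:
    """Reward-model score minus beta times a per-sequence KL estimate.

    logprobs_policy / logprobs_ref are per-token log-probabilities of the
    same generated response under the policy and the frozen reference model.
    beta=0.1 is a hypothetical default, not a value from the tutorial.
    """
    kl_estimate = sum(lp - lr for lp, lr in zip(logprobs_policy, logprobs_ref))
    return rm_score - beta * kl_estimate
```

The penalty term is what stops the policy from drifting into text that merely games the reward model; in a real setup the scores and log-probabilities would come from the models rather than plain floats.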
@M-Chimiste Thank you. Could you at least point me to any publicly available RLHF datasets, or a sample of one?
Their model card lists the different datasets used: model card here.
You should be able to find each of these in the Hugging Face datasets repositories. Here is an example for the OpenAssistant dataset. You'll just need to search around a bit, but they should all be there.
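Preference data is often shipped as several replies to a prompt ranked by human labelers, while reward-model training usually wants "chosen vs. rejected" pairs. The sketch below shows one way to convert between the two. It assumes a hypothetical record layout (each reply is a dict with `text` and `rank`, lower rank meaning better). Check the actual field names of whichever dataset you download, since they will likely differ.

```python
from itertools import combinations


def ranked_to_pairs(prompt: str, replies: list[dict]) -> list[dict]:
    """Turn ranked replies into (chosen, rejected) preference pairs.

    Assumes each reply is {"text": str, "rank": int} with rank 0 = best;
    this layout is hypothetical, not any specific dataset's schema.
    """
    ordered = sorted(replies, key=lambda r: r["rank"])
    pairs = []
    # combinations() on the sorted list always yields (better-or-equal, worse).
    for better, worse in combinations(ordered, 2):
        if better["rank"] == worse["rank"]:
            continue  # ties carry no preference signal
        pairs.append({
            "prompt": prompt,
            "chosen": better["text"],
            "rejected": worse["text"],
        })
    return pairs
```

With n distinctly ranked replies this produces n*(n-1)/2 pairs, so one ranked prompt can supply several training comparisons.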