Louis Castricato
### 🚀 The feature, motivation, and pitch We need the ability to use massive reward models, as this will be necessary for our InstructGPT model. Currently the size of...
If the reward model cannot fit on a single GPU, which will be the case when we are training our InstructGPT model, then the current system fails, since you...
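A minimal sketch of one way around the single-GPU limit, assuming the reward model is an ordinary Hugging Face checkpoint: load it with Accelerate's `device_map="auto"` so its layers are sharded across all visible GPUs. The checkpoint name below is a placeholder, not an actual trlX reward model.

```python
# Minimal sketch: shard a large reward model across every visible GPU
# via Accelerate's automatic device map (requires `pip install accelerate`).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINT = "my-org/reward-model-20b"  # hypothetical checkpoint

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT,
    device_map="auto",          # split layers across available GPUs
    torch_dtype=torch.float16,  # halve memory per shard
)

@torch.no_grad()
def reward(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # inputs go to the first shard; Accelerate moves activations between GPUs
    batch = {k: v.to("cuda:0") for k, v in batch.items()}
    return model(**batch).logits.squeeze(-1)
```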
### 🚀 The feature, motivation, and pitch https://arxiv.org/abs/2210.11693 Amos reports better scaling (for multi-accelerator setups) and better performance than AdamW for autoregressive and masked language modeling. We should...
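The paper's reference implementation is in JAX, so a trlX integration would go through a PyTorch port; the `Amos` import below is a placeholder for whichever port gets adopted, and only the AdamW-to-Amos swap point is meant literally.

```python
# Sketch of where Amos would slot into a trainer's optimizer setup.
# `amos_port` is a hypothetical module standing in for a real PyTorch
# port of the (JAX) reference implementation.
from torch.optim import AdamW

try:
    from amos_port import Amos  # hypothetical import
except ImportError:
    Amos = None

def build_optimizer(model, name="adamw", lr=1e-4):
    params = [p for p in model.parameters() if p.requires_grad]
    if name == "adamw":
        return AdamW(params, lr=lr, weight_decay=0.01)
    if name == "amos":
        if Amos is None:
            raise RuntimeError("no Amos port installed")
        # Amos derives per-parameter scales from model shape, so its
        # constructor will not mirror AdamW's signature exactly.
        return Amos(params, lr=lr)
    raise ValueError(f"unknown optimizer: {name}")
```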
### 🐛 Describe the bug There is no way to use multiple GPUs if you're using Ray Tune; it appears we need to wrap ray.train.torch.TorchTrainer for it to work. It...
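A minimal sketch of the proposed wrapping, following the pattern in the Ray 2.x docs: hand the per-worker training function to `ray.train.torch.TorchTrainer`, then pass that trainer to `tune.Tuner`. The worker count and search space here are placeholders.

```python
# Sketch: wrap the training loop in TorchTrainer so each Tune trial gets
# multiple GPU workers. The training body and hyperparameters are stubs.
from ray import tune
from ray.air.config import ScalingConfig
from ray.train.torch import TorchTrainer

def train_func(config):
    # ordinary per-worker loop; Ray initializes the process group and
    # assigns a GPU to each of the workers declared below
    ...

trainer = TorchTrainer(
    train_loop_per_worker=train_func,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)

tuner = tune.Tuner(
    trainer,
    # Tune forwards this dict to train_func as `config`
    param_space={"train_loop_config": {"lr": tune.loguniform(1e-6, 1e-4)}},
)
results = tuner.fit()
```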
We should use the RL4LMs benchmark suite; I think it is a strong candidate for showing the strengths and weaknesses of trlX.
A text-to-image RLHF pipeline and orchestrator are needed.
Do not merge directly; I changed some of the scripts (mostly removed the toy-example loading components). Originally the code was not using JSON specifications; in particular, it was saving multiple...
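For comparison, a minimal sketch of the single-JSON-specification idea: one spec file per run instead of settings scattered across scripts. The field names are illustrative, not the PR's actual schema.

```python
# Sketch: serialize a run's settings to one JSON spec and load it back.
# Field names are hypothetical placeholders.
import json
from dataclasses import dataclass, asdict

@dataclass
class RunSpec:
    model_name: str
    lr: float
    total_steps: int

def save_spec(spec: RunSpec, path: str) -> None:
    with open(path, "w") as f:
        json.dump(asdict(spec), f, indent=2)

def load_spec(path: str) -> RunSpec:
    with open(path) as f:
        return RunSpec(**json.load(f))
```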
Hi, I notice you cite "70B+ Full Tuning with 16 A100"; however, this is also something trlX supports (and that we worked very hard to add ;) ) via...
[DistillCOMET](https://arxiv.org/abs/2110.07178) shows a lot of success conditioning their commonsense model on knowledge provided by GPT-3. I'll be experimenting with applying a similar approach to CARP, prompting [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)...
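A minimal sketch of the distillation-style generation step this implies: prompt a large LM for critiques of story passages, then keep the (passage, critique) pairs as training data for CARP. The prompt template is an assumption, not the actual setup.

```python
# Sketch: generate synthetic critiques by prompting GPT-NeoX-20B.
# The prompt template and sampling settings are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "EleutherAI/gpt-neox-20b"
PROMPT = "Passage: {passage}\nCritique: "

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT, device_map="auto")

def generate_critique(passage, max_new_tokens=64):
    inputs = tokenizer(PROMPT.format(passage=passage), return_tensors="pt")
    inputs = {k: v.to("cuda:0") for k, v in inputs.items()}
    out = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True, top_p=0.9
    )
    # strip the prompt tokens, keep only the sampled critique
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```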
Putting this here so we can more easily compare.