Louis Castricato
### 🚀 The feature, motivation, and pitch We need the ability to use massive reward models, as this will be necessary for our InstructGPT model. Currently the size of...
If the reward model cannot fit on a single GPU, which will be the case when we are training our InstructGPT model, then the current system fails, since you...
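A minimal sketch of one way around the single-GPU limit, assuming the reward model is an ordinary Hugging Face checkpoint: load it with Accelerate's `device_map="auto"` so its layers are sharded across all visible GPUs. The checkpoint name below is a placeholder, not an actual trlX reward model.

```python
# Minimal sketch: shard a large reward model across every visible GPU
# via Accelerate's automatic device map (requires `pip install accelerate`).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINT = "my-org/reward-model-20b"  # hypothetical checkpoint

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT,
    device_map="auto",          # split layers across available GPUs
    torch_dtype=torch.float16,  # halve memory per shard
)

@torch.no_grad()
def reward(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # inputs go to the first shard; Accelerate moves activations between GPUs
    batch = {k: v.to("cuda:0") for k, v in batch.items()}
    return model(**batch).logits.squeeze(-1)
```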
### 🚀 The feature, motivation, and pitch https://arxiv.org/abs/2210.11693 Amos reports better scaling (for multi-accelerator setups) and better performance than AdamW for autoregressive and masked language modeling. We should...
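The paper's reference implementation is in JAX, so a trlX integration would go through a PyTorch port; the `Amos` import below is a placeholder for whichever port gets adopted, and only the AdamW-to-Amos swap point is meant literally.

```python
# Sketch of where Amos would slot into a trainer's optimizer setup.
# `amos_port` is a hypothetical module standing in for a real PyTorch
# port of the (JAX) reference implementation.
from torch.optim import AdamW

try:
    from amos_port import Amos  # hypothetical import
except ImportError:
    Amos = None

def build_optimizer(model, name="adamw", lr=1e-4):
    params = [p for p in model.parameters() if p.requires_grad]
    if name == "adamw":
        return AdamW(params, lr=lr, weight_decay=0.01)
    if name == "amos":
        if Amos is None:
            raise RuntimeError("no Amos port installed")
        # Amos derives per-parameter scales from model shape, so its
        # constructor will not mirror AdamW's signature exactly.
        return Amos(params, lr=lr)
    raise ValueError(f"unknown optimizer: {name}")
```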
### 🐛 Describe the bug There is no way to use multiple GPUs if you're using Ray Tune; it appears we need to wrap ray.train.torch.TorchTrainer for it to work. It...
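A minimal sketch of the proposed wrapping, following the pattern in the Ray 2.x docs: hand the per-worker training function to `ray.train.torch.TorchTrainer`, then pass that trainer to `tune.Tuner`. The worker count and search space here are placeholders.

```python
# Sketch: wrap the training loop in TorchTrainer so each Tune trial gets
# multiple GPU workers. The training body and hyperparameters are stubs.
from ray import tune
from ray.air.config import ScalingConfig
from ray.train.torch import TorchTrainer

def train_func(config):
    # ordinary per-worker loop; Ray initializes the process group and
    # assigns a GPU to each of the workers declared below
    ...

trainer = TorchTrainer(
    train_loop_per_worker=train_func,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)

tuner = tune.Tuner(
    trainer,
    # Tune forwards this dict to train_func as `config`
    param_space={"train_loop_config": {"lr": tune.loguniform(1e-6, 1e-4)}},
)
results = tuner.fit()
```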
We should use the RL4LMs benchmark suite; I think it is a strong candidate for showing the strengths and weaknesses of trlX.
A text-to-image RLHF pipeline and orchestrator are needed.
Do not merge directly; I changed some of the scripts (mostly removed the toy-example loading components). Originally the code was not using JSON specifications; in particular, it was saving multiple...
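For comparison, a minimal sketch of the single-JSON-specification idea: one spec file per run instead of settings scattered across scripts. The field names are illustrative, not the PR's actual schema.

```python
# Sketch: serialize a run's settings to one JSON spec and load it back.
# Field names are hypothetical placeholders.
import json
from dataclasses import dataclass, asdict

@dataclass
class RunSpec:
    model_name: str
    lr: float
    total_steps: int

def save_spec(spec: RunSpec, path: str) -> None:
    with open(path, "w") as f:
        json.dump(asdict(spec), f, indent=2)

def load_spec(path: str) -> RunSpec:
    with open(path) as f:
        return RunSpec(**json.load(f))
```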
Hi, I notice you cite "70B+ Full Tuning with 16 A100"; however, this is also something trlX supports (and that we worked very hard to add ;) ) via...
[DistillCOMET](https://arxiv.org/abs/2110.07178) shows a lot of success conditioning their commonsense model on knowledge provided by GPT-3. I'll be experimenting with applying a similar approach to CARP, prompting [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)...
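A minimal sketch of the distillation-style generation step this implies: prompt a large LM for critiques of story passages, then keep the (passage, critique) pairs as training data for CARP. The prompt template is an assumption, not the actual setup.

```python
# Sketch: generate synthetic critiques by prompting GPT-NeoX-20B.
# The prompt template and sampling settings are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "EleutherAI/gpt-neox-20b"
PROMPT = "Passage: {passage}\nCritique: "

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT, device_map="auto")

def generate_critique(passage, max_new_tokens=64):
    inputs = tokenizer(PROMPT.format(passage=passage), return_tensors="pt")
    inputs = {k: v.to("cuda:0") for k, v in inputs.items()}
    out = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True, top_p=0.9
    )
    # strip the prompt tokens, keep only the sampled critique
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```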
Putting this here so we can more easily compare.