
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

135 trlx issues, sorted by recently updated

This PR adds the following `pre-commit` updates:
* Updates `pre-commit-hook` to a more recent version.
* Adds `black` formatting to the `tests` directory, as it was never updated for the name...

### 🐛 Describe the bug
Unable to train `gpt2-large` with ILQL at `max_length=1024` on 4x A40 GPUs and ~900 GB of RAM because of a CUDA OOM error.

### Accelerate env ```...

bug

### 🚀 The feature, motivation, and pitch
We need the ability to use massive reward models, as this will be necessary for our Instruct GPT model. Currently the size of...

I want my reward function to depend on the prompt used. Mainly, I want to fine-tune an LM for a conditional generation task, e.g., summarization. It seems that the reward...
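One way to make a reward depend on the prompt is to close over the prompt list and recover each sample's prompt by prefix matching, then score prompt and continuation together. This is only a sketch: `make_prompt_conditional_reward` and `score_fn` are hypothetical names, and the assumption that the trainer passes full prompt-plus-output strings to the reward callback is just that — trlx's actual `reward_fn` signature has varied across versions.

```python
# Hypothetical sketch of a prompt-conditional reward for summarization.
# Assumes the trainer calls reward_fn(samples) with full "prompt + output"
# strings; the real trlx callback signature may differ by version.

def make_prompt_conditional_reward(prompts, score_fn):
    """Build a reward function that splits each sample back into its
    originating prompt and generated continuation, then scores both."""
    def reward_fn(samples):
        rewards = []
        for sample in samples:
            # Recover the originating prompt by prefix match.
            prompt = next((p for p in prompts if sample.startswith(p)), "")
            output = sample[len(prompt):]
            rewards.append(score_fn(prompt, output))
        return rewards
    return reward_fn

# Toy scorer: prefer summaries that are short relative to their source.
def toy_score(prompt, output):
    return max(0.0, 1.0 - len(output) / max(len(prompt), 1))

reward_fn = make_prompt_conditional_reward(
    ["Summarize: the quick brown fox jumps over the lazy dog. TL;DR:"],
    toy_score,
)
```

The same closure pattern works for any per-prompt metadata (reference summaries, labels), as long as samples can be mapped back to their prompts.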

If the reward model cannot fit on a single GPU, which will be the case when we are training our Instruct GPT model, then the current system fails, since you...

### 📚 The doc issue
Hi, just curious about the range of tasks that trlx supports. I know trl only supports IMDB text continuation tasks. Still, I haven't figured out...

documentation

Basic support for low-rank adaptation (LoRA).
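For context on why a low-rank update is cheap: instead of fine-tuning a full `d_out x d_in` weight matrix `W`, LoRA trains two small factors `B` (`d_out x r`) and `A` (`r x d_in`) with `r` much smaller than either dimension, and applies `W + BA`. A back-of-the-envelope parameter count (the dimensions below are illustrative, not trlx's actual model sizes):

```python
# Illustrative sketch of the LoRA parameter savings.

def lora_param_counts(d_out, d_in, r):
    full = d_out * d_in          # parameters in a full fine-tune of W
    lora = r * (d_out + d_in)    # parameters in the low-rank B @ A update
    return full, lora

# Example: one square 1024x1024 projection at rank r=8 (hypothetical dims).
full, lora = lora_param_counts(1024, 1024, 8)
print(full, lora, full / lora)  # 1048576 16384 64.0
```

At rank 8 this single layer trains 64x fewer parameters than a full fine-tune; the saving compounds across every adapted layer.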

Work in progress integrating ZeRO-3 with hydra models for PPO. The current implementation works for models smaller than 6B parameters but OOMs at 6B.
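For reference, ZeRO stage 3 partitions parameters, gradients, and optimizer states across ranks and can offload them to CPU, which is the usual lever when a model this size OOMs. A minimal DeepSpeed config sketch (the values are illustrative, not this PR's actual settings):

```json
{
  "train_batch_size": "auto",
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_param": { "device": "cpu", "pin_memory": true },
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

Offloading trades GPU memory for host-to-device transfer time, so it is typically the last resort after stage-3 partitioning alone proves insufficient.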

TODO: make the repo `flake8`-compatible.