trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
This PR adds the following `pre-commit` updates:
* Updates `pre-commit-hook` to a more recent version.
* Adds `black` formatting to the `tests` directory, as it was never updated for the name...
### 🐛 Describe the bug
Not able to train gpt2-large with ILQL with max_length=1024 on 4xA40 GPUs and ~900GB of RAM because of a CUDA OOM error.
### Accelerate env
```...
### 🚀 The feature, motivation, and pitch
We need the ability to use massive reward models, as this will be necessary for our Instruct GPT model. Currently the size of...
I want my reward function to depend on the prompt used. Specifically, I want to fine-tune an LM for a conditional generation task, e.g., summarization. It seems that the reward...
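One workaround, sketched below, assumes the reward callback only receives the full sample strings (prompt plus continuation): keep the prompt set around, strip the matching prefix, and score the continuation against its own prompt. The `PROMPTS` registry, the helper functions, and the exact `reward_fn` signature are illustrative assumptions, not the library's confirmed API.

```python
# Hedged sketch of a prompt-conditional reward, assuming the reward callback
# only sees the full sample strings (prompt + continuation). The callback
# signature, PROMPTS registry, and helpers are illustrative assumptions.
from typing import List, Tuple

PROMPTS = [
    "Summarize: The quick brown fox jumps over the lazy dog.",
    "Summarize: Language models can be tuned with human feedback.",
]

def split_prompt(sample: str) -> Tuple[str, str]:
    """Recover (prompt, continuation) by matching a known prompt prefix."""
    for prompt in PROMPTS:
        if sample.startswith(prompt):
            return prompt, sample[len(prompt):]
    return "", sample  # unknown prompt: score the whole sample

def token_overlap(summary: str, source: str) -> float:
    """Crude unigram overlap standing in for a learned summarization reward."""
    summary_tokens = set(summary.lower().split())
    source_tokens = set(source.lower().split())
    return len(summary_tokens & source_tokens) / max(len(summary_tokens), 1)

def reward_fn(samples: List[str], **kwargs) -> List[float]:
    rewards = []
    for sample in samples:
        prompt, continuation = split_prompt(sample)
        # The reward depends on the prompt: overlap of the generated summary
        # with its own source text, not a prompt-agnostic score.
        rewards.append(token_overlap(continuation, prompt))
    return rewards
```

The same shape generalizes to a learned reward model that consumes both the prompt and the continuation.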
If the reward model cannot fit on a single GPU, which will be the case when we are training our instruct GPT model, then the current system fails since you...
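One way to decouple a reward model that exceeds a single GPU from the policy's devices is big-model inference in `transformers`/`accelerate`: load the reward model with `device_map="auto"` and a `max_memory` budget that keeps the training GPUs off-limits. The checkpoint name and memory numbers below are placeholder assumptions, not trlx's own integration.

```python
# Hedged sketch: shard a reward model that is too large for one GPU across the
# spare devices with transformers + accelerate. The checkpoint name and memory
# budget are placeholders, not part of trlx.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

RM_NAME = "my-org/large-reward-model"  # hypothetical checkpoint

tokenizer = AutoTokenizer.from_pretrained(RM_NAME)

# Keep GPUs 0-2 free for the policy; let the reward model spill over GPU 3 and CPU RAM.
reward_model = AutoModelForSequenceClassification.from_pretrained(
    RM_NAME,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "0GiB", 1: "0GiB", 2: "0GiB", 3: "40GiB", "cpu": "200GiB"},
)
reward_model.eval()

@torch.no_grad()
def score(samples):
    batch = tokenizer(samples, padding=True, truncation=True, return_tensors="pt")
    batch = {k: v.to(reward_model.device) for k, v in batch.items()}
    # accelerate's dispatch hooks route activations between the shards.
    return reward_model(**batch).logits.squeeze(-1).float().tolist()
```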
### 📚 The doc issue
Hi, just curious about the range of tasks that trlx supports. I know trl only supports IMDB text continuation tasks. Still, I haven't figured out...
Basic support for low rank adaptation.
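For context, low-rank adaptation freezes the pretrained weight W and trains only a rank-r update BA, so the number of trainable parameters scales with r rather than with the layer width. The plain-PyTorch layer below is a minimal illustration of that idea, not the implementation added in this PR.

```python
# Minimal LoRA illustration: freeze the base weight, train only the low-rank
# update delta_W = B @ A scaled by alpha / r. Not the PR's implementation.
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)           # frozen pretrained weight and bias
        self.lora_A = nn.Parameter(torch.empty(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at start
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the low-rank correction; only A and B receive gradients.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(768, 768, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # 12,288 vs 590,592 in the dense layer
```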
PPO ZeRO-3
Work in progress integrating ZeRO-3 with hydra models for PPO. The current implementation works for models under 6B parameters but OOMs at 6B.
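For reference, ZeRO stage 3 partitions parameters, gradients, and optimizer state across ranks, which is what lets each GPU hold only a slice of a 6B-scale model. The sketch below shows a minimal DeepSpeed stage-3 setup; the batch sizes, offload targets, and stand-in model are placeholder assumptions, not the configuration used on this branch.

```python
# Hedged sketch of a ZeRO stage-3 setup; values and the stand-in model are
# placeholders, not the configuration used on this branch.
import deepspeed
import torch.nn as nn

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "zero_optimization": {
        "stage": 3,                          # partition params, grads, optimizer state
        "offload_param": {"device": "cpu"},  # push sharded weights to CPU RAM when idle
        "offload_optimizer": {"device": "cpu"},
    },
}

model = nn.Linear(1024, 1024)  # stand-in for the policy / hydra model
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config=ds_config,
)
```

Launched with the `deepspeed` launcher (or through `accelerate` configured for DeepSpeed), each rank then holds only its partition of the weights instead of a full replica.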
TODO: make the repo flake8-compatible