trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
This PR adds the following `pre-commit` updates:
* Updates `pre-commit-hook` to a more recent version.
* Adds `black` formatting to the `tests` directory, as it was never updated for the name...
### 🐛 Describe the bug
Not able to train gpt2-large with ILQL with max_length=1024 on 4xA40 GPUs and ~900GB of RAM because of a CUDA OOM error.
### Accelerate env
```...
### 🚀 The feature, motivation, and pitch
We need the ability to use massive reward models, as this will be necessary for our Instruct GPT model. Currently the size of...
I want my reward function to depend on the prompt used. Specifically, I want to fine-tune an LM for a conditional generation task, e.g., summarization. It seems that the reward...
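One workaround, sketched below, assumes the reward callback only receives the full sample strings (prompt plus continuation): keep the prompt set around, strip the matching prefix, and score the continuation against its own prompt. The `PROMPTS` registry, the helper functions, and the exact `reward_fn` signature are illustrative assumptions, not the library's confirmed API.

```python
# Hedged sketch of a prompt-conditional reward, assuming the reward callback
# only sees the full sample strings (prompt + continuation). The callback
# signature, PROMPTS registry, and helpers are illustrative assumptions.
from typing import List, Tuple

PROMPTS = [
    "Summarize: The quick brown fox jumps over the lazy dog.",
    "Summarize: Language models can be tuned with human feedback.",
]

def split_prompt(sample: str) -> Tuple[str, str]:
    """Recover (prompt, continuation) by matching a known prompt prefix."""
    for prompt in PROMPTS:
        if sample.startswith(prompt):
            return prompt, sample[len(prompt):]
    return "", sample  # unknown prompt: score the whole sample

def token_overlap(summary: str, source: str) -> float:
    """Crude unigram overlap standing in for a learned summarization reward."""
    summary_tokens = set(summary.lower().split())
    source_tokens = set(source.lower().split())
    return len(summary_tokens & source_tokens) / max(len(summary_tokens), 1)

def reward_fn(samples: List[str], **kwargs) -> List[float]:
    rewards = []
    for sample in samples:
        prompt, continuation = split_prompt(sample)
        # The reward depends on the prompt: overlap of the generated summary
        # with its own source text, not a prompt-agnostic score.
        rewards.append(token_overlap(continuation, prompt))
    return rewards
```

The same shape generalizes to a learned reward model that consumes both the prompt and the continuation.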
If the reward model cannot fit on a single GPU, which will be the case when we are training our instruct GPT model, then the current system fails since you...
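One way to decouple a reward model that exceeds a single GPU from the policy's devices is big-model inference in `transformers`/`accelerate`: load the reward model with `device_map="auto"` and a `max_memory` budget that keeps the training GPUs off-limits. The checkpoint name and memory numbers below are placeholder assumptions, not trlx's own integration.

```python
# Hedged sketch: shard a reward model that is too large for one GPU across the
# spare devices with transformers + accelerate. The checkpoint name and memory
# budget are placeholders, not part of trlx.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

RM_NAME = "my-org/large-reward-model"  # hypothetical checkpoint

tokenizer = AutoTokenizer.from_pretrained(RM_NAME)

# Keep GPUs 0-2 free for the policy; let the reward model spill over GPU 3 and CPU RAM.
reward_model = AutoModelForSequenceClassification.from_pretrained(
    RM_NAME,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "0GiB", 1: "0GiB", 2: "0GiB", 3: "40GiB", "cpu": "200GiB"},
)
reward_model.eval()

@torch.no_grad()
def score(samples):
    batch = tokenizer(samples, padding=True, truncation=True, return_tensors="pt")
    batch = {k: v.to(reward_model.device) for k, v in batch.items()}
    # accelerate's dispatch hooks route activations between the shards.
    return reward_model(**batch).logits.squeeze(-1).float().tolist()
```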
### 📚 The doc issue
Hi, just curious about the range of tasks that trlx supports. I know trl only supports IMDB text continuation tasks. Still, I haven't figured out...
Basic support for low rank adaptation.
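For context, low-rank adaptation freezes the pretrained weight W and trains only a rank-r update BA, so the number of trainable parameters scales with r rather than with the layer width. The plain-PyTorch layer below is a minimal illustration of that idea, not the implementation added in this PR.

```python
# Minimal LoRA illustration: freeze the base weight, train only the low-rank
# update delta_W = B @ A scaled by alpha / r. Not the PR's implementation.
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)           # frozen pretrained weight and bias
        self.lora_A = nn.Parameter(torch.empty(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at start
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the low-rank correction; only A and B receive gradients.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(768, 768, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # 12,288 vs 590,592 in the dense layer
```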
PPO ZeRO-3
Work in progress integrating ZeRO-3 with hydra models for PPO. The current implementation works for models under 6B parameters but OOMs at 6B.
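For reference, ZeRO stage 3 partitions parameters, gradients, and optimizer state across ranks, which is what lets each GPU hold only a slice of a 6B-scale model. The sketch below shows a minimal DeepSpeed stage-3 setup; the batch sizes, offload targets, and stand-in model are placeholder assumptions, not the configuration used on this branch.

```python
# Hedged sketch of a ZeRO stage-3 setup; values and the stand-in model are
# placeholders, not the configuration used on this branch.
import deepspeed
import torch.nn as nn

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "zero_optimization": {
        "stage": 3,                          # partition params, grads, optimizer state
        "offload_param": {"device": "cpu"},  # push sharded weights to CPU RAM when idle
        "offload_optimizer": {"device": "cpu"},
    },
}

model = nn.Linear(1024, 1024)  # stand-in for the policy / hydra model
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config=ds_config,
)
```

Launched with the `deepspeed` launcher (or through `accelerate` configured for DeepSpeed), each rank then holds only its partition of the weights instead of a full replica.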
TODO: make the repo flake8-compatible