cat-state comments

Results: 18 comments by cat-state

Add isort flake8

```
trlx/model/accelerate_base_model.py:6:1: F401 'torch.nn.functional as F' imported but unused
trlx/model/accelerate_base_model.py:9:1: F401 'torch.utils.data.DataLoader' imported but unused
trlx/model/accelerate_base_model.py:11:1: F401 'transformers.AutoConfig' imported but unused
trlx/model/accelerate_base_model.py:13:1: F401 'trlx.data.BatchElement' imported but unused
trlx/model/accelerate_base_model.py:13:1: F401 'trlx.data.RLElement'...
```

Add isort flake8

Should merge https://github.com/CarperAI/trlx/pull/24 first, as most of the errors are in files touched by it

FasterTransformer reward model support

Addressed by the Triton Inference Server client: https://github.com/CarperAI/trlx/tree/add-hh-example

initial commit for trlx LORA support

cc @Sayanc93

initial commit for trlx LORA support

> I can get to it tomorrow or Monday. I'm wondering what the API should be to avoid modifying the model definitions? I think it would be like, instead of...

Example/Test Model Benchmarks (Canonical WandB runs)

@albertsun1

> Hey! I'm new to contributing to trlx, would it be worth for me to give this a go for the ppo/ilql sentiment examples?

Sure, although you might need...

Example/Test Model Benchmarks (Canonical WandB runs)

So I see that WandB actually lists the commit hash used for a run. So if we could find/tag TRLX runs in wandb then each commit could be matched up...
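A rough sketch of the matching idea, in case it helps: once each run's name and commit hash have been exported, grouping them by commit is a small dictionary pass. The record layout (`name` and `commit` keys), the function name, and the sample values below are all hypothetical placeholders. This is not the WandB API and these are not real trlx runs.

```python
from collections import defaultdict


def index_runs_by_commit(runs):
    """Group run names by the commit hash recorded for each run.

    `runs` is assumed to be an iterable of dicts with hypothetical
    "name" and "commit" keys, e.g. exported from an experiment tracker.
    """
    by_commit = defaultdict(list)
    for run in runs:
        commit = run.get("commit")
        if commit:  # skip runs that have no recorded commit
            by_commit[commit].append(run["name"])
    return dict(by_commit)


# Placeholder records for illustration only; names and hashes are made up.
example_runs = [
    {"name": "ppo-sentiments", "commit": "abc1234"},
    {"name": "ilql-sentiments", "commit": "abc1234"},
    {"name": "ppo-sentiments-rerun", "commit": "def5678"},
    {"name": "untracked", "commit": None},
]

runs_by_commit = index_runs_by_commit(example_runs)
```

With an index like this, looking up a commit gives the runs that could serve as its canonical benchmark, and runs without a recorded commit are simply left out.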

RLHF with HH Anthropic data

Also ultimately depends on https://github.com/CarperAI/trlx/issues/14 for larger scale

Add flake8 and isort to pre-commit

https://github.com/CarperAI/trlx/pull/39

Self Play

This would also tie in to MCTS in the future, although that would likely require more thought on how to do it efficiently