
Train transformer language models with reinforcement learning.

Results: 424 trl issues

First off, thank you for building this! Three questions regarding the two heads of the policy model: 1. Why re-initialize the weights in the language model head in ``` class...
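The "two heads" the question refers to follow a common PPO-for-language-models pattern: one head produces token logits, the other produces scalar value estimates. A minimal sketch of that pattern is below; the class and attribute names (`TwoHeadPolicy`, `lm_head`, `v_head`) are illustrative, not trl's actual API.

```python
import torch
import torch.nn as nn

class TwoHeadPolicy(nn.Module):
    """Illustrative two-headed policy: an LM head plus a value head."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        # Language-model head: maps hidden states to vocabulary logits.
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        # Value head: a freshly initialized scalar head for PPO value estimates.
        self.v_head = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor):
        logits = self.lm_head(hidden_states)             # (batch, seq, vocab)
        values = self.v_head(hidden_states).squeeze(-1)  # (batch, seq)
        return logits, values

policy = TwoHeadPolicy(hidden_size=8, vocab_size=16)
h = torch.randn(2, 5, 8)
logits, values = policy(h)
print(logits.shape, values.shape)  # torch.Size([2, 5, 16]) torch.Size([2, 5])
```

In practice the LM head would be loaded from the pretrained checkpoint while the value head has no pretrained counterpart and must be newly initialized, which is likely the distinction the question is probing.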

I opened the PR a bit early; I will provide additional context very soon.

Let me start off by saying thanks for writing such a wonderful and easy-to-use library. I'm genuinely surprised that no one else has created one to approach this...

Hi Leandro, first of all, many thanks for the amazing work on the library. I've found your documentation very easy to get into - especially paired with your talk at...

Hi Leandro, I was running the notebook '04-gpt2-sentiment-ppo-training.ipynb' for the first time and received a KeyError when running the training-loop section. It was in this line: ` rewards...
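A KeyError at the rewards line often comes from indexing the sentiment pipeline's output by a hard-coded position or label. The pipeline returns a list of `{'label': ..., 'score': ...}` dicts, so a defensive lookup avoids assumptions about ordering. The data below is made up for demonstration; only the dict shape matches the real pipeline output.

```python
# Illustrative output shape of a text-classification pipeline call.
outputs = [
    {"label": "NEGATIVE", "score": 0.1},
    {"label": "POSITIVE", "score": 0.9},
]

# Look the score up by label instead of assuming a fixed position.
score = next(o["score"] for o in outputs if o["label"] == "POSITIVE")
print(score)  # 0.9
```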

Hi, we know that the KL divergence is used in the loss as a constraint on the difference between the original gpt2 and the active gpt2, which produces the responses used for reward feedback....
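The KL constraint mentioned here is commonly applied as a per-token penalty subtracted from the reward: `reward_t = -kl_coef * (logp_active_t - logp_ref_t)`, with the task score added on the final token. A small numerical sketch, with made-up log-probabilities and a made-up coefficient:

```python
import torch

kl_coef = 0.2
logp_active = torch.tensor([-1.0, -0.5, -2.0])  # log-probs under the trained policy
logp_ref    = torch.tensor([-1.2, -0.7, -1.5])  # log-probs under the frozen reference

score = torch.tensor(1.0)    # scalar reward, e.g. from a sentiment classifier

kl = logp_active - logp_ref  # per-token KL estimate
rewards = -kl_coef * kl      # KL penalty at every token
rewards[-1] += score         # task reward added only on the last token
print(rewards)               # tensor([-0.0400, -0.0400,  1.1000])
```

The penalty keeps the active model from drifting too far from the reference while still letting the final-token reward shape its behavior.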

I tried to save the model to my local machine and make predictions with it. However, the text generated by the saved model is not as expected. Do you know of...

Hello, thanks for releasing this code. I would like to use this algorithm with a trained seq2seq (x -> y) model. I would initialize the active model and ref model...
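The initialization the question describes typically means cloning the trained checkpoint into two copies: an active model that PPO updates and a frozen reference model for the KL term. A minimal sketch of that pattern, using a toy module in place of a real seq2seq model (all names here are illustrative):

```python
import copy
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """Stand-in for a trained seq2seq model."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)
        self.decoder = nn.Linear(4, 4)

active = TinySeq2Seq()       # will be updated by PPO
ref = copy.deepcopy(active)  # frozen reference copy for the KL penalty

# Freeze the reference so it never receives gradient updates.
for p in ref.parameters():
    p.requires_grad = False
```

Both copies start with identical weights, so the KL penalty is zero at step 0 and grows only as the active model drifts during training.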

Thank you for your great work! I read issue #15 but I still don't understand why the values should be shifted left in PPOTrainer.batched_forward_pass() https://github.com/lvwerra/trl/blob/master/trl/ppo.py#L203 . In #L201, `start` is already...
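The shift in question follows the usual causal-LM off-by-one alignment: the logits at position t score token t+1, so per-token log-probs (and the value estimates aligned with them) must be shifted relative to the input ids. A generic sketch of that alignment, with random numbers standing in for real model outputs:

```python
import torch

ids = torch.tensor([[3, 7, 2, 5]])  # (batch, seq) token ids
logits = torch.randn(1, 4, 10)      # stand-in model output: one row per position
logprobs_all = torch.log_softmax(logits, dim=-1)

# Log-prob of each *next* token: match logits[:, :-1] with ids[:, 1:].
logp = torch.gather(
    logprobs_all[:, :-1], 2, ids[:, 1:].unsqueeze(-1)
).squeeze(-1)
print(logp.shape)  # torch.Size([1, 3]) -- one entry per predicted token
```

This is a sketch of the general mechanism, not trl's exact indexing; the specific role of `start` at the referenced lines would need to be checked against the source.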