
Train transformer language models with reinforcement learning.

Results: 424 trl issues

First off, thank you for building this! Three questions regarding the two heads of the policy model: 1. Why re-initialize the weights in the language model head in ``` class...
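The "two heads" the question refers to follow a common PPO-for-language-models pattern: one head produces token logits, the other produces scalar value estimates. A minimal sketch of that pattern is below; the class and attribute names (`TwoHeadPolicy`, `lm_head`, `v_head`) are illustrative, not trl's actual API.

```python
import torch
import torch.nn as nn

class TwoHeadPolicy(nn.Module):
    """Illustrative two-headed policy: an LM head plus a value head."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        # Language-model head: maps hidden states to vocabulary logits.
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        # Value head: a freshly initialized scalar head for PPO value estimates.
        self.v_head = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor):
        logits = self.lm_head(hidden_states)             # (batch, seq, vocab)
        values = self.v_head(hidden_states).squeeze(-1)  # (batch, seq)
        return logits, values

policy = TwoHeadPolicy(hidden_size=8, vocab_size=16)
h = torch.randn(2, 5, 8)
logits, values = policy(h)
print(logits.shape, values.shape)  # torch.Size([2, 5, 16]) torch.Size([2, 5])
```

In practice the LM head would be loaded from the pretrained checkpoint while the value head has no pretrained counterpart and must be newly initialized, which is likely the distinction the question is probing.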

I opened the PR a bit early; I will provide additional context very soon.

Let me start off by saying thanks for writing such a wonderful and easy-to-use library. I'm genuinely surprised that no one else has created one to approach this...

Hi Leandro, first of all, many thanks for the amazing work on the library. I've found your documentation very easy to get into - especially paired with your talk at...

Hi Leandro, I was running the notebook '04-gpt2-sentiment-ppo-training.ipynb' for the first time and received a KeyError when running the training-loop section. It was in this line: ` rewards...
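A KeyError at the rewards line often comes from indexing the sentiment pipeline's output by a hard-coded position or label. The pipeline returns a list of `{'label': ..., 'score': ...}` dicts, so a defensive lookup avoids assumptions about ordering. The data below is made up for demonstration; only the dict shape matches the real pipeline output.

```python
# Illustrative output shape of a text-classification pipeline call.
outputs = [
    {"label": "NEGATIVE", "score": 0.1},
    {"label": "POSITIVE", "score": 0.9},
]

# Look the score up by label instead of assuming a fixed position.
score = next(o["score"] for o in outputs if o["label"] == "POSITIVE")
print(score)  # 0.9
```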

Hi, we know that the KL divergence is used in the loss as a constraint on the difference between the original gpt2 and the active gpt2, which produces the responses used for reward feedback....
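The KL constraint mentioned here is commonly applied as a per-token penalty subtracted from the reward: `reward_t = -kl_coef * (logp_active_t - logp_ref_t)`, with the task score added on the final token. A small numerical sketch, with made-up log-probabilities and a made-up coefficient:

```python
import torch

kl_coef = 0.2
logp_active = torch.tensor([-1.0, -0.5, -2.0])  # log-probs under the trained policy
logp_ref    = torch.tensor([-1.2, -0.7, -1.5])  # log-probs under the frozen reference

score = torch.tensor(1.0)    # scalar reward, e.g. from a sentiment classifier

kl = logp_active - logp_ref  # per-token KL estimate
rewards = -kl_coef * kl      # KL penalty at every token
rewards[-1] += score         # task reward added only on the last token
print(rewards)               # tensor([-0.0400, -0.0400,  1.1000])
```

The penalty keeps the active model from drifting too far from the reference while still letting the final-token reward shape its behavior.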

I tried to save the model to my local machine and make predictions with it. However, the text generated by the saved model is not as expected. Do you know of...

Hello, thanks for releasing this code. I would like to use this algorithm with a trained seq2seq (x -> y) model. I would initialize the active model and ref model...
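The initialization the question describes typically means cloning the trained checkpoint into two copies: an active model that PPO updates and a frozen reference model for the KL term. A minimal sketch of that pattern, using a toy module in place of a real seq2seq model (all names here are illustrative):

```python
import copy
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """Stand-in for a trained seq2seq model."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)
        self.decoder = nn.Linear(4, 4)

active = TinySeq2Seq()       # will be updated by PPO
ref = copy.deepcopy(active)  # frozen reference copy for the KL penalty

# Freeze the reference so it never receives gradient updates.
for p in ref.parameters():
    p.requires_grad = False
```

Both copies start with identical weights, so the KL penalty is zero at step 0 and grows only as the active model drifts during training.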

Thank you for your great work! I read issue #15 but I still don't understand why the values should be shifted left in PPOTrainer.batched_forward_pass() https://github.com/lvwerra/trl/blob/master/trl/ppo.py#L203 . In #L201, `start` is already...
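The shift in question follows the usual causal-LM off-by-one alignment: the logits at position t score token t+1, so per-token log-probs (and the value estimates aligned with them) must be shifted relative to the input ids. A generic sketch of that alignment, with random numbers standing in for real model outputs:

```python
import torch

ids = torch.tensor([[3, 7, 2, 5]])  # (batch, seq) token ids
logits = torch.randn(1, 4, 10)      # stand-in model output: one row per position
logprobs_all = torch.log_softmax(logits, dim=-1)

# Log-prob of each *next* token: match logits[:, :-1] with ids[:, 1:].
logp = torch.gather(
    logprobs_all[:, :-1], 2, ids[:, 1:].unsqueeze(-1)
).squeeze(-1)
print(logp.shape)  # torch.Size([1, 3]) -- one entry per predicted token
```

This is a sketch of the general mechanism, not trl's exact indexing; the specific role of `start` at the referenced lines would need to be checked against the source.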