
Seq2seq trainer generation config arg

Natooz opened this pull request 2 years ago • 9 comments

What does this PR do?

Seq2SeqTrainer can load a GenerationConfig by calling the from_pretrained method. This is done with the generation_config_from_pretrain argument of Seq2SeqTrainingArguments (or in the kwargs of the Seq2SeqTrainer.evaluate and Seq2SeqTrainer.predict methods).

At first, we thought of using a generation_config_file argument (#22203). I thought it would be even more versatile to treat it as a "from_pretrained" approach, so generation_config_from_pretrain can also handle model ids and URLs.
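
For illustration, a minimal sketch of that "from_pretrained" approach. GenerationConfig.from_pretrained is existing transformers API; the generation_config_from_pretrain training argument is the name used in this PR and may not match the final merged interface.

```python
from transformers import GenerationConfig, Seq2SeqTrainingArguments

# GenerationConfig.from_pretrained resolves a local path, a Hub model id, or
# (per this PR) a URL into a GenerationConfig object. Any checkpoint that
# ships a generation_config.json works here; "t5-small" is illustrative.
gen_config = GenerationConfig.from_pretrained("t5-small")

# Hypothetical wiring through the training arguments, as proposed in this PR
# (argument name may differ in the merged version):
# training_args = Seq2SeqTrainingArguments(
#     output_dir="out",
#     predict_with_generate=True,
#     generation_config_from_pretrain="t5-small",
# )
```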

Small suggestion

As Seq2SeqTrainer actually adds very little functionality or modification on top of Trainer, would including these changes directly in Trainer be a good idea? (one trainer to rule them all 💍)

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [x] Was this discussed/approved via a Github issue or the forum? Yes: #22203
  • [x] Did you make sure to update the documentation with your changes?
  • [ ] Did you write any new necessary tests?

Who can review?

@sgugger @gante

Natooz avatar Mar 22 '23 19:03 Natooz

The examples are broken here because Seq2SeqTrainingArguments.generation_max_length and Seq2SeqTrainingArguments.generation_num_beams were removed.

From here, what do you suggest: putting them back (and sending a warning?), updating the examples, or both?
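
For context, a short illustration of the two arguments in question as the example scripts use them (values are illustrative):

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,
    generation_max_length=128,  # max_length forwarded to generate() during evaluation
    generation_num_beams=4,     # num_beams forwarded to generate() during evaluation
)
```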

Natooz avatar Mar 23 '23 08:03 Natooz

Hey guys, thanks for reviewing, it's my pleasure to contribute considering how useful transformers has been to me! 😃

@gante

  1. Noted. I have put them back.
  2. Sounds good, you probably know the demand / usage better. One note though: I initially called load_generation_config from the evaluate and predict methods in case users specify a generation_config kwarg for these methods. Should we get rid of this (forcing users to override trainer.generation_config if they need to change it)? If not, maybe we do not need an __init__ whose sole purpose would be to create self._gen_config, which would also be done in evaluate and predict anyway? (A rough sketch of the two options follows this list.)
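
A rough sketch of the two alternatives being weighed; method and attribute names follow this discussion (load_generation_config, _gen_config) and are not necessarily the final code:

```python
from transformers import GenerationConfig, Seq2SeqTrainer


def load_generation_config(gen_config):
    """Resolve a GenerationConfig, a path, a Hub model id, or a URL into a GenerationConfig."""
    if gen_config is None or isinstance(gen_config, GenerationConfig):
        return gen_config
    return GenerationConfig.from_pretrained(gen_config)


class SketchSeq2SeqTrainer(Seq2SeqTrainer):
    # Option A (current state of the PR): resolve the config per call, so users
    # can pass a generation_config kwarg to evaluate() / predict().
    def evaluate(self, *args, generation_config=None, **kwargs):
        if generation_config is not None:
            self._gen_config = load_generation_config(generation_config)
        return super().evaluate(*args, **kwargs)

    # Option B (the question above): resolve once in __init__ and force users to
    # override trainer._gen_config themselves if they need to change it later.
    # def __init__(self, *args, generation_config=None, **kwargs):
    #     super().__init__(*args, **kwargs)
    #     self._gen_config = load_generation_config(generation_config)
```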

Natooz avatar Mar 23 '23 18:03 Natooz

@Natooz I think so (forcing users to override).

I'd rather have a simple solution now and make it more complex in the future if there is demand for it. Maintenance is a limitation on our side; our team is relatively small :)

gante avatar Mar 24 '23 16:03 gante

That's totally understandable. You guys are already managing this ecosystem really well and making a huge impact! 🙌 I created the __init__ method, overriding model.generation_config. Indeed, the code is shorter and simpler.
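
A minimal sketch of what that simplification might look like (assumed names and structure, not the exact merged code):

```python
from transformers import GenerationConfig, Seq2SeqTrainer


class SimplerSeq2SeqTrainer(Seq2SeqTrainer):
    def __init__(self, *args, gen_config=None, **kwargs):
        super().__init__(*args, **kwargs)
        if gen_config is not None:
            if not isinstance(gen_config, GenerationConfig):
                # Accept a local path, Hub model id, or URL, as in the PR description.
                gen_config = GenerationConfig.from_pretrained(gen_config)
            # Override model.generation_config once; generate() then picks it up
            # during evaluate() / predict() without any per-call plumbing.
            self.model.generation_config = gen_config
```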

Natooz avatar Mar 24 '23 18:03 Natooz

Hey @gante,

Thanks, the last changes are done. I'll take the instructions for the rebase; I just didn't do it right.

Natooz avatar Mar 26 '23 09:03 Natooz

@Natooz I think everything went well, no further touches are needed on the rebase front :) Now we only need to add back the old arguments (self.args.generation_max_length and self.args.generation_num_beams) and their logic!

gante avatar Mar 26 '23 13:03 gante

Good, evaluate and predict are back to their original form; it should be good now.

Natooz avatar Mar 26 '23 15:03 Natooz

Suggestions applied, sorry for those typos (copy / paste...) 😅

Natooz avatar Mar 27 '23 08:03 Natooz

@Natooz no worries. Thank you for all the contributions to this PR, they will help many LLM+trainer users! 🤗

gante avatar Mar 27 '23 10:03 gante

Sure, here it is

Natooz avatar Mar 27 '23 14:03 Natooz