Seq2seq trainer generation config arg
What does this PR do?
Seq2SeqTrainer can now load a GenerationConfig by calling the from_pretrained method. This is done with the generation_config_from_pretrain argument of Seq2SeqTrainingArguments (or via the kwargs of the Seq2SeqTrainer.evaluate and Seq2SeqTrainer.predict methods).
At first, we thought of using a generation_config_file argument (#22203). I thought it would be even more versatile to treat it as a "from_pretrained" approach. Hence, generation_config_from_pretrain can also handle model ids and URLs.
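A usage sketch, assuming the argument name introduced in this PR (generation_config_from_pretrain) and that it is resolved with GenerationConfig.from_pretrained under the hood:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: per this PR, the value can be a local path, a model id on the Hub,
# or a URL, and the trainer loads it with GenerationConfig.from_pretrained.
training_args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,
    generation_config_from_pretrain="google/flan-t5-small",  # or "./my_generation_config_dir"
)
```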
Small suggestion
As Seq2SeqTrainer actually brings very little additional functionality or modification, would it be a good idea to include these directly in Trainer? (one trainer to rule them all 💍)
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [x] Was this discussed/approved via a Github issue or the forum? Yes: #22203
- [x] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?
Who can review?
@sgugger @gante
The documentation is not available anymore as the PR was closed or merged.
The broken examples here are due to Seq2SeqTrainingArguments.generation_max_length and Seq2SeqTrainingArguments.generation_num_beams being removed.
From here, what do you suggest: putting them back (and emitting a warning?), updating the examples, or both?
Hey guys, thanks for reviewing, it's my pleasure to contribute considering how useful transformers has been to me! 😃
@gante
- Noted. I have put them back.
- Sounds good, you probably know the demand / usage better. One note though: I initially called load_generation_config from the evaluate and predict methods in case users specify a generation_config kwarg for these methods. Should we get rid of this (thereby forcing users to override trainer.generation_config if they need to change it)? If not, maybe we do not need an __init__ whose sole purpose would be to create self._gen_config, which would also be done in evaluate and predict anyway?
@Natooz I think so (forcing to override).
I'd rather have a simple solution now, and make it more complex in the future if there is demand for it. Maintenance is a limitation on our side; our team is relatively small :)
That's totally understandable. You guys are already managing this ecosystem really well and make a huge impact! 🙌
I created the __init__ method, overriding model.generation_config. The code is indeed shorter and simpler.
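A minimal sketch of the idea as a standalone subclass (not the exact PR code), assuming the generation_config_from_pretrain name on the training arguments:

```python
from transformers import GenerationConfig, Seq2SeqTrainer

class GenConfigSeq2SeqTrainer(Seq2SeqTrainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Hypothetical attribute name, taken from this PR's training arguments.
        pretrained_name = getattr(self.args, "generation_config_from_pretrain", None)
        if pretrained_name is not None:
            # Resolve a local path or Hub model id with from_pretrained and
            # override the model's default generation config, so that
            # evaluate/predict pick it up automatically.
            self.model.generation_config = GenerationConfig.from_pretrained(pretrained_name)
```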
Hey @gante,
Thanks, the last changes are done. I'll gladly take instructions for the rebase, I just didn't do it right.
@Natooz I think everything went well, no further touches are needed on the rebase front :) Now we only need to add back the old arguments (self.args.generation_max_length and self.args.generation_num_beams) and their logic!
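For context, a rough sketch of what that logic could look like (assumed, not the exact PR diff): the legacy arguments, when set, take precedence over the loaded generation config. The helper name below is hypothetical.

```python
from transformers import GenerationConfig

def apply_legacy_generation_args(gen_config: GenerationConfig,
                                 generation_max_length=None,
                                 generation_num_beams=None) -> GenerationConfig:
    """Let the old-style training arguments take precedence when they are set."""
    if generation_max_length is not None:
        gen_config.max_length = generation_max_length
    if generation_num_beams is not None:
        gen_config.num_beams = generation_num_beams
    return gen_config
```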
Good, evaluate and predict are back to their original form; it should be good now.
Suggestions applied, sorry for these typos (copy / paste ...) 😅
@Natooz no worries. Thank you for all the contributions to this PR, they will help many LLM+trainer users! 🤗
Sure, here it is