Seq2seq trainer generation config arg
What does this PR do?
Seq2SeqTrainer can now load a GenerationConfig by calling the from_pretrained method. This is done with the generation_config_from_pretrain argument of Seq2SeqTrainingArguments (or via the kwargs of the Seq2SeqTrainer.evaluate and Seq2SeqTrainer.predict methods).
At first, we thought of using a generation_config_file argument (#22203). I thought it would be even more versatile to treat it as a "from_pretrained" approach. Hence, generation_config_from_pretrain can also handle model ids and URLs.
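A usage sketch, assuming the argument name introduced in this PR (generation_config_from_pretrain) and that it is resolved with GenerationConfig.from_pretrained under the hood:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: per this PR, the value can be a local path, a model id on the Hub,
# or a URL, and the trainer loads it with GenerationConfig.from_pretrained.
training_args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,
    generation_config_from_pretrain="google/flan-t5-small",  # or "./my_generation_config_dir"
)
```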
Small suggestion
As Seq2SeqTrainer actually brings very little additional functionality or modification, would it be a good idea to include these directly in Trainer? (one trainer to rule them all 💍)
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [x] Was this discussed/approved via a Github issue or the forum? Yes: #22203
- [x] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?
Who can review?
@sgugger @gante
The documentation is not available anymore as the PR was closed or merged.
The broken examples here are due to Seq2SeqTrainingArguments.generation_max_length and Seq2SeqTrainingArguments.generation_num_beams being removed.
From here, what do you suggest: putting them back (and emitting a warning?), updating the examples, or both?
Hey guys, thanks for reviewing, it's my pleasure to contribute considering how useful transformers has been to me! 😃
@gante
- Noted. I have put them back.
- Sounds good, you probably know the demand / usage better. One note though: I initially called load_generation_config from the evaluate and predict methods in case users specify a generation_config kwarg for these methods. Should we get rid of this (thereby forcing users to override trainer.generation_config if they need to change it)? If not, maybe we do not need an __init__ whose sole purpose would be to create self._gen_config, which would also be done in evaluate and predict anyway?
@Natooz I think so (forcing to override).
I'd rather have a simple solution now, and make it more complex in the future if there is demand for it. Maintenance is a limitation on our side; our team is relatively small :)
That's totally understandable. You guys are already managing this ecosystem really well and make a huge impact! 🙌
I created the __init__ method, overriding model.generation_config. The code is indeed shorter and simpler.
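A minimal sketch of the idea as a standalone subclass (not the exact PR code), assuming the generation_config_from_pretrain name on the training arguments:

```python
from transformers import GenerationConfig, Seq2SeqTrainer

class GenConfigSeq2SeqTrainer(Seq2SeqTrainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Hypothetical attribute name, taken from this PR's training arguments.
        pretrained_name = getattr(self.args, "generation_config_from_pretrain", None)
        if pretrained_name is not None:
            # Resolve a local path or Hub model id with from_pretrained and
            # override the model's default generation config, so that
            # evaluate/predict pick it up automatically.
            self.model.generation_config = GenerationConfig.from_pretrained(pretrained_name)
```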
Hey @gante,
Thanks, the last changes are done. I'll gladly take instructions for the rebase, I just didn't do it right.
@Natooz I think everything went well, no further touches are needed on the rebase front :) Now we only need to add back the old arguments (self.args.generation_max_length and self.args.generation_num_beams) and their logic!
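For context, a rough sketch of what that logic could look like (assumed, not the exact PR diff): the legacy arguments, when set, take precedence over the loaded generation config. The helper name below is hypothetical.

```python
from transformers import GenerationConfig

def apply_legacy_generation_args(gen_config: GenerationConfig,
                                 generation_max_length=None,
                                 generation_num_beams=None) -> GenerationConfig:
    """Let the old-style training arguments take precedence when they are set."""
    if generation_max_length is not None:
        gen_config.max_length = generation_max_length
    if generation_num_beams is not None:
        gen_config.num_beams = generation_num_beams
    return gen_config
```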
Good, evaluate and predict are back to their original form; it should be good now.
Suggestions applied, sorry for these typos (copy / paste ...) 😅
@Natooz no worries. Thank you for all the contributions to this PR, they will help many LLM+trainer users! 🤗
Sure, here it is