Karim Foda
Hey @Narsil. I've managed to get this working for greedy decoding and multinomial sampling. For beam search, what would be the best approach to dealing with a `stop_sequence`? I've assumed that...
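For concreteness, here's roughly the shape of what I have working for the greedy/sampling case (a minimal sketch only; the class name `StopSequenceCriteria` and the decode-and-compare logic are illustrative, not the final code):

```python
import torch
from transformers import StoppingCriteria


class StopSequenceCriteria(StoppingCriteria):
    """Illustrative sketch: halt generation once the decoded text ends
    with the stop sequence. Note it only looks at the first sequence in
    the batch, which is exactly what breaks down for beam search."""

    def __init__(self, stop_sequence: str, tokenizer):
        self.stop_sequence = stop_sequence
        self.tokenizer = tokenizer

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        decoded = self.tokenizer.decode(input_ids[0], skip_special_tokens=True)
        return decoded.endswith(self.stop_sequence)
```

The `input_ids[0]` assumption is the part I'm unsure how to generalise when multiple beams finish at different times.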
Thanks @Narsil @gante. Okay, so for the sake of deploying iteratively, I've removed the `eos_token_id` from the `StoppingCriteria` and will add it back in a separate PR. I've added a test...
> We should implement `stop_sequence` only once (probably in `generate`) but we could have 2 tests if you want to test the full pipeline too. (Probably in `tests/pipelines/test_pipelines_text_generation.py` for instance.)...
No problem, I've just moved the `stop_sequence` logic back to the pipeline function and added the tests you requested in the `tests/pipelines/test_pipelines_text_generation.py` file. This should make this PR ready for review...
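For reference, the pipeline-level behaviour is roughly along these lines (a sketch under the assumption that the stop sequence is tokenized and its first token forwarded to `generate` as `eos_token_id`; the model choice and warning wording are illustrative):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Sketch: reduce a stop sequence to a single eos_token_id for generate().
stop_sequence = "\n"
stop_ids = generator.tokenizer.encode(stop_sequence, add_special_tokens=False)
if len(stop_ids) > 1:
    # generate() takes a single eos_token_id here, so we can only honour
    # the first token of a multi-token stop sequence.
    print("Warning: stop_sequence has more than one token; using the first.")

output = generator("Hello, my name is", eos_token_id=stop_ids[0])
print(output)
```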
Hey @sanchit-gandhi. Sorry this is taking so long. Adding your changes was relatively easy, but I'm a bit stuck trying to get a few failing tests to pass, which I believe are...
Of course, that makes sense. Apologies for the misunderstanding. I'll work on the gradient checkpointing part using your suggestions and remove the `key_value_states` changes for this PR.
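For anyone following along, the user-facing switch is the standard one; this PR only touches the internals behind it (shown here with the public `google/long-t5-local-base` checkpoint as an example):

```python
from transformers import LongT5ForConditionalGeneration

model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-local-base")

# Trade compute for memory: recompute each block's forward pass during
# backprop instead of storing all intermediate activations.
model.gradient_checkpointing_enable()
```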
Hey @sanchit-gandhi. I believe this is now ready for review. The PR passes all the tests except the ones related to inconsistencies between `t5` and `long_t5`. If you're happy with this...
Thanks @sanchit-gandhi for all the helpful comments. I've addressed them all and run `make fix-copies`, so hopefully these changes are now reflected properly for `LongT5` as well.
Amazing, will keep you posted. Thanks for all the help getting this merged!
Hi @gante. Apologies, I don't think I properly clarified the use case I think this could solve. I unfortunately can't share my model or my dataset here (but happy to...