zshot
Reduce T5 model size and improve performance
Scenario summary
Inference with T5 models is currently slow.
Proposed solution
Investigate and implement a solution that reduces model size and speeds up inference. Some ideas to consider:
- fastT5, which claims to reduce T5 model size by 3x and speed up inference by up to 5x: https://github.com/Ki6an/fastT5
- How to adapt a multilingual T5 model for a single language: https://towardsdatascience.com/how-to-adapt-a-multilingual-t5-model-for-a-single-language-b9f94f3d9c90
- Accelerated inference with Optimum and Transformers pipelines: https://huggingface.co/blog/optimum-inference
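
To make the single-language idea above more concrete, here is a minimal sketch of vocabulary pruning on toy data. It is not taken from the linked article or from any library: the function names `select_vocab` and `prune_embeddings`, the toy embedding table, and the toy corpus are all hypothetical. The assumption is that a multilingual model spends much of its size on embedding rows for tokens the target language rarely uses, so keeping only the frequent rows (plus special tokens) shrinks the model.

```python
from collections import Counter


def select_vocab(token_id_corpus, keep_top, always_keep=()):
    """Pick the token ids to keep: special ids plus the most frequent ones.

    token_id_corpus: iterable of tokenized sentences (lists of int ids).
    keep_top: target size of the pruned vocabulary.
    always_keep: ids that must survive (e.g. pad / eos), an assumption here.
    """
    counts = Counter(tid for sentence in token_id_corpus for tid in sentence)
    kept = set(always_keep)
    for tid, _ in counts.most_common():
        if len(kept) >= keep_top:
            break
        kept.add(tid)
    return sorted(kept)


def prune_embeddings(embeddings, kept_ids):
    """Keep only the selected rows and return an old-id -> new-id mapping."""
    new_embeddings = [embeddings[i] for i in kept_ids]
    old_to_new = {old: new for new, old in enumerate(kept_ids)}
    return new_embeddings, old_to_new


# Toy stand-in for an embedding matrix: 10 tokens, 4 dimensions each.
embeddings = [[float(i)] * 4 for i in range(10)]

# Toy stand-in for a target-language corpus that is already tokenized.
corpus = [[0, 5, 5, 7, 1], [5, 7, 7, 9, 1]]

kept_ids = select_vocab(corpus, keep_top=4, always_keep=(0, 1))
small_embeddings, old_to_new = prune_embeddings(embeddings, kept_ids)

print(f"kept {len(small_embeddings)} of {len(embeddings)} embedding rows")
```

For a real T5 checkpoint the same row selection would also have to be applied to the tokenizer vocabulary and to the output projection where it is not tied to the input embeddings; those steps are left out of this sketch.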