Request for Implementing Support for wav2vec2, MMS, and XLS-R Models
Hi there,
I really enjoy using this library and appreciate all your hard work. I noticed that you've added support for Whisper, which is great. I've also been working with other models such as wav2vec2, MMS, and XLS-R, and they perform really well.
What's great about them is that they need very little fine-tuning to reach a low word error rate (WER). In my experience, Whisper needs more fine-tuning to get there, especially for low-resource languages.
Do you plan to add support for wav2vec2, MMS, and XLS-R to the library? If not, could you guide me on how I might add them myself?
Here are some helpful links to know more about these models:
I think adding these models would make the library even better, especially for those who work with less commonly used languages.
Thanks a lot!
Yeah, I would also like the MMS model (both ASR and TTS) to be supported by ctranslate2.