
Finetune LibriSpeech for a low-resource language

nabil6391 opened this issue 3 years ago · 11 comments

Hi guys, amazing work with the icefall recipes. I am quite new to using the recipes and having a hard time creating a custom dataset with Lhotse for my language (Bengali).

I have seen @marcoyang1998 adding finetuning scripts for LibriSpeech-to-GigaSpeech and WenetSpeech-to-Aishell. I am a bit confused about how to finetune for another language.

Please help me if possible. Thanks and great work!

nabil6391 · Apr 06 '23

Hi, we haven't tried cross-language finetuning before, but we can work this out together.

First of all, have you managed to create your own dataset using Lhotse?

marcoyang1998 · Apr 06 '23

Not yet. I am trying to follow the yesno and librispeech recipes. As far as I understand, I have to create the manifest files first; then I can run compute_fbank, prepare_lang, and compile_hlg.

https://github.com/k2-fsa/icefall/blob/master/egs/yesno/ASR/prepare.sh https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/prepare.sh

nabil6391 · Apr 06 '23
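For reference, here is a minimal sketch of building the two manifests by hand with Lhotse, assuming you have a list of (audio path, transcript) pairs for the Bengali data; the `pairs` variable, the file paths, and the language tag are all placeholders:

```python
# Minimal sketch: build RecordingSet/SupervisionSet manifests with Lhotse.
# `pairs`, the file paths, and the language tag are hypothetical placeholders.
from lhotse import Recording, RecordingSet, SupervisionSegment, SupervisionSet
from lhotse.qa import fix_manifests

pairs = [("data/audio/utt001.wav", "some Bengali transcript")]

recordings, supervisions = [], []
for audio_path, text in pairs:
    rec = Recording.from_file(audio_path)  # reads duration/sampling rate from the file
    recordings.append(rec)
    supervisions.append(
        SupervisionSegment(
            id=rec.id,
            recording_id=rec.id,
            start=0.0,
            duration=rec.duration,
            text=text,
            language="Bengali",
        )
    )

recording_set = RecordingSet.from_recordings(recordings)
supervision_set = SupervisionSet.from_segments(supervisions)
# Drop or trim supervisions that do not fit inside their recordings.
recording_set, supervision_set = fix_manifests(recording_set, supervision_set)

recording_set.to_file("data/manifests/bengali_recordings.jsonl.gz")
supervision_set.to_file("data/manifests/bengali_supervisions.jsonl.gz")
```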

> As far as I understand, I have to create the manifest files first

Yes. This should be the first step. BTW, how many hours of transcribed speech do you have? Are they open-sourced?

marcoyang1998 · Apr 06 '23

It was a very small dataset. For starters, I am considering using the Common Voice Bengali dataset, which is larger than mine. I believe there are other, bigger datasets as well, but I have to check.

nabil6391 · Apr 06 '23
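Once manifests like these exist, the feature extraction that the librispeech compute_fbank scripts perform can be mirrored in a few lines. A minimal sketch, assuming the manifest paths from the sketch above and placeholder output paths:

```python
# Minimal sketch: compute and store fbank features for the new data,
# mirroring what the librispeech compute_fbank scripts do.
from lhotse import (
    CutSet,
    Fbank,
    FbankConfig,
    LilcomChunkyWriter,
    RecordingSet,
    SupervisionSet,
)

recording_set = RecordingSet.from_file("data/manifests/bengali_recordings.jsonl.gz")
supervision_set = SupervisionSet.from_file("data/manifests/bengali_supervisions.jsonl.gz")

cuts = CutSet.from_manifests(recordings=recording_set, supervisions=supervision_set)
cuts = cuts.compute_and_store_features(
    extractor=Fbank(FbankConfig(num_mel_bins=80)),  # 80-dim fbank, as in the librispeech recipes
    storage_path="data/fbank/bengali_feats",
    storage_type=LilcomChunkyWriter,
    num_jobs=4,
)
cuts.to_file("data/fbank/bengali_cuts.jsonl.gz")
```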

OK. Once you have finished preparing the dataset, you can start training from scratch (as a baseline for finetuning). If you encounter any problems, feel free to ask here.

marcoyang1998 · Apr 06 '23
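If the model uses BPE, as the librispeech recipes do, the Bengali transcripts also need their own tokenizer before training. A minimal sketch with sentencepiece, analogous to what prepare.sh drives via local/train_bpe_model.py; the paths and vocab size are placeholders:

```python
# Minimal sketch: train a BPE model on the Bengali transcripts with
# sentencepiece. The paths and vocab_size are placeholders.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="data/lang_bpe_500/transcript_words.txt",  # one transcript per line
    model_prefix="data/lang_bpe_500/bpe",
    vocab_size=500,
    model_type="unigram",
    character_coverage=1.0,  # keep full character coverage for a non-Latin script
)
```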

Amazing work indeed. I have already trained one language (Transducer7_streaming + BPE) and would also like to finetune for a lower-resource language with a different alphabet but the same number of tokens.

What steps should I take?

joazoa · Apr 06 '23

@joazoa I assume you are using a different output vocabulary? Then you can only initialize the encoder; the decoder and joiner need to be trained from a random initialization.

marcoyang1998 · Apr 06 '23

@marcoyang1998 how do I initialize the encoder and train the decoder and joiner from a random initialization?

nabil6391 · Oct 02 '23

I think what @marcoyang1998 meant was that if you have a different output vocabulary, you would need to randomly initialize the decoder and joiner. But the encoder can be initialized from your final checkpoint. You can do something like model.encoder.load_state_dict(ckpt["model"], strict=False).

desh2608 · Oct 02 '23
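One caveat with the snippet above: icefall checkpoints typically store the whole transducer's state dict under ckpt["model"], so every encoder weight carries an "encoder." prefix and will not match model.encoder's keys directly; with strict=False the mismatched keys are then silently skipped rather than loaded. A minimal sketch that filters and renames the keys first, assuming that usual layout (the checkpoint path is a placeholder, and model is the transducer built in train.py):

```python
# Minimal sketch: initialize only the encoder from an existing checkpoint.
# Assumes ckpt["model"] is the full transducer state dict with
# "encoder."-prefixed keys; the checkpoint path is a placeholder.
import torch

ckpt = torch.load("pretrained/epoch-30.pt", map_location="cpu")

# Keep only the encoder weights and strip the "encoder." prefix so the
# keys line up with model.encoder's own state dict.
encoder_state = {
    k[len("encoder."):]: v
    for k, v in ckpt["model"].items()
    if k.startswith("encoder.")
}
model.encoder.load_state_dict(encoder_state)

# The decoder and joiner are left at their random initialization, since
# the new vocabulary changes their shapes.
```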

Thanks for replying, @desh2608. I am still quite new to k2-icefall.

Should I modify train.py to add this part?

nabil6391 · Oct 02 '23

> Thanks for replying, @desh2608. I am still quite new to k2-icefall.
>
> Should I modify train.py to add this part?

Yes. The train.py script is plain PyTorch, so you can modify it to your requirements as you would any other PyTorch training code.

desh2608 · Oct 02 '23
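As for where it goes: a hypothetical placement is inside run() in train.py, right after the model is constructed and before it is wrapped in DDP or handed to the optimizer. Here get_transducer_model follows the librispeech recipes, while params.finetune_ckpt is a made-up option added for illustration:

```python
# Hypothetical placement inside run() in train.py, after the model is
# built and before DDP wrapping / optimizer creation.
model = get_transducer_model(params)

if params.finetune_ckpt:  # made-up command-line option
    ckpt = torch.load(params.finetune_ckpt, map_location="cpu")
    encoder_state = {
        k[len("encoder."):]: v
        for k, v in ckpt["model"].items()
        if k.startswith("encoder.")
    }
    model.encoder.load_state_dict(encoder_state)
```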