
Finetune LibriSpeech for a low-resource language

nabil6391 opened this issue 3 years ago · 11 comments

Hi guys, amazing work with the icefall recipes. I am quite new to using the recipes and having a hard time creating a custom dataset with Lhotse for my language (Bengali).

I have seen @marcoyang1998 adding finetuning scripts for LibriSpeech-to-GigaSpeech and WenetSpeech-to-Aishell. I am a bit confused about how to finetune for another language.

Please help me if possible. Thanks and great work!

nabil6391 · Apr 06 '23

Hi, we haven't tried cross-language finetuning before, but we can work this out together.

First of all, have you managed to create your own dataset using Lhotse?

marcoyang1998 · Apr 06 '23

Not yet. I am trying to follow the yesno and librispeech recipes. As far as I understand, I have to create the manifest files first; then I can run compute_fbank, prepare_lang, and compile_hlg.

https://github.com/k2-fsa/icefall/blob/master/egs/yesno/ASR/prepare.sh https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/prepare.sh

nabil6391 · Apr 06 '23
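For reference, here is a minimal sketch of building the two manifests by hand with Lhotse, assuming you have a list of (audio path, transcript) pairs for the Bengali data; the `pairs` variable, the file paths, and the language tag are all placeholders:

```python
# Minimal sketch: build RecordingSet/SupervisionSet manifests with Lhotse.
# `pairs`, the file paths, and the language tag are hypothetical placeholders.
from lhotse import Recording, RecordingSet, SupervisionSegment, SupervisionSet
from lhotse.qa import fix_manifests

pairs = [("data/audio/utt001.wav", "some Bengali transcript")]

recordings, supervisions = [], []
for audio_path, text in pairs:
    rec = Recording.from_file(audio_path)  # reads duration/sampling rate from the file
    recordings.append(rec)
    supervisions.append(
        SupervisionSegment(
            id=rec.id,
            recording_id=rec.id,
            start=0.0,
            duration=rec.duration,
            text=text,
            language="Bengali",
        )
    )

recording_set = RecordingSet.from_recordings(recordings)
supervision_set = SupervisionSet.from_segments(supervisions)
# Drop or trim supervisions that do not fit inside their recordings.
recording_set, supervision_set = fix_manifests(recording_set, supervision_set)

recording_set.to_file("data/manifests/bengali_recordings.jsonl.gz")
supervision_set.to_file("data/manifests/bengali_supervisions.jsonl.gz")
```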

> As far as I understand, I have to create the manifest files first

Yes. This should be the first step. BTW, how many hours of transcribed speech do you have? Are they open-sourced?

marcoyang1998 · Apr 06 '23

It was a very small dataset. For starters, I am considering using the Common Voice Bengali dataset, which is larger than mine. I believe there are other, bigger datasets as well, but I have to check.

nabil6391 · Apr 06 '23
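Once manifests like these exist, the feature extraction that the librispeech compute_fbank scripts perform can be mirrored in a few lines. A minimal sketch, assuming the manifest paths from the sketch above and placeholder output paths:

```python
# Minimal sketch: compute and store fbank features for the new data,
# mirroring what the librispeech compute_fbank scripts do.
from lhotse import (
    CutSet,
    Fbank,
    FbankConfig,
    LilcomChunkyWriter,
    RecordingSet,
    SupervisionSet,
)

recording_set = RecordingSet.from_file("data/manifests/bengali_recordings.jsonl.gz")
supervision_set = SupervisionSet.from_file("data/manifests/bengali_supervisions.jsonl.gz")

cuts = CutSet.from_manifests(recordings=recording_set, supervisions=supervision_set)
cuts = cuts.compute_and_store_features(
    extractor=Fbank(FbankConfig(num_mel_bins=80)),  # 80-dim fbank, as in the librispeech recipes
    storage_path="data/fbank/bengali_feats",
    storage_type=LilcomChunkyWriter,
    num_jobs=4,
)
cuts.to_file("data/fbank/bengali_cuts.jsonl.gz")
```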

OK. Once you have finished preparing the dataset, you can start training from scratch (as a baseline for finetuning). If you encounter any problems, feel free to ask here.

marcoyang1998 · Apr 06 '23
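If the model uses BPE, as the librispeech recipes do, the Bengali transcripts also need their own tokenizer before training. A minimal sketch with sentencepiece, analogous to what prepare.sh drives via local/train_bpe_model.py; the paths and vocab size are placeholders:

```python
# Minimal sketch: train a BPE model on the Bengali transcripts with
# sentencepiece. The paths and vocab_size are placeholders.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="data/lang_bpe_500/transcript_words.txt",  # one transcript per line
    model_prefix="data/lang_bpe_500/bpe",
    vocab_size=500,
    model_type="unigram",
    character_coverage=1.0,  # keep full character coverage for a non-Latin script
)
```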

Amazing work indeed. I have already trained one language (Transducer7_streaming + BPE) and would also like to finetune for a lower-resource language with a different alphabet but the same number of tokens.

What steps should I take?

joazoa · Apr 06 '23

@joazoa I assume you are using a different output vocabulary? Then you can only initialize the encoder; the decoder and joiner need to be trained from a random initialization.

marcoyang1998 · Apr 06 '23

@marcoyang1998 how do I initialize the encoder and train the decoder and joiner from a random initialization?

nabil6391 · Oct 02 '23

I think what @marcoyang1998 meant was that if you have a different output vocabulary, you would need to randomly initialize the decoder and joiner. But the encoder can be initialized from your final checkpoint. You can do something like model.encoder.load_state_dict(ckpt["model"], strict=False).

desh2608 · Oct 02 '23
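One caveat with the snippet above: icefall checkpoints typically store the whole transducer's state dict under ckpt["model"], so every encoder weight carries an "encoder." prefix and will not match model.encoder's keys directly; with strict=False the mismatched keys are then silently skipped rather than loaded. A minimal sketch that filters and renames the keys first, assuming that usual layout (the checkpoint path is a placeholder, and model is the transducer built in train.py):

```python
# Minimal sketch: initialize only the encoder from an existing checkpoint.
# Assumes ckpt["model"] is the full transducer state dict with
# "encoder."-prefixed keys; the checkpoint path is a placeholder.
import torch

ckpt = torch.load("pretrained/epoch-30.pt", map_location="cpu")

# Keep only the encoder weights and strip the "encoder." prefix so the
# keys line up with model.encoder's own state dict.
encoder_state = {
    k[len("encoder."):]: v
    for k, v in ckpt["model"].items()
    if k.startswith("encoder.")
}
model.encoder.load_state_dict(encoder_state)

# The decoder and joiner are left at their random initialization, since
# the new vocabulary changes their shapes.
```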

Thanks for replying, @desh2608. I am still quite new to k2-icefall.

Should I modify train.py to add this part?

nabil6391 · Oct 02 '23

> Thanks for replying, @desh2608. I am still quite new to k2-icefall.
>
> Should I modify train.py to add this part?

Yes. The train.py script is plain PyTorch, so you can modify it to your requirements as you would any other PyTorch training code.

desh2608 · Oct 02 '23
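As for where it goes: a hypothetical placement is inside run() in train.py, right after the model is constructed and before it is wrapped in DDP or handed to the optimizer. Here get_transducer_model follows the librispeech recipes, while params.finetune_ckpt is a made-up option added for illustration:

```python
# Hypothetical placement inside run() in train.py, after the model is
# built and before DDP wrapping / optimizer creation.
model = get_transducer_model(params)

if params.finetune_ckpt:  # made-up command-line option
    ckpt = torch.load(params.finetune_ckpt, map_location="cpu")
    encoder_state = {
        k[len("encoder."):]: v
        for k, v in ckpt["model"].items()
        if k.startswith("encoder.")
    }
    model.encoder.load_state_dict(encoder_state)
```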