On the question of WER and CER
Hi! I have made some modifications to your model architecture, but when I train, WER and CER are both 100...
Could you tell me how I can get the text of the model output? I want to find the reason!
By the way, thanks for your work! I will cite your great paper!
Thank you for your interest in our work. I am wondering whether you can replicate my results without any model architecture modifications. As for getting the text of the model output: as I remember, during testing the code will automatically generate a file that saves these outputs. Please let me know if you have further questions. By the way, do you use a different dataset for training?
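For quick debugging before that file is written, a minimal greedy CTC decode of per-frame argmax IDs looks like the sketch below. The vocabulary, blank index, and function name here are illustrative assumptions, not the repo's actual ones:

```python
# Hypothetical sketch: greedy CTC decoding of per-frame argmax token ids.
# The id2char mapping and blank_id=0 are assumptions for illustration.
import itertools

def ctc_greedy_decode(frame_ids, id2char, blank_id=0):
    # 1) collapse consecutive repeated ids, 2) drop blank tokens
    collapsed = [k for k, _ in itertools.groupby(frame_ids)]
    return "".join(id2char[i] for i in collapsed if i != blank_id)

id2char = {0: "<blank>", 1: "h", 2: "i", 3: " "}
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 0, 3, 1, 2], id2char))  # -> "hi hi"
```

Printing a few decoded hypotheses next to their references is usually enough to see whether the model is emitting garbage or mostly blanks.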
Sure, when I use your model, everything is OK, so the WER and CER calculation itself is correct.
In fact, I am going to use EnCodec instead of wav2vec to implement ALT. For that, I replaced the encoder with FastConformer-CTC's encoder, which has 18 layers. I guess the model is too deep to train efficiently on a small dataset (N20EM).
Therefore, I think the reason for the WER result is that the extracted features are very, very poor...
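For what it's worth, a WER of exactly 100 is what you get when the model's hypotheses are empty (e.g. CTC collapsing everything to blanks), since every reference word then counts as a deletion. A minimal stdlib-only sketch of the standard Levenshtein-based WER makes this concrete:

```python
# Standard word error rate via dynamic-programming edit distance.
def edit_distance(ref, hyp):
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))          # dp[j] = distance for prefixes
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                     # deletion
                        dp[j - 1] + 1,                 # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    ref_words, hyp_words = reference.split(), hypothesis.split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)

print(wer("she sings a song", ""))  # empty hypothesis -> 1.0, i.e. 100% WER
```

So a flat 100 WER/CER is consistent with the encoder producing degenerate features rather than with a broken metric.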
My newest experiment is as follows:
Yes, I understand. Here is my suggestion to address this issue. If you would like to train a larger model on N20EM, you can consider first pre-training on a large dataset, e.g. the DSing dataset. After obtaining a good WER, you can proceed to fine-tune it on the small dataset, N20EM. Since our hyperparameters were tuned for the wav2vec 2.0 model, you may need to re-tune them when replacing it with another model architecture, especially the learning rate. Hope these suggestions are helpful.
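On the learning-rate point: fairseq's wav2vec 2.0 fine-tuning uses a tri-stage schedule (linear warmup, constant hold, exponential decay), and something similar is a reasonable starting point when fine-tuning a different encoder. The sketch below is illustrative only; the peak LR and stage fractions are common defaults, not the values used in the paper:

```python
# Hedged sketch of a tri-stage LR schedule (warmup -> hold -> decay),
# as used in fairseq's wav2vec 2.0 fine-tuning recipes.
# peak_lr and the stage fractions are illustrative assumptions.
def tri_stage_lr(step, total_steps, peak_lr=3e-5,
                 warmup_frac=0.1, hold_frac=0.4, final_scale=0.05):
    warmup = int(total_steps * warmup_frac)
    hold = int(total_steps * hold_frac)
    if step < warmup:                       # linear warmup to peak_lr
        return peak_lr * step / max(1, warmup)
    if step < warmup + hold:                # hold at peak_lr
        return peak_lr
    # exponential decay down to final_scale * peak_lr
    t = (step - warmup - hold) / max(1, total_steps - warmup - hold)
    return peak_lr * (final_scale ** t)
```

When fine-tuning the pre-trained model on N20EM, a lower peak LR (and possibly a longer warmup) than in pre-training is usually the first knob to try.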