On the question of WER and CER
Hi! I have made some modifications to your model architecture, but when I train, WER and CER are both 100...
Could you tell me how I can get the text of the model output? I want to find the reason!
By the way, thanks for your work! I will cite your great paper!
Thank you for your interest in our work. I am wondering whether you can replicate my results without any model architecture modifications. As for getting the text of the model output: as I remember, during testing the code will automatically generate a file that saves these outputs. Please let me know if you have further questions. By the way, do you use a different dataset for training?
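For quick debugging before that file is written, a minimal greedy CTC decode of per-frame argmax IDs looks like the sketch below. The vocabulary, blank index, and function name here are illustrative assumptions, not the repo's actual ones:

```python
# Hypothetical sketch: greedy CTC decoding of per-frame argmax token ids.
# The id2char mapping and blank_id=0 are assumptions for illustration.
import itertools

def ctc_greedy_decode(frame_ids, id2char, blank_id=0):
    # 1) collapse consecutive repeated ids, 2) drop blank tokens
    collapsed = [k for k, _ in itertools.groupby(frame_ids)]
    return "".join(id2char[i] for i in collapsed if i != blank_id)

id2char = {0: "<blank>", 1: "h", 2: "i", 3: " "}
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 0, 3, 1, 2], id2char))  # -> "hi hi"
```

Printing a few decoded hypotheses next to their references is usually enough to see whether the model is emitting garbage or mostly blanks.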
Sure, when I use your model, everything is OK, so the WER and CER calculation itself is correct.
In fact, I am going to use EnCodec instead of wav2vec to implement ALT. For that, I replaced the encoder with FastConformer-CTC's encoder, which has 18 layers. I guess the model is too deep to train efficiently on a small dataset (N20EM).
Therefore, I think the reason for the WER result is that the extracted features are very, very poor...
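For what it's worth, a WER of exactly 100 is what you get when the model's hypotheses are empty (e.g. CTC collapsing everything to blanks), since every reference word then counts as a deletion. A minimal stdlib-only sketch of the standard Levenshtein-based WER makes this concrete:

```python
# Standard word error rate via dynamic-programming edit distance.
def edit_distance(ref, hyp):
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))          # dp[j] = distance for prefixes
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                     # deletion
                        dp[j - 1] + 1,                 # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    ref_words, hyp_words = reference.split(), hypothesis.split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)

print(wer("she sings a song", ""))  # empty hypothesis -> 1.0, i.e. 100% WER
```

So a flat 100 WER/CER is consistent with the encoder producing degenerate features rather than with a broken metric.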
My newest experiment is as follows:
Yes, I understand. Here is my suggestion to address this issue. If you would like to train a larger model on N20EM, you can consider first pre-training on a large dataset, e.g. the DSing dataset. After obtaining a good WER, you can proceed to fine-tune it on the small dataset, N20EM. Since our hyperparameters were tuned for the wav2vec 2.0 model, you may need to re-tune them when replacing it with another model architecture, especially the learning rate. Hope these suggestions are helpful.
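On the learning-rate point: fairseq's wav2vec 2.0 fine-tuning uses a tri-stage schedule (linear warmup, constant hold, exponential decay), and something similar is a reasonable starting point when fine-tuning a different encoder. The sketch below is illustrative only; the peak LR and stage fractions are common defaults, not the values used in the paper:

```python
# Hedged sketch of a tri-stage LR schedule (warmup -> hold -> decay),
# as used in fairseq's wav2vec 2.0 fine-tuning recipes.
# peak_lr and the stage fractions are illustrative assumptions.
def tri_stage_lr(step, total_steps, peak_lr=3e-5,
                 warmup_frac=0.1, hold_frac=0.4, final_scale=0.05):
    warmup = int(total_steps * warmup_frac)
    hold = int(total_steps * hold_frac)
    if step < warmup:                       # linear warmup to peak_lr
        return peak_lr * step / max(1, warmup)
    if step < warmup + hold:                # hold at peak_lr
        return peak_lr
    # exponential decay down to final_scale * peak_lr
    t = (step - warmup - hold) / max(1, total_steps - warmup - hold)
    return peak_lr * (final_scale ** t)
```

When fine-tuning the pre-trained model on N20EM, a lower peak LR (and possibly a longer warmup) than in pre-training is usually the first knob to try.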