Desh Raj
I haven't tried it on the Task 8 data, since I was only focusing on the biomedical domain at the time. However, I'm pretty sure it wouldn't be as low as...
One thing I noticed in your code is that you are using randomly initialized word embeddings. The number of parameters in the CRNN is already somewhat larger than in the CNN, and...
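For what it's worth, here's a minimal PyTorch sketch of the swap I mean (the `pretrained_vectors` matrix and the sizes are placeholders, not taken from your code):

```python
import torch
import torch.nn as nn

# Placeholder for real pretrained vectors (e.g. GloVe/word2vec),
# shape (vocab_size, emb_dim); here just random numbers for illustration.
pretrained_vectors = torch.randn(10000, 300)

# What the code currently does: random initialization, learned from scratch.
emb_random = nn.Embedding(num_embeddings=10000, embedding_dim=300)

# Initializing from pretrained vectors instead. freeze=False fine-tunes them
# during training; freeze=True keeps them fixed.
emb_pretrained = nn.Embedding.from_pretrained(pretrained_vectors, freeze=False)
```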
@GatsbyUSTC I agree with you. If we are using pretrained embeddings and not tuning them during training, attention at the input layer would be meaningless. However, if embeddings are...
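To make that concrete, here's a toy sketch of what I mean by attention at the input layer (the module name and shapes are my own, for illustration only):

```python
import torch
import torch.nn as nn

class InputAttention(nn.Module):
    """Toy additive attention over input word embeddings: each token gets a
    scalar score, softmax-normalized over the sequence, and its embedding is
    scaled by that weight before being fed to the CNN/RNN."""
    def __init__(self, emb_dim):
        super().__init__()
        self.scorer = nn.Linear(emb_dim, 1, bias=False)

    def forward(self, emb):                               # (batch, seq, emb_dim)
        weights = torch.softmax(self.scorer(emb), dim=1)  # (batch, seq, 1)
        return emb * weights
```

If the embeddings fed into this are frozen pretrained vectors, the scorer can only reweight fixed general-purpose representations; if they are trainable, the embeddings can co-adapt with the attention.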
Related issue: https://github.com/kaldi-asr/kaldi/issues/4468
It's hard to say what may be going wrong just based on WER. Did you change any hyperparameters? Did you look at the intermediate results (e.g. from GMM decoding) to...
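As a quick check, something like this (a rough sketch; the `exp/*/decode*` layout and `wer_*` file names follow the usual Kaldi egs conventions, so adjust to your setup) prints the best WER from each decode directory, which helps localize a regression to a particular stage:

```python
import glob
import re

# Scan each decode directory for scoring outputs (the wer_* files) and
# report the best %WER found in each, so a regression can be traced to
# a specific stage (mono/tri/chain).
for decode_dir in sorted(glob.glob("exp/*/decode*")):
    wers = []
    for wer_file in glob.glob(f"{decode_dir}/wer_*"):
        with open(wer_file) as f:
            for line in f:
                m = re.search(r"%WER\s+(\d+(?:\.\d+)?)", line)
                if m:
                    wers.append(float(m.group(1)))
    if wers:
        print(f"{decode_dir}: best %WER = {min(wers):.2f}")
```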
I think so, yeah. The WER without RNNLM rescoring should be closer to 46%. See this line at the top of run_cnn_tdnn_1b.sh: %WER 46.07 [ 27124 / 58881, 2905 ins,...
You can try tuning some of the hyperparameters (especially the learning rate), since you changed the number of training jobs (GPUs). But I think at this point you're close enough that...
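For reference, if I remember the nnet3 convention correctly, the learning rate actually applied is the "effective" learning rate scaled by the current number of jobs:

$$\text{lr}_{\text{actual}} = \text{num\_jobs} \times \text{lr}_{\text{effective}}$$

so after changing the number of GPUs, you'd adjust `--trainer.optimization.initial-effective-lrate` and `--trainer.optimization.final-effective-lrate` in the opposite direction (e.g., halving the jobs suggests roughly doubling the effective rates to keep a comparable step size).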
Sorry, I don't think I have the eval numbers for that exact recipe on hand. We tried several systems during the challenge (see Table 7 in https://arxiv.org/pdf/2006.07898.pdf) and it seems...
The unperturbed cleaned data has about 300k utterances (900k for the speed-perturbed version). So 60k is one-fifth of that, hence about 200 hours. But yeah, the comment should make that clearer.
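Spelling out the arithmetic (assuming the full cleaned set is on the order of 1000 hours, which the quoted figures imply):

$$\frac{60\text{k}}{300\text{k}} = \frac{1}{5}, \qquad \frac{1000\ \text{h}}{5} = 200\ \text{h}$$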
Ok, I'll check.