Lai Zhihao
I had a similar problem. Have you solved it yet?
Each time the test set of aishell1 is decoded, it outputs nearly uniform characters, for example: 'zan si a da sa san', 'zan si a da sa san', 'zan si a...
@oshindow I successfully recognized AISHELL's test audio by using a linear layer instead of the downsampling layer in the wenet framework.
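To illustrate the difference this swap makes, here is a minimal numpy sketch (not wenet's actual code; the shapes and the two-stride-2-convs length formula are assumptions based on typical Conv2d subsampling front-ends) contrasting conv-style 4x time downsampling with a plain linear projection that keeps every frame:

```python
import numpy as np

# Hypothetical illustration: a conv subsampling front-end reduces the
# time axis roughly 4x (two stride-2, kernel-3 convolutions), while a
# linear front-end is just an (F -> D) projection that preserves T.

T, F, D = 100, 80, 256  # frames, fbank features, model dimension
rng = np.random.default_rng(0)
x = rng.standard_normal((T, F))

# Output length after two stride-2, kernel-3 convolutions (no padding):
subsampled_len = ((T - 1) // 2 - 1) // 2

# Linear "no subsampling": one matrix multiply, all T frames kept.
W = rng.standard_normal((F, D)) * 0.01
linear_out = x @ W

print(subsampled_len)    # time axis shrunk to ~T/4
print(linear_out.shape)  # time axis preserved: (100, 256)
```

With the linear front-end the encoder sees all 100 frames instead of roughly 24, which changes the effective frame rate the CTC/attention losses are trained against; that frame-rate mismatch is one plausible reason a recipe converges with one front-end but not the other.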
@danpovey It is true that there is no convergence; neither the CTC nor the attention loss decreases as the number of epochs increases.
2023-07-24 23:41:34,159 INFO [train.py:554] (1/2) Epoch 0, batch 8400, loss[ctc_loss=0.302, att_loss=0.2322, loss=0.2531, over 9505.00 frames. utt_duration=559.1 frames, utt_pad_proportion=0.0139, over 17.00 utterances.], tot_loss[ctc_loss=0.2931, att_loss=0.2315, loss=0.25, over 1900551.80 frames. utt_duration=460.4 frames, utt_pad_proportion=0.02389,...
> @GabrielHaoHao If you are using the conformer_cte recipe you shouldn't be, as there are much better recipes now. Are you sure you didn't make any changes to the scripts?...
Looking forward to your reply. Thank you very much!