Lai Zhihao
I had a similar problem. Have you solved it yet?
Each time the test set of aishell1 is decoded, it outputs nearly uniform characters, for example: 'zan si a da sa san', 'zan si a da sa san', 'zan si a...
@oshindow I successfully recognized AISHELL's test audio by using a linear layer instead of the downsampling layer in the wenet framework.
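To illustrate the difference this swap makes, here is a minimal numpy sketch (not wenet's actual code; the shapes and the two-stride-2-convs length formula are assumptions based on typical Conv2d subsampling front-ends) contrasting conv-style 4x time downsampling with a plain linear projection that keeps every frame:

```python
import numpy as np

# Hypothetical illustration: a conv subsampling front-end reduces the
# time axis roughly 4x (two stride-2, kernel-3 convolutions), while a
# linear front-end is just an (F -> D) projection that preserves T.

T, F, D = 100, 80, 256  # frames, fbank features, model dimension
rng = np.random.default_rng(0)
x = rng.standard_normal((T, F))

# Output length after two stride-2, kernel-3 convolutions (no padding):
subsampled_len = ((T - 1) // 2 - 1) // 2

# Linear "no subsampling": one matrix multiply, all T frames kept.
W = rng.standard_normal((F, D)) * 0.01
linear_out = x @ W

print(subsampled_len)    # time axis shrunk to ~T/4
print(linear_out.shape)  # time axis preserved: (100, 256)
```

With the linear front-end the encoder sees all 100 frames instead of roughly 24, which changes the effective frame rate the CTC/attention losses are trained against; that frame-rate mismatch is one plausible reason a recipe converges with one front-end but not the other.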
@danpovey It is true that there is no convergence; neither the CTC nor the attention loss decreases as the number of epochs increases.
2023-07-24 23:41:34,159 INFO [train.py:554] (1/2) Epoch 0, batch 8400, loss[ctc_loss=0.302, att_loss=0.2322, loss=0.2531, over 9505.00 frames. utt_duration=559.1 frames, utt_pad_proportion=0.0139, over 17.00 utterances.], tot_loss[ctc_loss=0.2931, att_loss=0.2315, loss=0.25, over 1900551.80 frames. utt_duration=460.4 frames, utt_pad_proportion=0.02389,...
> @GabrielHaoHao If you are using the conformer_cte recipe you shouldn't be, as there are much better recipes now. Are you sure you didn't make any changes to the scripts?...
Looking forward to your reply. Thank you very much!