Are there any tricks during LAS model training to avoid overfitting?
Hi,
When I train a LAS model using Lingvo, I found that the loss on the training set stops decreasing while the WER on the dev set keeps decreasing, and this happens even though no dropout is used. But when I implement LAS in PyTorch, my model overfits badly (0.18% WER on the training set vs. 10.8% on the dev set), even though I use the same initialization method as Lingvo, and both SpecAugment and label smoothing are applied in the Lingvo and PyTorch setups. What do you think is the most important thing Lingvo does to avoid overfitting?
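For context, this is a minimal sketch of what I mean by label smoothing on the decoder outputs in the PyTorch setup; the smoothing factor, vocabulary size, and shapes below are placeholders, not my exact settings:

```python
# Minimal sketch of a label-smoothed decoder loss in PyTorch.
# The smoothing factor (0.1) and vocabulary size are placeholder values.
import torch
import torch.nn as nn

vocab_size = 1000   # hypothetical output vocabulary size
smoothing = 0.1     # typical label-smoothing factor

# PyTorch >= 1.10 supports label smoothing directly in CrossEntropyLoss.
criterion = nn.CrossEntropyLoss(label_smoothing=smoothing, ignore_index=0)

logits = torch.randn(8, 20, vocab_size)            # (batch, target_len, vocab)
targets = torch.randint(1, vocab_size, (8, 20))    # (batch, target_len)

# CrossEntropyLoss expects the class dimension second, so transpose the logits.
loss = criterion(logits.transpose(1, 2), targets)
print(loss.item())
```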
Thanks
Can you give more detail about the parameters you're using for your LAS model? I may be having the same problem with a LAS model in TensorFlow (not Lingvo). The training loss stops decreasing while the test loss keeps increasing, yet at the same time the CER on both sets keeps decreasing. When I inspect the attention plots during training, I see that even after the training loss stops decreasing, the attention keeps getting more and more aligned.

My LAS model was the original model described in the paper, with a 3 x 512 pBLSTM encoder and a 2 x 512 LSTM decoder, taking 13 MFCCs with deltas and accelerations as input, but the attention was Luong attention (this was a mistake at first, which I only noticed after 4 days of training; the attention used the pass_hidden_state and pass_bottom_only options for the connection between the encoder and the attention layer). I'm still in the process of running the same experiments with Lingvo.
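To be concrete, by pBLSTM I mean the pyramidal BLSTM from the LAS paper, where each layer stacks pairs of adjacent frames before the next BLSTM, halving the time resolution. Here is a rough sketch of one such layer; my actual implementation is in TensorFlow, and the sizes here are just placeholders matching the 39-dimensional MFCC input and 512 units I mentioned:

```python
# Rough sketch of one pyramidal BLSTM (pBLSTM) layer in PyTorch:
# adjacent time steps are concatenated before the BLSTM, halving the sequence length.
import torch
import torch.nn as nn

class PBLSTMLayer(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        # Input dimension doubles because two consecutive frames are stacked.
        self.blstm = nn.LSTM(input_dim * 2, hidden_dim,
                             batch_first=True, bidirectional=True)

    def forward(self, x):
        batch, time, feat = x.size()
        if time % 2 == 1:              # drop the last frame if the length is odd
            x = x[:, :-1, :]
            time -= 1
        # Stack every pair of frames: (batch, time/2, feat*2)
        x = x.reshape(batch, time // 2, feat * 2)
        out, _ = self.blstm(x)
        return out

# Example: 13 MFCCs + deltas + accelerations = 39-dimensional input frames
layer = PBLSTMLayer(input_dim=39, hidden_dim=512)
frames = torch.randn(4, 200, 39)       # (batch, time, features)
print(layer(frames).shape)              # torch.Size([4, 100, 1024])
```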