patch quantization
Update the code to be compatible with the latest NeMo. With batch_size=1, the WER/CER is now normal.
However, the exported ONNX model still does not contain seq_length, which affects models with conv_mask: True. When the batch size is > 1 and the sample lengths within a batch differ, the WER/CER is degraded when testing with the generated TRT engine. A hedged sketch of exporting with a length input is shown below.
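Below is a minimal, self-contained sketch (not the actual NeMo export path) of what exporting an encoder with an explicit length input could look like, so a TRT engine built from the ONNX can mask padded frames per sample. `ToyEncoder`, the input/output names, and the shapes are assumptions for illustration only; the real export would also need a dynamic time axis.

```python
import torch
import torch.nn as nn

# ToyEncoder stands in for the real conv_mask encoder; it consumes a per-sample
# length tensor so that the length input is kept in the exported graph.
class ToyEncoder(nn.Module):
    def __init__(self, feat=64, hidden=128):
        super().__init__()
        self.conv = nn.Conv1d(feat, hidden, kernel_size=3, padding=1)

    def forward(self, audio_signal, length):
        # Zero out padded time steps before the convolution (the role of conv_mask).
        time = audio_signal.size(-1)
        mask = torch.arange(time, device=length.device)[None, :] < length[:, None]
        encoded = self.conv(audio_signal * mask.unsqueeze(1))
        return encoded, length  # downstream decoding also needs the valid lengths

encoder = ToyEncoder().eval()
audio_signal = torch.randn(2, 64, 1000)   # (batch, features, time), padded to max length
length = torch.tensor([1000, 700])        # per-sample valid lengths in the batch

torch.onnx.export(
    encoder,
    (audio_signal, length),
    "encoder_with_length.onnx",
    input_names=["audio_signal", "length"],
    output_names=["encoded", "encoded_length"],
    dynamic_axes={
        "audio_signal": {0: "batch"},
        "length": {0: "batch"},
        "encoded": {0: "batch"},
    },
)
```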
At first glance it seems that excessive details about quantization are leaking compared to before. Is all of this because the model is not exporting length information?
Yes. In addition to the previous problem, the decoder of the generated model should not be quantized, but the original speech_to_text_quant_infer.py quantizes the decoder as well by calling quant_modules.initialize().
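A minimal sketch of the issue and one possible workaround (an assumption, not the actual patch): quant_modules.initialize() monkey-patches torch.nn layers globally, so every Conv/Linear created afterwards carries fake-quantization, including the decoder. One way to keep the decoder in full precision is to disable the quantizers that end up inside it after the model is built. `ToyModel` stands in for the restored ASR model from speech_to_text_quant_infer.py.

```python
import torch.nn as nn
from pytorch_quantization import quant_modules
from pytorch_quantization.nn import TensorQuantizer

# After this call, nn.Conv1d / nn.Linear etc. are replaced by their quantized
# counterparts, so both encoder and decoder below get TensorQuantizer submodules.
quant_modules.initialize()

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv1d(64, 128, kernel_size=3, padding=1)  # should be quantized
        self.decoder = nn.Conv1d(128, 29, kernel_size=1)             # should stay FP32

model = ToyModel()

# Turn off every quantizer inside the decoder, leaving the encoder quantized.
for name, module in model.decoder.named_modules():
    if isinstance(module, TensorQuantizer):
        module.disable()
        print(f"disabled quantizer: {name}")
```

Alternatively, if the order of module creation can be controlled, quant_modules.deactivate() restores the original torch.nn classes so layers built afterwards are not quantized at all; for a model restored in one shot, disabling the decoder's quantizers after the fact is the simpler option.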
This pull request fixes 4 alerts when merging 0221011f254d07c368181eb7b1f7c3fb0e3adfef into 2be5853cdb64eb6c137babfd42a92272399c6c0a - view on LGTM.com
fixed alerts:
- 4 for Unused import