patch quantization
Update the code to be compatible with the latest NeMo. With batch_size=1, the WER/CER is now normal.
However, the exported ONNX model still does not contain seq_length, which affects models with conv_mask: True. When the batch size is > 1 and the sample lengths within a batch differ, the WER/CER is degraded when testing with the generated TRT engine. A hedged sketch of exporting with a length input is shown below.
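Below is a minimal, self-contained sketch (not the actual NeMo export path) of what exporting an encoder with an explicit length input could look like, so a TRT engine built from the ONNX can mask padded frames per sample. `ToyEncoder`, the input/output names, and the shapes are assumptions for illustration only; the real export would also need a dynamic time axis.

```python
import torch
import torch.nn as nn

# ToyEncoder stands in for the real conv_mask encoder; it consumes a per-sample
# length tensor so that the length input is kept in the exported graph.
class ToyEncoder(nn.Module):
    def __init__(self, feat=64, hidden=128):
        super().__init__()
        self.conv = nn.Conv1d(feat, hidden, kernel_size=3, padding=1)

    def forward(self, audio_signal, length):
        # Zero out padded time steps before the convolution (the role of conv_mask).
        time = audio_signal.size(-1)
        mask = torch.arange(time, device=length.device)[None, :] < length[:, None]
        encoded = self.conv(audio_signal * mask.unsqueeze(1))
        return encoded, length  # downstream decoding also needs the valid lengths

encoder = ToyEncoder().eval()
audio_signal = torch.randn(2, 64, 1000)   # (batch, features, time), padded to max length
length = torch.tensor([1000, 700])        # per-sample valid lengths in the batch

torch.onnx.export(
    encoder,
    (audio_signal, length),
    "encoder_with_length.onnx",
    input_names=["audio_signal", "length"],
    output_names=["encoded", "encoded_length"],
    dynamic_axes={
        "audio_signal": {0: "batch"},
        "length": {0: "batch"},
        "encoded": {0: "batch"},
    },
)
```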
At first glance it seems that excessive details about quantization are leaking compared to before. Is all of this because the model is not exporting length information?
Yes. In addition to the previous problem, the decoder of the generated model should not be quantized, but the original speech_to_text_quant_infer.py quantizes the decoder as well by calling quant_modules.initialize().
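A minimal sketch of the issue and one possible workaround (an assumption, not the actual patch): quant_modules.initialize() monkey-patches torch.nn layers globally, so every Conv/Linear created afterwards carries fake-quantization, including the decoder. One way to keep the decoder in full precision is to disable the quantizers that end up inside it after the model is built. `ToyModel` stands in for the restored ASR model from speech_to_text_quant_infer.py.

```python
import torch.nn as nn
from pytorch_quantization import quant_modules
from pytorch_quantization.nn import TensorQuantizer

# After this call, nn.Conv1d / nn.Linear etc. are replaced by their quantized
# counterparts, so both encoder and decoder below get TensorQuantizer submodules.
quant_modules.initialize()

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv1d(64, 128, kernel_size=3, padding=1)  # should be quantized
        self.decoder = nn.Conv1d(128, 29, kernel_size=1)             # should stay FP32

model = ToyModel()

# Turn off every quantizer inside the decoder, leaving the encoder quantized.
for name, module in model.decoder.named_modules():
    if isinstance(module, TensorQuantizer):
        module.disable()
        print(f"disabled quantizer: {name}")
```

Alternatively, if the order of module creation can be controlled, quant_modules.deactivate() restores the original torch.nn classes so layers built afterwards are not quantized at all; for a model restored in one shot, disabling the decoder's quantizers after the fact is the simpler option.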
This pull request fixes 4 alerts when merging 0221011f254d07c368181eb7b1f7c3fb0e3adfef into 2be5853cdb64eb6c137babfd42a92272399c6c0a - view on LGTM.com
fixed alerts:
- 4 for Unused import