Performance Enhancement Opportunity
Feature Description
The Flashlight ASR decoder, fl_asr_decode, can benefit from reduced-precision computing and batch processing. In addition, the positional embedding function appears to be a bottleneck for both training and decoding.
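As an illustration of the batching idea, here is a minimal C++ sketch that greedily decodes a whole batch of emission matrices in one pass instead of one utterance at a time; the shapes and names (Emissions, batchGreedyDecode) are hypothetical and not fl_asr_decode's actual implementation. Reduced precision would come from running the acoustic model's forward pass in FP16, which is framework-dependent and not shown here.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// emissions[b][t][c]: frame-level class scores for utterance b, frame t, class c.
using Emissions = std::vector<std::vector<std::vector<float>>>;

// Greedy (best-path) decode for every utterance in the batch in one pass.
std::vector<std::vector<size_t>> batchGreedyDecode(const Emissions& emissions) {
  std::vector<std::vector<size_t>> batchTokens(emissions.size());
  for (size_t b = 0; b < emissions.size(); ++b) {
    for (const auto& frame : emissions[b]) {
      size_t best = 0;
      for (size_t c = 1; c < frame.size(); ++c) {
        if (frame[c] > frame[best]) best = c;
      }
      batchTokens[b].push_back(best);
    }
  }
  return batchTokens;
}

int main() {
  // Two utterances, three frames, four classes each.
  Emissions emissions = {
      {{0.1f, 0.7f, 0.1f, 0.1f}, {0.2f, 0.2f, 0.5f, 0.1f}, {0.9f, 0.0f, 0.0f, 0.1f}},
      {{0.3f, 0.3f, 0.3f, 0.1f}, {0.0f, 0.1f, 0.1f, 0.8f}, {0.6f, 0.2f, 0.1f, 0.1f}}};
  for (const auto& tokens : batchGreedyDecode(emissions)) {
    for (auto t : tokens) std::cout << t << ' ';
    std::cout << '\n';
  }
  return 0;
}
```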
Use Case
Most models should benefit from some or all of these optimizations; the Transformer model is an example that can benefit from all of them.
I have been running these changes for a week and they seem to work fine: Transformer decoding is roughly 4x faster, and training a small 27M-parameter Conformer is about 5% faster.
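For the positional-embedding bottleneck mentioned above, one common mitigation is to precompute the sinusoidal table once up to a maximum sequence length and index into it, rather than recomputing sin/cos on every forward pass. A minimal sketch, assuming the standard sinusoidal formulation (PositionalCache and its members are illustrative names, not Flashlight API):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

class PositionalCache {
 public:
  // Fill the table once: PE(pos, 2i) = sin(pos / 10000^(2i/dim)),
  // PE(pos, 2i+1) = cos(pos / 10000^(2i/dim)).
  PositionalCache(size_t maxLen, size_t dim) : dim_(dim), table_(maxLen * dim) {
    for (size_t pos = 0; pos < maxLen; ++pos) {
      for (size_t i = 0; i < dim; i += 2) {
        double angle = pos / std::pow(10000.0, double(i) / dim);
        table_[pos * dim + i] = float(std::sin(angle));
        if (i + 1 < dim) table_[pos * dim + i + 1] = float(std::cos(angle));
      }
    }
  }

  // O(1) lookup of the cached row for a position: no trigonometry on the hot path.
  const float* at(size_t pos) const { return table_.data() + pos * dim_; }

 private:
  size_t dim_;
  std::vector<float> table_;
};
```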
In the meantime I found an issue: the LER comparison includes the UNK tokens that were added for padding, which leads to unexpected results where the WER is 0 but the LER is not.
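A minimal sketch of the corresponding fix, assuming padding uses a known UNK token id (unkIdx here is an assumption): strip those tokens from both reference and hypothesis before computing the edit distance, so a hypothesis with zero word errors cannot show a nonzero LER.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Standard Levenshtein distance over token ids (two-row DP).
size_t editDistance(const std::vector<int>& a, const std::vector<int>& b) {
  std::vector<size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (size_t j = 0; j <= b.size(); ++j) prev[j] = j;
  for (size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (size_t j = 1; j <= b.size(); ++j) {
      size_t sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, sub});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Remove padding UNK tokens before scoring.
std::vector<int> stripPadding(std::vector<int> seq, int unkIdx) {
  seq.erase(std::remove(seq.begin(), seq.end(), unkIdx), seq.end());
  return seq;
}

double letterErrorRate(const std::vector<int>& ref,
                       const std::vector<int>& hyp,
                       int unkIdx) {
  auto r = stripPadding(ref, unkIdx);
  auto h = stripPadding(hyp, unkIdx);
  return r.empty() ? 0.0 : double(editDistance(r, h)) / double(r.size());
}
```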