Yifan Yang
Accelerating RNN-T Training and Inference Using CTC guidance
https://arxiv.org/pdf/2210.16481.pdf

### with scaling Lconv

#### greedy_search

| Model | test-clean | test-other | Decoding time(s) (3090) | Config |
| ----------...

Add ConvRNN-T Encoder

ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition
https://arxiv.org/pdf/2209.14868.pdf

model size: 44M

The best WER on LibriSpeech 960h within 20 epochs is: epoch-20 avg-4...

If the following condition is True, CPU usage (%CPU) rises above 100% (for example, 3500%). https://github.com/lhotse-speech/lhotse/blob/master/lhotse/audio.py#L1796-L1830

This PR adds functionality to clear existing log handlers before setting up new ones. This feature addresses issues in environments like `torchrun`, where default loggers cannot be easily overridden. By...
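
The idea of clearing existing handlers first can be sketched with the standard `logging` module alone. This is a minimal illustration, not the PR's actual code: the function name `reset_and_setup_logger` and its parameters are hypothetical.

```python
import logging


def reset_and_setup_logger(log_file=None, level=logging.INFO):
    """Remove handlers already attached to the root logger, then add fresh ones.

    Hypothetical helper for illustration; not taken from the PR.
    """
    root = logging.getLogger()

    # Iterate over a copy because removeHandler() mutates root.handlers.
    for handler in list(root.handlers):
        root.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    # Always log to the console (stderr by default).
    console = logging.StreamHandler()
    console.setFormatter(formatter)
    root.addHandler(console)

    # Optionally also log to a file.
    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setFormatter(formatter)
        root.addHandler(file_handler)

    root.setLevel(level)
    return root
```

On Python 3.8 and later, `logging.basicConfig(..., force=True)` also removes and closes existing root handlers before applying the new configuration, which may be enough for simple setups.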
Keeping a separate vector as an nn.Parameter in the model, and adding it times k/(1+k) to the embedding before the ReLU during training and decoding. It only supports greedy_search and...
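
The arithmetic described above can be sketched in plain Python. This is a rough illustration under assumptions: lists stand in for tensors and for the `nn.Parameter`, the function name `scaled_bias_relu` is hypothetical, and the meaning of `k` is not given in the description, so it is treated here as a plain non-negative number.

```python
def scaled_bias_relu(embedding, bias, k):
    """Add bias * k / (1 + k) to the embedding, then apply ReLU elementwise.

    `embedding` and `bias` are equal-length lists of floats. In a real model,
    `bias` would be a learnable vector (assumption: an nn.Parameter).
    """
    scale = k / (1.0 + k)  # 0 when k == 0, approaches 1 as k grows
    return [max(0.0, e + scale * b) for e, b in zip(embedding, bias)]


# With k = 0 the extra vector has no effect; larger k gives it more weight.
print(scaled_bias_relu([1.0, -1.0], [2.0, 1.0], 0.0))
print(scaled_bias_relu([1.0, -1.0], [2.0, 1.0], 1.0))
```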
Note that this PR does not include recipes for non-discrete features. The paper link: https://arxiv.org/pdf/2309.07377.pdf
issue: https://github.com/lhotse-speech/lhotse/issues/1096
Speech Commands: https://arxiv.org/pdf/1804.03209.pdf

`epoch 28 avg 2`

| metrics | result |
| -- | -- |
| True Positive | 2296 |
| False Negative | 479 |
|...