Yifan Yang

Results 19 issues of Yifan Yang

Accelerating RNN-T Training and Inference Using CTC guidance https://arxiv.org/pdf/2210.16481.pdf

### with scaling Lconv

#### greedy_search

| Model | test-clean | test-other | Decoding time(s) (3090) | Config |
| ----------...

Add ConvRNN-T Encoder ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition https://arxiv.org/pdf/2209.14868.pdf Model size: 44M. The best WER on LibriSpeech 960h within 20 epochs is: epoch-20 avg-4...

If the following condition is True, CPU usage (%CPU) rises above 100% (for example, 3500%): https://github.com/lhotse-speech/lhotse/blob/master/lhotse/audio.py#L1796-L1830

![Screenshot 2023-06-29 12 21 45](https://github.com/WenyanLiu/CCFrank4dblp/assets/64255737/7376e4d7-7301-4b51-b669-c3dcf168f4ab)

This PR adds functionality to clear existing log handlers before setting up new ones. This feature addresses issues in environments like `torchrun`, where default loggers cannot be easily overridden. By...
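A minimal sketch of the idea, using only the standard `logging` module. The helper name `setup_logger` and its parameters are hypothetical and are not taken from the PR itself; the sketch only illustrates removing handlers that a launcher may have attached to the root logger before installing a new one.

```python
import logging
import sys


def setup_logger(level=logging.INFO, stream=None):
    """Hypothetical helper: drop existing root handlers, then attach a fresh one."""
    root = logging.getLogger()

    # Iterate over a copy, because removeHandler() mutates root.handlers.
    for handler in list(root.handlers):
        root.removeHandler(handler)

    # Install a single new handler with the desired format.
    target = stream if stream is not None else sys.stderr
    new_handler = logging.StreamHandler(target)
    new_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    root.addHandler(new_handler)
    root.setLevel(level)
    return root
```

Calling the helper twice leaves exactly one handler on the root logger, so messages are not duplicated when setup runs more than once in the same process.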

Keeps a separate vector as an `nn.Parameter` in the model and adds it, scaled by k/(1+k), to the embedding before the ReLU during training and decoding. It only supports greedy_search and...
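The operation described above can be written compactly. The symbols are assumptions introduced for illustration: $e$ is the embedding, $v$ is the separate learned vector, and $\tilde{e}$ is the result after the ReLU. The description does not say what $k$ represents, so it is left as given.

$$
\tilde{e} = \mathrm{ReLU}\!\left(e + \frac{k}{1+k}\, v\right)
$$

The scaling factor $k/(1+k)$ is $0$ at $k = 0$ and approaches $1$ as $k$ grows, so the contribution of $v$ stays bounded.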

Note that this PR does not include recipes for non-discrete features. The paper link: https://arxiv.org/pdf/2309.07377.pdf

issue: https://github.com/lhotse-speech/lhotse/issues/1096

Speech Commands: https://arxiv.org/pdf/1804.03209.pdf

`epoch 28 avg 2`

| metrics | result |
| -- | -- |
| True Positive | 2296 |
| False Negative | 479 |
|...
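As a reading aid for the two counts visible above, recall (the true positive rate) can be derived from them directly. This is a sketch, not part of the original report; the function name `recall` is hypothetical, and the remaining rows of the table are truncated, so no other metric is computed here.

```python
def recall(true_positive: int, false_negative: int) -> float:
    """Return TP / (TP + FN), the fraction of positives that were detected."""
    total = true_positive + false_negative
    if total == 0:
        raise ValueError("recall is undefined when there are no positive examples")
    return true_positive / total


# Counts taken from the table above: 2296 true positives, 479 false negatives.
print(f"{recall(2296, 479):.4f}")  # prints 0.8274
```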