Jiayu DU

21 comments by Jiayu DU

@BuyuanCui @ekmb This PR continues https://github.com/NVIDIA/NeMo/issues/4543 and https://github.com/NVIDIA/NeMo/pull/4638, and is ready for another round of review.

@mzxcpp please signal clearly here once you have addressed all issues from the last review, so that we know when to move forward and no one gets distracted by intermediate...

Thanks for the report. The current code doesn't cover these cases, and there is no plan to add rules to handle them in the near future. Feel free to open a PR to fix it.

@sf9218 I guess you want kenlm's vocabulary to be exactly your vocabulary, even though some of the words are not present in your training text. This is a common...

@JRMeyer This is a typical speech recognition feature. If I understand you correctly, you basically want to up-weight or down-weight a list of "phrases", which may be a brand name,...
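To make the idea of up- or down-weighting a phrase list concrete, here is a minimal sketch that rescores an n-best list: each hypothesis gets a per-occurrence bonus (positive weight) or penalty (negative weight) for every listed phrase it contains. This is an illustration only, assuming hypothesis scores are log-probabilities where higher is better; the function names are hypothetical, and production systems usually apply such biasing inside the decoder (for example, via a context graph or class-based grammar) rather than as an n-best post-pass.

```python
from typing import Dict, List, Tuple


def count_occurrences(words: List[str], phrase: List[str]) -> int:
    """Count (possibly overlapping) occurrences of a word sequence in a word list."""
    n = len(phrase)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)


def rescore_with_phrase_boost(
    nbest: List[Tuple[str, float]],
    boosts: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Hypothetical helper: add weight * (number of occurrences) for each boosted
    phrase to every hypothesis score, then sort best-first.

    Assumes scores are log-probabilities (higher is better). Positive weights
    up-weight a phrase; negative weights down-weight it.
    """
    parsed = {phrase: phrase.split() for phrase in boosts}
    rescored = []
    for text, score in nbest:
        words = text.split()
        bonus = sum(
            weight * count_occurrences(words, parsed[phrase])
            for phrase, weight in boosts.items()
        )
        rescored.append((text, score + bonus))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Without boosting, "call jay you now" wins (-4.5 > -5.0).
    nbest = [("call jiayu now", -5.0), ("call jay you now", -4.5)]
    # Up-weight the name "jiayu" by 1.0 per occurrence.
    for text, score in rescore_with_phrase_boost(nbest, {"jiayu": 1.0}):
        print(f"{score:.1f}\t{text}")
```

With the boost applied, "call jiayu now" moves to the top with a score of -4.0. The weights are free parameters here; in practice they would need tuning so that boosted phrases help recall without causing false insertions.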

The pipeline was developed based on the existing Kaldi scripts you mentioned above, but with a lot of bug fixes and ad-hoc modifications. However, we have no near-term plan to...

The dataset generation pipeline contains some steps that are not 100% reversible, so currently I'm afraid the answer is no.

As Kaldi recipe development is converging, it's time to think about how to organize this text normalization as a post-processing step before WER calculation. The processing is pretty simple and consists of:...

I just added a simple scoring tool via https://github.com/SpeechColab/GigaSpeech/pull/35. It uses sclite to evaluate REF against HYP. Before evaluation, the tool applies the very simple text processing that we discussed...
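For readers without sclite at hand, the basic "normalize, then score" flow can be sketched in plain Python: normalize REF and HYP the same way, count word-level edits with a Levenshtein alignment, and divide by the number of reference words. The `normalize` function below is a hypothetical placeholder (uppercase, then replace anything other than A-Z, digits, and apostrophes with spaces), not the processing rules discussed in the thread, and this sketch is not a substitute for sclite: it reports only the total error count, and its numbers may not match sclite's exactly.

```python
import re
from typing import List


def normalize(text: str) -> List[str]:
    """Hypothetical placeholder normalization (English-only assumption):
    uppercase, replace characters other than A-Z, 0-9 and apostrophes with
    spaces, then split into words."""
    text = text.upper()
    text = re.sub(r"[^A-Z0-9']+", " ", text)
    return text.split()


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions),
    computed with a single rolling DP row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = cur
    return prev[-1]


def wer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of utterances")
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_words, hyp_words = normalize(ref), normalize(hyp)
        errors += edit_distance(ref_words, hyp_words)
        total += len(ref_words)
    return errors / total if total else 0.0


if __name__ == "__main__":
    refs = ["Hello, world!", "speech recognition is fun"]
    hyps = ["hello world", "speech recognitions fun"]
    print(f"WER = {wer(refs, hyps):.3f}")
```

Here the first pair matches after normalization, and the second has one substitution and one deletion, so the result is 2 errors over 6 reference words (about 0.333). Applying the same normalization to both sides is the important design choice: it keeps formatting differences such as case and punctuation from being counted as recognition errors.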