huangruizhe issues

Results 12 issues of huangruizhe

The efficiency of computing backoff

https://github.com/sfischer13/python-arpa/blob/2284b815866aeb08f65f786da416e78d7937ee1d/arpa/models/base.py#L34-L45 This try/except mechanism for implementing the backoff may be inefficient. According to the Python [documentation](https://docs.python.org/3.6/faq/design.html#how-fast-are-exceptions): > A try/except block is extremely efficient if no exceptions are raised....
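To make the concern concrete, here is a minimal sketch contrasting exception-driven backoff with a `dict.get` lookup. The tables `LOG_P` and `LOG_BO` and both function names are hypothetical and are not python-arpa's actual data structures or API. A missing backoff weight is assumed to be 0.0 in log space.

```python
# Hypothetical toy model: log-probabilities and backoff weights keyed by n-gram tuples.
LOG_P = {("maybe", "when"): -2.310726, ("when",): -1.5, ("then",): -2.0}
LOG_BO = {("maybe",): -0.3}


def log_p_try(ngram):
    """Back off via exceptions: every miss raises and catches a KeyError."""
    try:
        return LOG_P[ngram]
    except KeyError:
        if len(ngram) == 1:
            raise  # unknown unigram: nothing left to back off to
        # Backoff weight of the context plus the score of the shorter n-gram.
        return LOG_BO.get(ngram[:-1], 0.0) + log_p_try(ngram[1:])


def log_p_get(ngram):
    """Back off via dict.get: a miss costs one lookup and raises nothing."""
    bo_sum = 0.0
    while True:
        score = LOG_P.get(ngram)
        if score is not None:
            return bo_sum + score
        if len(ngram) == 1:
            raise KeyError(ngram)
        bo_sum += LOG_BO.get(ngram[:-1], 0.0)
        ngram = ngram[1:]
```

Both functions return the same scores. Whether the second is faster in practice depends on how often lookups miss, so timing both (for example with `timeit`) on a real model would be the way to check.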

Fix an error in the RE of the parser

Fixed the issue #5.

important parser error

https://github.com/sfischer13/python-arpa/blob/2284b815866aeb08f65f786da416e78d7937ee1d/arpa/parsers/quick.py#L21 The regular expression has an error here. Consider the case where the line is: `-2.310726 maybe when 9.609759e-05` The exponent in the backoff weight is not correctly parsed --...
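As an illustration of the failure case, the sketch below parses an ARPA n-gram line whose backoff weight is written in scientific notation. The pattern and the `parse_entry` helper are hypothetical; they are not the expression used in python-arpa, nor necessarily the fix that was applied there.

```python
import re

# Hypothetical float pattern: optional sign, digits, optional fraction, optional exponent.
FLOAT = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
# log-probability, one or more words, then an optional backoff weight.
ENTRY = re.compile(rf"^({FLOAT})\s+(\S+(?:\s+\S+)*?)(?:\s+({FLOAT}))?\s*$")


def parse_entry(line):
    """Return (log_p, words, log_bo); log_bo is None when the line has no backoff weight."""
    match = ENTRY.match(line)
    if match is None:
        raise ValueError(f"not an n-gram entry: {line!r}")
    log_p, words, log_bo = match.groups()
    return (
        float(log_p),
        tuple(words.split()),
        float(log_bo) if log_bo is not None else None,
    )


print(parse_entry("-2.310726 maybe when 9.609759e-05"))
```

One caveat of this sketch: when fields are separated only by whitespace, a final word that looks like a number is read as the backoff weight.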

Taking forever to install kaldifeat

Hello, I am trying to install kaldifeat following the instructions in the doc, with this command: `pip install --verbose kaldifeat` It's been ~8 hours now but it seems the program...

show probability score in beam mode

When we list the top N (N>1) decoding variants in the --return_beams mode, can we have a probability score for each variant? It seems there is no such option....

Check failed: min_num_buckets >= 0 (-2145552096 vs. 0)

Hello, I ran into a run-time error during decoding as follows. Basically, I've created the HLG graph as in Icefall, and did nbest decoding as in [here](https://github.com/espnet/espnet/blob/f16e579e2bfba906c5ce4c6ad15680999628c73d/espnet2/bin/asr_inference_k2.py#L334). I was wondering...

Librispeech ctc recipe

Added a Librispeech ctc recipe for demonstration purposes. - This recipe demonstrates using either [torch.nn.CTCLoss](https://pytorch.org/docs/stable/generated/torch.nn.CTCLoss.html) or [k2.ctc_loss](https://k2-fsa.github.io/k2/python_api/api.html#ctc-loss). Both can converge to similar results. - It supports using either CTC or...

CLA Signed

Using MMS model with `star` token for batch size > 1

The current implementation assumes batch size is one, when attaching the `star` dimension: https://github.com/pytorch/audio/blob/ea437b31ce316ea3d66fe73768c0dcb94edb79ad/src/torchaudio/pipelines/_wav2vec2/utils.py#L41 However, the underlying Wav2vec model supports batch size greater than one. So this line should instead...

Cannot download earnings22

Hello, when I do the [following](https://github.com/revdotcom/speech-datasets#steps-to-download-from-lfs): ``` cd earnings22 git lfs pull ``` I get the following errors: ``` batch response: This repository is over its data quota. Account responsible for LFS...

Transcript issues for 4363614 in earnings-21

https://github.com/revdotcom/speech-datasets/blob/1852d8e8f79745415e17ed294f1de0f884513465/earnings21/transcripts/nlp_references/4363614.nlp#L2-L44 It seems the transcript there has some issues, as quoted. E.g. `` for the company's name, `` for a person's name. This can be checked against [here](https://seekingalpha.com/article/4363614-banco-santander-mexico-s-bsmx-ceo-hector-grisi-on-q2-2020-results-earnings-call-transcript)