ctc_decoder issues

Language model decoding error

7

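Hello, is there a script for joint decoding with a language model? I came over from wenet and am trying to deploy an offline speech recognition service with triton. Without a language model it runs and returns correct results, but once the language model is added it no longer returns correct results. The server shows the following: E0524 08:34:07.612731 97 python.cc:1968] Stub process is unhealthy and it will be restarted. Loading the LM will be faster if you build a binary file. Reading /ws/onnx_model/lm.arpa ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100...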

travisCxy

Can it be installed on Windows?

1

Hello, can this be installed on Windows?

uloveqian2021

No support for LMs in mixed Chinese-English scenarios?

2

Currently, if the arpa file contains mixed Chinese and English, the returned score_hyps all look like the following: (((-3.4028234663852886e+38, ()), (-3.4028234663852886e+38, (1819, 29)), (-3.4028234663852886e+38, (29,)), (-3.4028234663852886e+38, (1819,)), (-3.4028234663852886e+38, (1819, 29, 2327)), (-3.4028234663852886e+38, (1819, 2327)), (-3.4028234663852886e+38, (29, 2327)), (-3.4028234663852886e+38, (2327,)), (-3.4028234663852886e+38, (1819, 5170)), (-3.4028234663852886e+38, (5170,))), ((-3.4028234663852886e+38, ()), (-3.4028234663852886e+38,...
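For what it's worth, the repeated score -3.4028234663852886e+38 is exactly the most negative finite float32 value, so it looks like a sentinel rather than a real log-probability. A minimal sketch for flagging such hypotheses, assuming the nested tuple layout quoted above; the helper names `is_sentinel` and `all_sentinel` are hypothetical and not part of swig_decoders:

```python
import struct

# Largest finite float32, decoded from its IEEE 754 bit pattern 0x7F7FFFFF.
FLT_MAX = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]


def is_sentinel(score):
    """Return True if a score equals the lowest finite float32 value."""
    return score == -FLT_MAX


def all_sentinel(score_hyps):
    """Return True if every (score, tokens) pair in every utterance is a sentinel."""
    return all(is_sentinel(score) for utt in score_hyps for score, _ in utt)


# Shaped like the output quoted above (a small hand-copied subset).
sample = (((-3.4028234663852886e+38, ()), (-3.4028234663852886e+38, (1819, 29))),)
print(all_sentinel(sample))
```

If every hypothesis is flagged, the problem is more likely in how the scorer treats the input tokens than in the acoustic model output, though that is only a guess from the numbers shown.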

ziyu123

Language model training data for BPE modeling

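Hi, I saw this [issue](https://github.com/NVIDIA/NeMo/issues/215) that you opened and would like to ask: when English speech recognition uses BPE modeling, what form should the language model training data take? If it is in the following form, THANK YOU _THANK _YOU, can it be decoded with the ctc_beam_search_decoder_batch function?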

Ryuk17

Can we get the token timestamps?

1

Yymax-max

Fix language model repeated scoring

This PR fixes repeated language model scoring. When hotwords_scorer->is_character_based and ext_scorer->is_character_based() is false, the language model and hot word scores are calculated repeatedly. In fact, if the language model is...

FieldsMedal

install error!

1

```
git clone https://github.com/Slyne/ctc_decoder.git && cd ctc_decoder/swig && bash setup.sh
```

```
python3 -c "import swig_decoders"
```

ModuleNotFoundError: No module named '_swig_decoders'
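SWIG-generated packages typically ship a Python wrapper that imports a compiled extension whose name starts with an underscore, so this error suggests the wrapper was found but the compiled `_swig_decoders` extension was not. A small diagnostic sketch using only the standard library; the function name `module_status` is hypothetical:

```python
import importlib.util


def module_status(names=("swig_decoders", "_swig_decoders")):
    """Map each top-level module name to its file location, or None if not found."""
    status = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        status[name] = spec.origin if spec is not None else None
    return status


if __name__ == "__main__":
    for name, origin in module_status().items():
        print(f"{name}: {origin or 'NOT FOUND'}")
```

Running it with the same interpreter used for the install shows whether both modules are visible and where they were loaded from.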

DataXujing

Question about make_ngram with a word-based language model

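If the language model is word-based (is_character_based_ == False), it only scores when a space_id is encountered. make_ngram[4] returns the words inside a fixed-length window ending at the current frame. If the output tokens contain no SPACE_ID_ and only the words inside that fixed-length window at the current frame are taken, doesn't that mean the earlier words never get a chance to be picked up?

```
// language model scoring
float ngram_score = 0.0;
if (ext_scorer != nullptr ) {
  if (hotwords_scorer != nullptr && !hotwords_scorer->hotwords_dict.empty() &&
      !(hotwords_scorer->is_character_based ^ ext_scorer->is_character_based()) &&...
```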
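To make the windowing question concrete, here is a simplified Python sketch of the idea, not the repository's C++ code. It assumes the prefix is a flat list of tokens rather than a trie path, that words are delimited by a space token, and that missing history is padded with a start token; `SPACE`, `START_TOKEN`, and this `make_ngram` are illustrative stand-ins:

```python
SPACE = " "
START_TOKEN = "<s>"


def make_ngram(prefix_tokens, max_order):
    """Collect up to max_order most recent space-delimited words from a token prefix."""
    ngram = []
    end = len(prefix_tokens)
    for order in range(max_order):
        # Walk back to the previous space, or to the start of the prefix.
        start = end
        while start > 0 and prefix_tokens[start - 1] != SPACE:
            start -= 1
        ngram.append("".join(prefix_tokens[start:end]))
        if start == 0:
            # Reached the beginning: pad the remaining slots with start tokens.
            ngram.extend([START_TOKEN] * (max_order - order - 1))
            break
        end = start - 1  # skip the space itself
    ngram.reverse()
    return ngram


print(make_ngram(list("ab cd ef"), 2))
print(make_ngram(["你", "好", "吗"], 3))
```

Under these assumptions, a prefix without any space token collapses into a single "word" with start-token padding, which is the situation the question describes.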

LRY1994
