johnchienbronci

Results: 24 comments of johnchienbronci

Unsupported action found at bulk response.

File: src/Bulk.cc
function: processResult()
Find `// process items responses` and replace it:

```
// process items responses
const char *itemKeyName = nullptr;
for (const Json::Value &item: items) {
    if (!item.isObject())...
```

After training a custom language model with kenlm, it seems to have no effect on error detection and typo correction

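Yes. It is Traditional Chinese, and the language model is trained at the character level.

1. Why would the result depend on the number of training entries? I looked at the arpa file, and its contents appear to be per-character probabilities, so I would expect that adding the keywords to be recognized to the training data should have an effect 🤔
2. Suppose one word is recognized incorrectly, "讓坐". Can I take the existing corpus (one that already gives basic recognition ability) and add only "讓座" to it, and would the correction then work? Or do I need many sentences that contain "讓座"?
3. Apart from the number of entries, are there other format constraints on the general training corpus, such as a length limit per entry, short sentences only, or no whitespace?
4. Suppose there is a text "全球新能源經濟迅速崛起，電動車作為成長主力之一，替全球汽車製造產業帶來重大變革.....". Is it recommended to split this text into 3 corpus entries as follows: ["全球新能源經濟迅速崛起", "電動車作為成長主力之一",...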
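The split described in question 4 could be sketched as follows. This is a minimal illustration, not an answer from the thread: it assumes character-level training means one sentence per line with characters separated by spaces (KenLM's `lmplz` reads whitespace-tokenized text, one sentence per line). The function name `to_char_level_lines` and the punctuation set are hypothetical choices, and the sketch does not settle whether splitting actually improves correction.

```python
import re

# Punctuation and whitespace to split on. This set is an assumption;
# note that "." will also split decimal numbers, so adjust it for your corpus.
_SPLIT_RE = re.compile(r"[，。！？；、,.!?;\s]+")


def to_char_level_lines(text: str) -> list[str]:
    """Split text on punctuation, then space-separate the characters of each piece."""
    pieces = [piece for piece in _SPLIT_RE.split(text) if piece]
    return [" ".join(piece) for piece in pieces]


if __name__ == "__main__":
    sample = "全球新能源經濟迅速崛起，電動車作為成長主力之一，替全球汽車製造產業帶來重大變革"
    # Each printed line is one training entry for a character-level model.
    for line in to_char_level_lines(sample):
        print(line)
```

Writing the returned lines to a text file, one per line, gives a corpus in the shape a character-level n-gram model expects.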

After training a custom language model with kenlm, it seems to have no effect on error detection and typo correction

OK, thank you for your reply.

After training a custom language model with kenlm, it seems to have no effect on error detection and typo correction

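I tried increasing the training set to 300,000 entries, and it still seems to have no effect. The order is set to 5 and I train at the character level, as shown in the figure below. Here is part of the arpa file contents:

```
...
-0.25336936 5 年利率 -0.10724029
-0.23248479 3 年利率 -0.10724029
-0.17525549 4 年利率 -0.10724029...
```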
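For reading entries like the ones in this excerpt: an ARPA n-gram line holds a log10 probability, the n-gram tokens, and an optional log10 backoff weight. Below is a minimal sketch of a parser, assuming the fields are tab-separated and the tokens are space-separated. The helper name `parse_arpa_ngram_line` is hypothetical, and the sample line reuses the first entry of the excerpt with its characters written as separate tokens, which is an assumption about the original spacing.

```python
from typing import Optional


def parse_arpa_ngram_line(line: str) -> tuple[float, list[str], Optional[float]]:
    """Parse one n-gram line of an ARPA file.

    Assumes the layout "log10_prob<TAB>w1 w2 ... wn[<TAB>log10_backoff]".
    Highest-order n-grams carry no backoff weight, so it may be None.
    """
    parts = line.rstrip("\n").split("\t")
    log10_prob = float(parts[0])
    tokens = parts[1].split(" ")
    backoff = float(parts[2]) if len(parts) > 2 else None
    return log10_prob, tokens, backoff


if __name__ == "__main__":
    # Sample line: assumed spacing, values taken from the excerpt.
    log10_prob, tokens, backoff = parse_arpa_ngram_line(
        "-0.25336936\t5 年 利 率\t-0.10724029"
    )
    # Convert the stored log10 value back to a plain probability.
    print(tokens, 10 ** log10_prob, backoff)
```

Comparing the probabilities of a correct and an incorrect candidate n-gram this way is one way to check whether the trained model actually prefers the intended characters.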

After training a custom language model with kenlm, it seems to have no effect on error detection and typo correction

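I found the reasons for the poor error detection and correction:

1. The default configuration file has to be changed to Traditional Chinese.
2. Adding the similar-sounding characters to same_pinyin is enough.
3. Any token that exists in word_freq.txt has a chance of being misjudged.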

test short audio file with silence between sentence, word timestamp may not be accurate

I found the "stable-ts" project and tested it. Its word timestamp is very accurate. Could you refer to its approach? Below are the testing results I obtained using "stable-ts": Enable...

test short audio file with silence between sentence, word timestamp may not be accurate

yes, there is no silence in the segment "好天氣".

[BUG]Error after changing the model from opt to gpt

@molly-smith Do you know what causes this? I encountered some errors when running run_speech_recognition_ctc_streaming.sh with deepspeed ( torchrun --nproc_per_node 1 ... ) and this issue consistently occurs with my...

it takes too long for DynamicBucketingSampler to load state dict

@pzelasko > Unfortunately, yes. Restoring state of the sampler is unfortunately quite tricky to do quickly, and I don’t recommend using this technique with large data. Instead, it’s easier to...

it takes too long for DynamicBucketingSampler to load state dict

@pzelasko So I should just skip the `train sampler.load_state_dict(sampler state_dict)` call, right?
