Nik

Results: 54 comments of Nik

I must say that this is a weird edge case, as in general you don't want the ground truth to be an empty string. Furthermore, the word-error-rate is undefined for...
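
As a minimal sketch of why it is undefined (the `wer_from_counts` helper below is hypothetical, just to show the formula, not part of jiwer): WER = (S + D + I) / N, where N is the number of words in the reference, so an empty reference means dividing by zero.

```python3
def wer_from_counts(substitutions: int, deletions: int, insertions: int, num_ref_words: int) -> float:
    # WER = (S + D + I) / N; with an empty reference N == 0, so this divides by zero.
    return (substitutions + deletions + insertions) / num_ref_words

print(wer_from_counts(1, 0, 0, 2))  # 0.5, e.g. one substituted word out of two
print(wer_from_counts(0, 0, 2, 0))  # empty reference: raises ZeroDivisionError
```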

Also note that `jiwer.RemoveEmptyStrings()` exists for this purpose.
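
For reference, a small sketch of how that transform can be plugged in (assuming the jiwer 2.x-style transforms used elsewhere in these comments; the sentences are made up):

```python3
import jiwer

sentences = ["hello world", "", "good morning"]

transform = jiwer.Compose([
    jiwer.RemoveEmptyStrings(),      # drop the empty sentence before scoring
    jiwer.SentencesToListOfWords(),  # then split the remaining sentences into words
])

print(transform(sentences))  # expected: ['hello', 'world', 'good', 'morning']
```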

I think there are two ways of implementing this: 1) we either need a wrapper around asclite, which will require shipping its binary for every platform, or 2) write a...

Hi, I made this example script, with references and predictions taken from https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn

```python3
from jiwer import wer, cer

ground_truths = [
    "宋朝末年年间定居粉岭围。",
    "渐渐行动不便",
    "二十一年去世。",
    "他们自称恰哈拉。",
    "局部干涩的例子包括有口干、眼睛干燥、及阴道干燥。",
    "嘉靖三十八年,登进士第三甲第二名。",
    "这一名称一直沿用至今。",
    "同时乔凡尼还得到包税合同和许多明矾矿的经营权。",
    ...
```

Semantically, I assume that you need the "character" error rate instead of the word error rate, as the two should be equivalent for Chinese. Therefore, is the CER score...
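
To make that concrete, a small sketch (the hypothesis is made up and the numbers are approximate):

```python3
from jiwer import wer, cer

reference = "二十一年去世。"
hypothesis = "二十年去世。"  # hypothetical ASR output missing one character

# With no spaces, the whole sentence is a single "word", so WER is all-or-nothing.
print(wer(reference, hypothesis))  # expected 1.0
# CER compares character by character, which is usually the right unit for Chinese.
print(cer(reference, hypothesis))  # expected ~0.14 (one deletion out of 7 characters)
```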

Thanks for spotting this! Should be fixed in the 2.5.0 release.

> Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. (https://en.wikipedia.org/wiki/Levenshtein_distance) Due to...
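
For context, the standard dynamic-programming implementation of that definition (a generic sketch, not jiwer's internal code):

```python3
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if the characters match)
            ))
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```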

You should probably verify that you're using the latest version of the dataset.

I see that you're using `compute_measures`, which is also used in the latest version. Would you be able to share what the (size of the) input to the metric computation is?
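
For anyone landing here, a minimal sketch of how `compute_measures` is typically called (jiwer 2.x-era API, made-up sentences); the number and length of sentences passed in like this is the kind of input size that helps with diagnosing slowness:

```python3
import jiwer

measures = jiwer.compute_measures(
    ["the quick brown fox", "she sells sea shells"],  # ground truths
    ["the quick brown dog", "she sells see shells"],  # hypotheses
)
print(measures["wer"], measures["mer"], measures["wil"])
```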

Change the order of the compose:
```
import jiwer

text = "he asked a helpful question"
stop_words = ['a', 'he']

print(jiwer.Compose([
    jiwer.ToLowerCase(),
    jiwer.Strip(),
    jiwer.RemoveSpecificWords(stop_words),  # remove stop words while the input is still a plain string
    jiwer.RemoveMultipleSpaces(),           # collapse the spaces left behind by the removal
    jiwer.RemovePunctuation(),
    jiwer.RemoveEmptyStrings(),
    jiwer.SentencesToListOfWords(),         # only split into words at the very end
])(text))
```