Nik

Results: 54 comments of Nik

I must say that this is a weird edge case, as in general you don't want the ground truth to be an empty string. Furthermore, the word-error-rate is undefined for...
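
As a minimal sketch of why it is undefined (the `wer_from_counts` helper below is hypothetical, just to show the formula, not part of jiwer): WER = (S + D + I) / N, where N is the number of words in the reference, so an empty reference means dividing by zero.

```python3
def wer_from_counts(substitutions: int, deletions: int, insertions: int, num_ref_words: int) -> float:
    # WER = (S + D + I) / N; with an empty reference N == 0, so this divides by zero.
    return (substitutions + deletions + insertions) / num_ref_words

print(wer_from_counts(1, 0, 0, 2))  # 0.5, e.g. one substituted word out of two
print(wer_from_counts(0, 0, 2, 0))  # empty reference: raises ZeroDivisionError
```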

Also note that `jiwer.RemoveEmptyStrings()` exists for this purpose.
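
For reference, a small sketch of how that transform can be plugged in (assuming the jiwer 2.x-style transforms used elsewhere in these comments; the sentences are made up):

```python3
import jiwer

sentences = ["hello world", "", "good morning"]

transform = jiwer.Compose([
    jiwer.RemoveEmptyStrings(),      # drop the empty sentence before scoring
    jiwer.SentencesToListOfWords(),  # then split the remaining sentences into words
])

print(transform(sentences))  # expected: ['hello', 'world', 'good', 'morning']
```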

I think there are two ways of implementing this: 1) we either need a wrapper around asclite, which will require shipping its binary for every platform, or 2) write a...

Hi, I made this example script, with references and predictions taken from https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn

```python3
from jiwer import wer, cer

ground_truths = [
    "宋朝末年年间定居粉岭围。",
    "渐渐行动不便",
    "二十一年去世。",
    "他们自称恰哈拉。",
    "局部干涩的例子包括有口干、眼睛干燥、及阴道干燥。",
    "嘉靖三十八年,登进士第三甲第二名。",
    "这一名称一直沿用至今。",
    "同时乔凡尼还得到包税合同和许多明矾矿的经营权。",
    ...
```

Semantically, I assume that you need the "character" error rate instead of the word error rate, as the two should be equivalent for Chinese. Therefore, is the CER score...
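
To make that concrete, a small sketch (the hypothesis is made up and the numbers are approximate):

```python3
from jiwer import wer, cer

reference = "二十一年去世。"
hypothesis = "二十年去世。"  # hypothetical ASR output missing one character

# With no spaces, the whole sentence is a single "word", so WER is all-or-nothing.
print(wer(reference, hypothesis))  # expected 1.0
# CER compares character by character, which is usually the right unit for Chinese.
print(cer(reference, hypothesis))  # expected ~0.14 (one deletion out of 7 characters)
```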

Thanks for spotting this! Should be fixed in the 2.5.0 release.

> Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. (https://en.wikipedia.org/wiki/Levenshtein_distance) Due to...
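
For context, the standard dynamic-programming implementation of that definition (a generic sketch, not jiwer's internal code):

```python3
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if the characters match)
            ))
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```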

You should probably verify that you're using the latest version of the dataset.

I see that you're using `compute_measures`, which is also used in the latest version. Would you be able to share what the (size of the) input to the metric computation is?
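
For anyone landing here, a minimal sketch of how `compute_measures` is typically called (jiwer 2.x-era API, made-up sentences); the number and length of sentences passed in like this is the kind of input size that helps with diagnosing slowness:

```python3
import jiwer

measures = jiwer.compute_measures(
    ["the quick brown fox", "she sells sea shells"],  # ground truths
    ["the quick brown dog", "she sells see shells"],  # hypotheses
)
print(measures["wer"], measures["mer"], measures["wil"])
```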

Change the order of the compose:
```
import jiwer

text = "he asked a helpful question"
stop_words = ['a', 'he']

print(jiwer.Compose([
    jiwer.ToLowerCase(),
    jiwer.Strip(),
    jiwer.RemoveSpecificWords(stop_words),  # remove stop words while the input is still a plain string
    jiwer.RemoveMultipleSpaces(),           # collapse the spaces left behind by the removal
    jiwer.RemovePunctuation(),
    jiwer.RemoveEmptyStrings(),
    jiwer.SentencesToListOfWords(),         # only split into words at the very end
])(text))
```