WeTextProcessing
Bad case: the "-" symbol
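Is "-" normalized (TN) to "减" ("minus") in every context? In both of the cases below, neither side of the "-" is a number, yet TN still turns it into "减":

- `注意一下这个wide-awake的含义` -> `注意一下这个wide减awake的含义`
- `整体-表现非常好` -> `整体减表现非常好`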
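I ran into the same problem today. The culprit seems to be line 42 of `tn/chinese/rules/math.py`, `tagger |= operator`: it lets a bare operator match on its own, so every "-" is tagged as math no matter what surrounds it. Commenting that line out fixed it for me.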
19 from pynini import cross, string_file
20 from pynini.lib.pynutil import delete, insert
21
22
23 class Math(Processor):
24
25 def __init__(self):
26 super().__init__(name="math")
27 self.build_tagger()
28 self.build_verbalizer()
29
30 def build_tagger(self):
31 operator = string_file(get_abs_path("chinese/data/math/operator.tsv"))
32 # When it appears alone, it is treated as punctuation
33 symbols = (cross("~", "到")
34 | cross(":", "比")
35 | cross("<", "小于")
36 | cross(">", "大于"))
37
38 number = Cardinal().number
39 tagger = (number +
40 (delete(" ").ques +
41 (operator | symbols) + delete(" ").ques + number).star)
42 #tagger |= operator
43 tagger = insert('value: "') + tagger + insert('"')
44 self.tagger = self.add_tokens(tagger)