
Bad case for the "-" symbol

Open WangGewu opened this issue 8 months ago • 1 comment

Is "-" normalized (TN) to "减" ("minus") in every context? In the two cases below, neither side of "-" is a number, yet it is still normalized to "减".

TN:
注意一下这个wide-awake的含义 -> 注意一下这个wide减awake的含义
整体-表现非常好 -> 整体减表现非常好

WangGewu avatar May 14 '25 03:05 WangGewu

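I ran into this problem today as well. The cause seems to be line 42 of tn/chinese/rules/math.py, which makes the tagger match an operator in every context. Commenting that line out fixed it for me.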

 19 from pynini import cross, string_file
 20 from pynini.lib.pynutil import delete, insert
 21
 22
 23 class Math(Processor):
 24
 25     def __init__(self):
 26         super().__init__(name="math")
 27         self.build_tagger()
 28         self.build_verbalizer()
 29
 30     def build_tagger(self):
 31         operator = string_file(get_abs_path("chinese/data/math/operator.tsv"))
 32         # When it appears alone, it is treated as punctuation
 33         symbols = (cross("~", "到")
 34                    | cross(":", "比")
 35                    | cross("<", "小于")
 36                    | cross(">", "大于"))
 37
 38         number = Cardinal().number
 39         tagger = (number +
 40                   (delete(" ").ques +
 41                    (operator | symbols) + delete(" ").ques + number).star)
 42         #tagger |= operator
 43         tagger = insert('value: "') + tagger + insert('"')
 44         self.tagger = self.add_tokens(tagger)
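
To illustrate why line 42 matters, here is a rough regex analogy in plain Python. It is not the pynini grammar itself: the function names are hypothetical, the operator table contains only the "-" -> "减" mapping seen in this issue (the real operator.tsv is not shown here), and digits are left as-is rather than verbalized. The first function mimics `tagger |= operator`, which accepts a bare operator anywhere; the second mimics the remaining rule, where an operator is only accepted between two numbers.

```python
import re

# Assumed operator table: only "-" -> "减" is confirmed by this issue.
OPERATORS = {"-": "减"}


def normalize_any_operator(text: str) -> str:
    """Rough analogue of `tagger |= operator`: rewrite every operator."""
    for symbol, spoken in OPERATORS.items():
        text = text.replace(symbol, spoken)
    return text


def normalize_between_numbers(text: str) -> str:
    """Rough analogue of the rule without line 42: rewrite an operator
    only when it sits between two digits (optional single spaces allowed)."""
    for symbol, spoken in OPERATORS.items():
        pattern = r"(?<=\d) ?" + re.escape(symbol) + r" ?(?=\d)"
        text = re.sub(pattern, spoken, text)
    return text


if __name__ == "__main__":
    for sample in ["注意一下这个wide-awake的含义", "整体-表现非常好", "3 - 2"]:
        print(sample, "->", normalize_between_numbers(sample))
```

With the number-only variant, the two sentences from the issue pass through unchanged, while an expression such as "3 - 2" still has its "-" rewritten.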

jason-ni avatar May 26 '25 15:05 jason-ni