chop
MMSEGTokenizer throws an exception when parsing the sentence "那锣鼓也似乎从不知倦怠地动地喧天。"
Sample code:
from chop.mmseg import Tokenizer as MMSEGTokenizer

sentence = "那锣鼓也似乎从不知倦怠地动地喧天。"

def main():
    MT = MMSEGTokenizer()
    print(' / '.join(MT.cut(sentence)))

if __name__ == '__main__':
    main()
expected: 那 / 锣鼓 / 也 / 似乎 / 从 / 不知 / 倦怠 / 地 / 动地喧天。
actual:
Traceback (most recent call last):
  File "chop_fails.py", line 10, in <module>
    main()
  File "chop_fails.py", line 7, in main
    print(' / '.join(MT.cut(sentence)))
  File "/home/user/.local/lib/python3.5/site-packages/chop/mmseg/__init__.py", line 139, in cut
    words, length = self.__ambiguity_resolution(chunks) # Ambiguity Resolution Rules
  File "/home/user/.local/lib/python3.5/site-packages/chop/mmseg/__init__.py", line 198, in __ambiguity_resolution
    Tokenizer.__print_chunks(chunks)
  File "/home/user/.local/lib/python3.5/site-packages/chop/mmseg/__init__.py", line 208, in __print_chunks
    DEBUG('%s: %s' % (tag, ' '.join([ w.text for w in y])))
TypeError: 'Word' object is not iterable
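
Note on the last frame: the debug line iterates over y and reads w.text from each element, so it seems to expect y to be a sequence of Word objects, but here y appears to be a single Word. The snippet below is only a minimal sketch of that failure mode, not chop's actual code. The Word class is a hypothetical stand-in that models nothing but a .text attribute, and format_chunk is a hypothetical helper showing one way a debug printer could tolerate a lone Word.

```python
class Word:
    """Hypothetical stand-in for chop's Word; only `.text` is modeled."""

    def __init__(self, text):
        self.text = text


def format_chunk(chunk):
    """Join the word texts of a chunk, treating a lone Word as a one-word chunk."""
    words = [chunk] if isinstance(chunk, Word) else list(chunk)
    return ' '.join(w.text for w in words)


# A list of Word objects is iterable, so the join in the debug line works.
chunk = [Word("动地"), Word("喧天")]
joined = ' '.join([w.text for w in chunk])

# A bare Word is not iterable, which raises the same TypeError as in the report.
try:
    ' '.join([w.text for w in Word("喧天")])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(joined)
print(error_message)
print(format_chunk(Word("喧天")))
```

Whether the right fix is to make the debug printer tolerant like this, or to make sure the caller always passes lists of Word objects, depends on what __ambiguity_resolution is supposed to hand to __print_chunks.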