icetk

What's the meaning of token 20005?

Open · xu-song opened this issue 3 years ago · 0 comments

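```python
from icetk import icetk

tokens = icetk.encode('你好世界!这里是 icetk。')
for token in tokens:
    # Text-token IDs are offset by 20000, so subtract it to index the text vocabulary
    print(token, icetk.text_tokenizer.proto.pieces[token - 20000].piece)
```

Output:

```
20005 ▁
94874 你好
84097 世界
20035 !
94947 这里是
22881 ▁ice
35955 tk
83823 。
```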

What is "▁" used for?
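For context, here is a minimal sketch of the convention behind "▁". It assumes that icetk's text tokenizer follows SentencePiece's defaults, which the leading `▁` in the output suggests. SentencePiece replaces every space with the meta-symbol `▁` (U+2581) and prepends one extra `▁` (the "dummy prefix") before segmenting, so `▁` marks a word boundary. The vocabulary presumably has no piece that joins `▁` with `你好`, so the prefix comes out as a token of its own, 20005. The helpers `normalize` and `detokenize` are hypothetical and are not icetk functions. They also skip SentencePiece's other normalization steps, such as NFKC and collapsing repeated spaces.

```python
# Hypothetical helpers illustrating SentencePiece's whitespace convention;
# these are NOT icetk APIs, and other normalization (NFKC, etc.) is ignored.
META = "\u2581"  # "▁" (U+2581), SentencePiece's whitespace meta-symbol


def normalize(text: str) -> str:
    """Replace spaces with ▁ and prepend the dummy-prefix ▁."""
    return META + text.replace(" ", META)


def detokenize(pieces: list[str]) -> str:
    """Join pieces, turn ▁ back into spaces, and drop the dummy-prefix space."""
    text = "".join(pieces).replace(META, " ")
    return text[1:] if text.startswith(" ") else text


text = "你好世界!这里是 icetk。"
# Pieces copied from the output above (IDs 20005, 94874, 84097, ...)
pieces = ["▁", "你好", "世界", "!", "这里是", "▁ice", "tk", "。"]

print(normalize(text) == "".join(pieces))  # True: the pieces tile the normalized text
print(detokenize(pieces) == text)          # True: decoding round-trips to the input
```

The space in `这里是 icetk` survives as the `▁` inside `▁ice`. When the text is decoded, the space that the dummy prefix added is removed again.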

xu-song, Mar 22 '23 13:03