codeprep
By default, use the end-of-full-token marker (</t>) instead of the token-boundary markers (<w>, </w>) in all kinds of preprocessing, for consistency
Currently:
>>> api.basic("getName")
['<w>', 'get', 'Name', '</w>']
To be done:
>>> api.basic("getName")
['get', 'Name', '</t>']
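A minimal sketch of the proposed change, written as a post-processing step over the current output. The helper name `to_end_of_token_format` is hypothetical and not part of codeprep's API. It assumes that only tokens wrapped in <w> ... </w> need rewriting and that every other item passes through unchanged.

```python
from typing import List

# Marker strings taken from the examples above. The helper below is a
# hypothetical illustration, not part of codeprep's API.
WORD_START = "<w>"
WORD_END = "</w>"
FULL_TOKEN_END = "</t>"


def to_end_of_token_format(tokens: List[str]) -> List[str]:
    """Rewrite boundary-marked output into the end-of-full-token format.

    Assumption: only tokens wrapped in <w> ... </w> are rewritten; any
    other item is passed through unchanged.
    """
    result: List[str] = []
    for tok in tokens:
        if tok == WORD_START:
            # Dropped: the proposed format has no start-of-token marker.
            continue
        if tok == WORD_END:
            # The closing boundary becomes the end-of-full-token marker.
            result.append(FULL_TOKEN_END)
        else:
            result.append(tok)
    return result


if __name__ == "__main__":
    current = ["<w>", "get", "Name", "</w>"]
    print(to_end_of_token_format(current))  # ['get', 'Name', '</t>']
```

Applied to the "Currently" output, this yields the "To be done" output. Inside codeprep itself the change would more likely be made where the markers are emitted than as a separate pass.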