The IterTokenizer does not support Unicode emoji
I noticed that when a text contains emoji (😀 etc.), the IterTokenizer does not parse the text completely. I'm preparing a pull request, if this seems worthwhile.
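
To illustrate the kind of failure I mean, here is a minimal sketch in Python. It is not the actual IterTokenizer code: `TOKEN_RE`, `naive_iter_tokens`, and `tolerant_iter_tokens` are hypothetical names, and the assumption is that the tokenizer stops as soon as no rule matches the current character. Under that assumption an emoji, which is neither a word character nor whitespace nor listed punctuation, ends tokenization early.

```python
import re

# Hypothetical stand-in for a rule-based tokenizer (NOT the real IterTokenizer).
# It recognizes runs of word characters, runs of whitespace, and a few
# punctuation marks. Emoji such as U+1F600 match none of these rules.
TOKEN_RE = re.compile(r"\w+|\s+|[.,;:!?()-]")


def naive_iter_tokens(text):
    """Yield tokens, but stop silently at the first unmatched character."""
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            return  # assumed failure mode: the rest of the text is dropped
        yield match.group()
        pos = match.end()


def tolerant_iter_tokens(text):
    """Yield tokens, emitting any unmatched code point as its own token."""
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            # One possible fix: pass the code point through and keep going.
            # Multi-code-point emoji (ZWJ sequences) would still be split.
            yield text[pos]
            pos += 1
        else:
            yield match.group()
            pos = match.end()


if __name__ == "__main__":
    sample = "hello 😀 world"
    print(list(naive_iter_tokens(sample)))
    print(list(tolerant_iter_tokens(sample)))
```

With the naive version, everything after the emoji is lost; the tolerant version keeps going. Whether the real fix looks like this depends on how IterTokenizer is actually implemented.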