mbc1990

Results: 2 issues by mbc1990

WordTokenizer, WordPunctTokenizer, and TreebankWordTokenizer all have similar unusual behavior on accented (tilde-ed?) characters:

```
> var tokenizer = new natural.WordPunctTokenizer();
> tokenizer.tokenize('São Paulo');
[ 'S', 'ã', 'o', 'Paulo' ]
>...
```

Help/Questions
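The split reported in the first issue is what one would expect from a tokenizer built on the ASCII-only `\w` class, since JavaScript's `\w` matches only `[A-Za-z0-9_]`. The sketch below is an illustration under that assumption, not the library's actual code: `asciiWordPunct` and `unicodeWordPunct` are hypothetical helpers, and the Unicode property escapes (`\p{L}`, `\p{N}`) require a runtime that supports the `u` flag with property escapes.

```javascript
// Hypothetical helpers for illustration; neither is part of natural.

// ASCII-only: \w is [A-Za-z0-9_], so 'ã' is treated as punctuation
// and becomes its own token, splitting the word around it.
function asciiWordPunct(text) {
  return text.match(/\w+|[^\w\s]/g) || [];
}

// Unicode-aware: any letter (\p{L}) or number (\p{N}) counts as a word character.
function unicodeWordPunct(text) {
  return text.match(/[\p{L}\p{N}_]+|[^\p{L}\p{N}_\s]/gu) || [];
}

// 'São Paulo' written with the precomposed character U+00E3 (ã).
const input = 'S\u00e3o Paulo';

console.log(asciiWordPunct(input));   // [ 'S', 'ã', 'o', 'Paulo' ]
console.log(unicodeWordPunct(input)); // [ 'São', 'Paulo' ]
```

If the input may contain decomposed characters (a base letter followed by a combining mark), calling `text.normalize('NFC')` before tokenizing keeps the accented letter as a single code point.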

Not sure if this is intentional or not:

```
var tokenizer = new natural.WordPunctTokenizer();
console.log(tokenizer.tokenize("Example sentence (with parenthetical expression)."));
```

outputs:

```
[ 'Example', 'sentence', ' (', 'with', 'parenthetical', 'expression',...
```

Help/Questions