TokenBuffer for preprocessing Documents

Open Ayushk4 opened this issue 6 years ago • 0 comments

We have been using a fast TokenBuffer API to speed up various tokenizers in WordTokenizers.jl.

Referring to #141 and #140, I think it would be beneficial to extend the TokenBuffer API to the Document and Corpus types that TextAnalysis.jl offers (excluding NGramDocument and TokenDocument). This could then be used to improve the performance of preprocessing.jl.

Edit: This could also serve as a solution for #74 and #76.
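
To make the idea concrete, here is a rough sketch of what single-pass, buffer-based preprocessing might look like. Everything in it is hypothetical: `SketchBuffer`, `finished`, the three rule functions, and `preprocess_sketch` are made-up names for illustration and are not part of WordTokenizers.jl or TextAnalysis.jl. It only assumes the general pattern of a buffer with a read position and a chain of rules that each either consume input or decline.

```julia
# Hypothetical single-pass buffer, loosely modelled on the TokenBuffer idea.
# None of these names come from WordTokenizers.jl or TextAnalysis.jl.
mutable struct SketchBuffer
    input::Vector{Char}   # characters of the text being scanned
    output::Vector{Char}  # characters kept so far
    idx::Int              # position of the next character to read
end

SketchBuffer(text::AbstractString) = SketchBuffer(collect(text), Char[], 1)

finished(sb::SketchBuffer) = sb.idx > length(sb.input)

# Each rule returns true if it consumed the character at the current position.
function drop_punctuation(sb::SketchBuffer)
    ispunct(sb.input[sb.idx]) || return false
    sb.idx += 1
    return true
end

function squeeze_whitespace(sb::SketchBuffer)
    isspace(sb.input[sb.idx]) || return false
    # Emit one space per run of whitespace, and none at the very start.
    if !isempty(sb.output) && sb.output[end] != ' '
        push!(sb.output, ' ')
    end
    sb.idx += 1
    return true
end

# Fallback rule: keep the character, lowercased.
function keep_lowercase(sb::SketchBuffer)
    push!(sb.output, lowercase(sb.input[sb.idx]))
    sb.idx += 1
    return true
end

function preprocess_sketch(text::AbstractString)
    sb = SketchBuffer(text)
    while !finished(sb)
        # The first rule that matches consumes the character.
        drop_punctuation(sb) || squeeze_whitespace(sb) || keep_lowercase(sb)
    end
    return String(sb.output)
end
```

With these definitions, `preprocess_sketch("Hello,  World!")` returns `"hello world"`. The point of the sketch is that punctuation stripping, whitespace squeezing, and case folding all happen in one scan over the text, rather than one pass per step; whether that maps cleanly onto the Document and Corpus types is exactly what would need to be worked out.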

Apr 12 '19 05:04 Ayushk4