HookeJs
Appreciation and Curiosity
Just wanted to say that I think this is an amazing package you created. I'm really curious what sources you used for the pre-processing. I've found various resources that support everything you're doing, but I've not found a single succinct approach like this apart from yours.
First of all, sorry for the very late reply, and thanks for the appreciation.
For the pre-processing, stop words are removed, the remaining words are stemmed using Snowball stemmers, and the result is divided into n-grams. After that, matching n-grams in both texts are clustered together based on their Chebyshev distance, and each cluster is given a score equal to the match length times its density.
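To make the steps above concrete, here is a rough sketch of the idea. It is not the package's actual code. The stop-word list, the function names, the greedy clustering, the `maxGap` threshold, and the exact definitions of "length" and "density" are all assumptions made for illustration, and stemming is left out to keep the example dependency-free.

```javascript
// Tiny illustrative stop-word list (assumption, not the package's real list).
const STOP_WORDS = new Set(["the", "a", "an", "is", "of", "and", "to", "in", "on"]);

// Lowercase, split on non-letters, drop stop words. Stemming is omitted here.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z]+/).filter(w => w && !STOP_WORDS.has(w));
}

// Build overlapping word n-grams, each joined into a single string key.
function ngrams(tokens, n) {
  const out = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    out.push(tokens.slice(i, i + n).join(" "));
  }
  return out;
}

// Every [i, j] pair where n-gram i of text A equals n-gram j of text B.
function matchPoints(gramsA, gramsB) {
  const index = new Map();
  gramsB.forEach((g, j) => {
    if (!index.has(g)) index.set(g, []);
    index.get(g).push(j);
  });
  const points = [];
  gramsA.forEach((g, i) => (index.get(g) || []).forEach(j => points.push([i, j])));
  return points;
}

// Chebyshev distance: the largest coordinate difference between two points.
function chebyshev(p, q) {
  return Math.max(Math.abs(p[0] - q[0]), Math.abs(p[1] - q[1]));
}

// Greedy one-pass clustering (a simplification): a point joins the first
// cluster that has a member within maxGap, otherwise it starts a new cluster.
function cluster(points, maxGap) {
  const clusters = [];
  for (const p of points) {
    const home = clusters.find(c => c.some(q => chebyshev(p, q) <= maxGap));
    if (home) home.push(p);
    else clusters.push([p]);
  }
  return clusters;
}

// Score = length * density, under assumed definitions:
// length is the cluster's span in text A (in n-grams), and density is the
// number of matches divided by the larger of the spans in A and B.
function score(c) {
  const span = k => Math.max(...c.map(p => p[k])) - Math.min(...c.map(p => p[k])) + 1;
  const length = span(0);
  const density = c.length / Math.max(span(0), span(1));
  return length * density;
}

// Usage: compare two short sentences using bigrams and a gap of 2.
const gramsA = ngrams(tokenize("The quick brown fox jumps over the lazy dog"), 2);
const gramsB = ngrams(tokenize("A quick brown fox jumps on the lazy cat"), 2);
const scores = cluster(matchPoints(gramsA, gramsB), 2).map(score);
console.log(scores);
```

With these assumed definitions, a cluster that runs along a clean diagonal (the same passage in both texts) scores close to its number of matching n-grams, while a sparse or skewed cluster scores lower.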