Appreciation and Curiosity

Open KevinDanikowski opened this issue 5 years ago • 1 comments

Just wanted to say that I think this is an amazing package you created. I'm really curious what sources you used for the pre-processing. I've found various resources that support everything you're doing, but I haven't found one succinct approach like this anywhere else.

Feb 26 '21 17:02 KevinDanikowski

First of all, sorry for the very late reply, and thanks for the appreciation.

For the pre-processing, stop words are removed, words are stemmed using Snowball stemmers, and the result is split into n-grams. After that, matching n-grams in both texts are clustered together based on their Chebyshev distance, and each cluster is given a score equal to the match length times its density.
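
For readers who want to see the shape of that pipeline, here is a rough Python sketch. It is not the package's actual code. Several things are assumptions made for illustration: the tiny stop-word list, the names `preprocess`, `matching_positions`, `cluster` and `score`, the `max_dist` threshold, and the reading of "match length" as the number of matching n-grams in a cluster and "density" as that count divided by the span the cluster covers in the first text. Stemming is left out because the standard library has no Snowball stemmer.

```python
# Hypothetical stop-word list, far smaller than a real one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def preprocess(text):
    """Lowercase, split on whitespace, drop stop words (stemming omitted)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def ngrams(words, n):
    """Return all consecutive n-word tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def matching_positions(a, b, n=2):
    """Return (i, j) pairs where n-gram i of a equals n-gram j of b."""
    index = {}
    for j, gram in enumerate(ngrams(b, n)):
        index.setdefault(gram, []).append(j)
    return [(i, j) for i, gram in enumerate(ngrams(a, n))
            for j in index.get(gram, [])]


def chebyshev(p, q):
    """Chebyshev distance: the largest difference along any coordinate."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def cluster(points, max_dist=2):
    """Group points so that each is within max_dist of another in its group."""
    clusters = []
    for p in sorted(points):
        near = [c for c in clusters
                if any(chebyshev(p, q) <= max_dist for q in c)]
        # Merge every cluster that p touches into a single new cluster.
        clusters = [c for c in clusters if not any(c is m for m in near)]
        clusters.append([p] + [q for c in near for q in c])
    return clusters


def score(points):
    """Assumed scoring: match count times (match count / span in text a)."""
    length = len(points)
    span = max(i for i, _ in points) - min(i for i, _ in points) + 1
    return length * (length / span)


a = preprocess("The quick brown fox jumps over the lazy dog")
b = preprocess("A quick brown fox leaps over a lazy dog")
for group in cluster(matching_positions(a, b)):
    print(sorted(group), score(group))
```

Raising `max_dist` merges nearby runs of matches into one larger but sparser cluster, so under this scoring a long match with gaps is weighed against several short dense ones.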

Dec 31 '21 02:12 hhhhhhhhhn