Stopwords inconsistency

Open: arnicas opened this issue 11 years ago • 2 comments

I'm a little flummoxed by the stopwords used in TfIdf: the list has "he" but not "she". That was immediately noticeable in a trial on my data set. It would be great to be able to pass in a custom stoplist, or to not use one at all here.

arnicas, Feb 03 '15 19:02

Maybe we should update the English list that Chris made in 2011. At the very least, it's not complicated to swap it out in your own code.

https://github.com/NaturalNode/natural/blob/master/lib/natural/util/stopwords.js

snellingio, Feb 04 '15 00:02
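Swapping in your own list can be as simple as filtering tokens against a custom stoplist before they ever reach tf-idf. The sketch below is a minimal, library-independent illustration and does not call natural at all. The names `tokenize`, `filterStopwords` and `CUSTOM_STOPWORDS` are hypothetical, and the regex tokenizer is an assumption standing in for whatever tokenizer you actually use.

```typescript
// Hypothetical custom stoplist: it treats "he" and "she" the same way.
const CUSTOM_STOPWORDS: readonly string[] = ["a", "an", "and", "he", "she", "it", "the", "of", "to"];

// Naive tokenizer (assumption): lowercase the text, then split on runs of non-letters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z]+/).filter((t) => t.length > 0);
}

// Drop every token that appears in the supplied stoplist (case-insensitive).
function filterStopwords(tokens: string[], stopwords: readonly string[]): string[] {
  const stop = new Set(stopwords.map((w) => w.toLowerCase()));
  return tokens.filter((t) => !stop.has(t));
}

const tokens = filterStopwords(tokenize("She said he ran to the park."), CUSTOM_STOPWORDS);
console.log(tokens); // [ 'said', 'ran', 'park' ]
```

The filtered tokens can then be handed to whatever tf-idf implementation you use, so the stoplist stays entirely under your control.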

That is the stopword list used by tfidf, and I think it needs to be fixed up a bit. It wouldn't be hard; I'm just tied up with a few other things, so it's not at the top of my list right now.

I also agree that it would be nice to make the stopword filtering in tf-idf optional and separate it out a bit.

kkoch986, Feb 04 '15 01:02
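One possible shape for "optional and separate" stopword handling is to accept the stoplist as an option that defaults to no filtering. This is a hypothetical, self-contained sketch, not natural's TfIdf API: `tfidf`, `TfIdfOptions` and its `stopwords` field are illustrative names. It uses raw term counts for tf and the plain `ln(N / df)` form of idf, chosen here for simplicity. Other weighting variants exist, and this one is not claimed to match any library.

```typescript
interface TfIdfOptions {
  // Optional stoplist; when omitted, no tokens are filtered (hypothetical design).
  stopwords?: readonly string[];
}

// Naive tokenizer (assumption): lowercase the text, then split on runs of non-letters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z]+/).filter((t) => t.length > 0);
}

// Returns a map of term -> tf-idf weight for documents[docIndex].
// tf = raw count of the term in that document; idf = ln(N / df), where df is the
// number of documents (after stopword filtering) that contain the term.
function tfidf(documents: string[], docIndex: number, options: TfIdfOptions = {}): Map<string, number> {
  const stop = new Set((options.stopwords ?? []).map((w) => w.toLowerCase()));
  const docs = documents.map((d) => tokenize(d).filter((t) => !stop.has(t)));
  const n = docs.length;

  // Count term occurrences in the target document.
  const counts = new Map<string, number>();
  for (const term of docs[docIndex]) {
    counts.set(term, (counts.get(term) ?? 0) + 1);
  }

  // Weight each term by how rare it is across the whole corpus.
  const weights = new Map<string, number>();
  counts.forEach((tf, term) => {
    const df = docs.filter((d) => d.includes(term)).length;
    weights.set(term, tf * Math.log(n / df));
  });
  return weights;
}

const corpus = ["he walked home", "she walked home", "she ran"];
const filtered = tfidf(corpus, 0, { stopwords: ["he", "she"] });
const unfiltered = tfidf(corpus, 0);
console.log(filtered.has("he"), unfiltered.has("he")); // false true
```

Keeping the stoplist as a parameter lets callers pass their own list, or none at all, without touching the weighting code.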