Stopwords inconsistency
I'm a little flummoxed by the stopword list used in Tf-Idf: it has "he" but not "she"? That was immediately noticeable when I tried it on my data set. I'd love the option to pass in a custom stoplist, or to skip stopword filtering here altogether.
Maybe we should update the English list that Chris made in 2011. At the very least, it's not complicated to swap out in your own code.
https://github.com/NaturalNode/natural/blob/master/lib/natural/util/stopwords.js
That is the stopword list used by tfidf. I think the list needs to be fixed up a bit. It wouldn't be hard; I'm just tied up with a few other things, so it's not at the top of my list right now.
I also agree that it would be nice to make that part of tfidf optional and separate it out a bit.
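For anyone who wants to swap in their own list in the meantime, here is a rough sketch of doing the filtering yourself before the text reaches tf-idf. Everything in it is an assumption for illustration: `tokenizeWithStoplist` and `customStoplist` are hypothetical names, not part of natural, and the regex tokenizer is a simple stand-in rather than the one natural uses.

```javascript
// Hypothetical helper (not part of natural): lowercase, tokenize, and
// optionally filter against a caller-supplied stoplist.
function tokenizeWithStoplist(text, stoplist) {
  // Simple stand-in tokenizer: runs of letters, digits, or apostrophes.
  const tokens = text.toLowerCase().match(/[a-z0-9']+/g) || [];

  // No stoplist passed: stopword filtering is skipped entirely.
  if (!stoplist) return tokens;

  const blocked = new Set(stoplist);
  return tokens.filter((token) => !blocked.has(token));
}

// A custom list that treats "he" and "she" consistently (illustrative only).
const customStoplist = ['he', 'she', 'the', 'a', 'to'];

const sentence = 'He gave the book to her, and she smiled.';
const filtered = tokenizeWithStoplist(sentence, customStoplist);
const unfiltered = tokenizeWithStoplist(sentence);

console.log(filtered);
console.log(unfiltered);
```

If the version of natural you're on accepts a pre-tokenized array in `addDocument` (worth checking against your release), you could pass `filtered` instead of the raw string so the built-in list never comes into play.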