idf for tf-idf and bm25

Open monkey0head opened this issue 3 years ago • 0 comments

Hi! I noticed that you use this idf formula in the knn models: idf(t) = log [ n / (df(t) + 1) ]. Why did you choose this one?

I am curious because sklearn uses this formula for smoothed idf: idf(t) = log [ (1 + n) / (1 + df(t)) ] + 1, and with it idf takes only positive values. With the formula used here, idf for a term present in all documents is negative, and idf for a term present in all documents except one is 0. That is not intuitive if we want to use idf to calculate dot products between documents (items).
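To make the difference concrete, here is a small sketch comparing the two formulas for a corpus of n = 10 documents. The function names `idf_plus_one` and `idf_sklearn_smooth` are just labels I made up for illustration; they are not taken from this library or from sklearn.

```python
import math


def idf_plus_one(n, df):
    # Formula quoted in this issue: log(n / (df + 1)).
    return math.log(n / (df + 1))


def idf_sklearn_smooth(n, df):
    # Smoothed idf as documented by sklearn: log((1 + n) / (1 + df)) + 1.
    return math.log((1 + n) / (1 + df)) + 1


n = 10  # assumed corpus size, for illustration only
for df in (1, n - 1, n):
    print(df, idf_plus_one(n, df), idf_sklearn_smooth(n, df))
```

With df = n the first formula gives log(n / (n + 1)), which is below zero, and with df = n - 1 it gives log(1) = 0, while the smoothed variant never drops below 1.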

Feb 15 '22 16:02 monkey0head