implicit
idf formula for tf-idf and bm25
Hi! I noticed that the kNN models use this idf formula: idf(t) = log[n / (df(t) + 1)]. Why did you choose this one?
I am curious because sklearn uses this formula for smoothed idf: idf(t) = log[(1 + n) / (1 + df(t))] + 1, and with it idf only takes positive values. With your formula, a term present in all documents gets a negative idf, and a term present in all documents except one gets an idf of 0. That seems counterintuitive if we want to use these weights to compute dot products between documents (items).
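To make the difference concrete, here is a small sketch that evaluates both formulas side by side. The function names `idf_implicit_style` and `idf_sklearn_smooth` are hypothetical helpers written for this comparison. The first follows the formula exactly as quoted above, not necessarily implicit's actual source. The second follows sklearn's documented `smooth_idf=True` formula. `n = 10` is an arbitrary example corpus size.

```python
import numpy as np

def idf_implicit_style(df, n):
    # Hypothetical helper: idf(t) = log[n / (df(t) + 1)], as quoted in the question.
    # Written as log(n) - log1p(df) to avoid forming the ratio explicitly.
    return np.log(n) - np.log1p(df)

def idf_sklearn_smooth(df, n):
    # Hypothetical helper: sklearn's smoothed idf, log[(1 + n) / (1 + df(t))] + 1.
    # The "+ 1" keeps every weight >= 1, since df(t) <= n.
    return np.log((1 + n) / (1 + df)) + 1

n = 10                              # example number of documents
df = np.array([1, 5, n - 1, n])     # rare term ... term present in every document

print("implicit-style:", idf_implicit_style(df, n))
print("sklearn smooth:", idf_sklearn_smooth(df, n))
```

With the quoted formula, the last two entries are 0 (df = n - 1) and a small negative value of about -0.095 (df = n). The smoothed sklearn formula never drops below 1, so no term can flip the sign of a dot product between weighted vectors.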