Is ItemItemRecommender inconsistent?
Hi, thanks for this nice package!
While working with the library I came across the following inconsistent behavior of ItemItemRecommender:
- ItemItemRecommender can generate fewer than N recommendations (exactly N are expected)
- ItemItemRecommender can include already liked items in the recommendations, even when the flag filter_already_liked_items is set to True
These errors occur in all ItemItemRecommender variations: Cosine, TfIdf and BM25. Interestingly, the final distribution of recommendation lengths differs between variations, as does the number of known interactions included in the recommendations.
A Jupyter notebook with test data to reproduce the problems: implicit_itemitem_bug_code.zip
Thanks for your bug report!
I'll check this soon
Firstly, point 1 is not a bug. The KNN models in implicit are approximate. Let K be the number of neighbors stored per item and N be the recommendation size.
Suppose that K <= N. For a user A who has clicked only a single item B, only the top-K items most similar to B can have a similarity score higher than 0. Therefore, at most K items can be recommended.
Similarly, some items share no users with any other item, so their similarities to all other items are zero. Some users have interacted only with such low-interaction items, and those users get few or no recommendations.
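The effect described above can be reproduced with a toy model. This is only a sketch under stated assumptions, not the implicit implementation: it uses raw co-occurrence counts as the similarity, drops self-similarity, and the helper names (`topk_rows`, `recommend`) and the data are hypothetical.

```python
import numpy as np

# Toy user-item matrix (rows: users, columns: items); hypothetical data.
user_items = np.array([
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],  # user 3 liked only item 0
    [0, 0, 0, 0, 0, 1],  # user 4 liked only item 5, which shares no users
])

def topk_rows(sim, K):
    """Keep only the K largest entries in each row, zeroing the rest."""
    out = np.zeros_like(sim)
    for i, row in enumerate(sim):
        top = np.argsort(-row, kind="stable")[:K]
        out[i, top] = row[top]
    return out

def recommend(user_vector, sim_topk, N):
    """Return up to N unseen items with a strictly positive score."""
    scores = user_vector @ sim_topk
    scores = np.where(user_vector > 0, 0.0, scores)  # drop liked items
    candidates = np.flatnonzero(scores > 0)
    order = np.argsort(-scores[candidates], kind="stable")
    return candidates[order][:N]

# Co-occurrence counts as a stand-in similarity (assumption), no self-similarity.
sim = (user_items.T @ user_items).astype(float)
np.fill_diagonal(sim, 0.0)
sim_topk = topk_rows(sim, K=2)

recs_single_item_user = recommend(user_items[3], sim_topk, N=4)
recs_isolated_item_user = recommend(user_items[4], sim_topk, N=4)
print(len(recs_single_item_user), len(recs_isolated_item_user))
```

With K=2 and N=4, the single-item user can receive at most two recommendations, and the user whose only item shares no users with other items receives none.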
Secondly, compare these two versions of intersection_mapper:
import numpy as np

# Version 1: count the overlap with np.intersect1d
def intersection_mapper(row):
    row1 = row['item_id']
    row2 = row['item_id_known']
    return len(np.intersect1d(row1, row2))

# Version 2: count the overlap with Python sets
def intersection_mapper(row):
    row1 = set(row['item_id'])
    row2 = set(row['item_id_known'])
    return len(row1.intersection(row2))
The second version of intersection_mapper works correctly (though I don't know why np.intersect1d does not work).
On the first point, I realized that these are limitations of the algorithm itself. It would still be nice to issue a warning in this situation, but this is secondary.
For the second point, I don't understand your position: your variation of intersection_mapper works identically to mine, and the recommendations still contain already liked items.
My version:
Your version:

The final tables are identical and show that there are many already liked items in the recommendations.
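As a sanity check, the two mappers can also be compared on a few toy rows. This is a minimal sketch: the rows are made up for illustration, plain dicts stand in for DataFrame rows, the function names `intersection_np` and `intersection_set` are hypothetical, and the ids are assumed to be flat lists of integers.

```python
import numpy as np

def intersection_np(row):
    # np.intersect1d returns the sorted unique values present in both inputs
    return len(np.intersect1d(row["item_id"], row["item_id_known"]))

def intersection_set(row):
    # Set intersection also counts each shared id once
    return len(set(row["item_id"]).intersection(row["item_id_known"]))

# Hypothetical rows: recommended ids vs. ids the user already interacted with.
rows = [
    {"item_id": [10, 11, 12], "item_id_known": [12, 13]},
    {"item_id": [1, 2, 3], "item_id_known": [4, 5]},
    {"item_id": [7, 7, 8], "item_id_known": [7, 8, 9]},
]

counts_np = [intersection_np(r) for r in rows]
counts_set = [intersection_set(r) for r in rows]
print(counts_np, counts_set)
```

On flat lists of integer ids like these, both functions count the same overlap, which is consistent with the two tables being identical.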