Question/suspicion about evaluation
I have been testing ALS and BPR for implicit recommendation, and both models give really good-looking results for item-to-item recommendation (I checked the embeddings via t-SNE and asked the models for item recommendations). But when I run the mAP and precision-at-K evaluation metrics, the scores linger around 0.02%–2% for mAP at K = 1, 5, 20, 30, 50. It's hard for me to believe the models would perform so poorly on user recommendations when the item recommendations seem spot on. So I'm wondering whether I need more data (or whether mine isn't in the right format), or how I should tune the models to get better mAP@K scores. I'm using a custom dataset that I formatted the same way as your MovieLens example.
I don't think your mAP@K is that bad. You could try an open dataset (such as MovieLens) to see what scores you get there for comparison.
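For intuition on why these scores can look low while the recommendations are fine: mAP@K only credits exact held-out items at the top of each user's ranked list, so with a large catalog and few relevant items per user, even a decent ranker lands in the low single-digit percent range. A minimal, self-contained sketch of the metric (plain Python, not the library's implementation; function names here are illustrative):

```python
def average_precision_at_k(recommended, relevant, k):
    """AP@k for one user: average the precision at each rank where a
    relevant item appears, normalized by min(|relevant|, k)."""
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for i, item in enumerate(recommended[:k]):
        if item in relevant:
            hits += 1
            score += hits / (i + 1)  # precision at this rank
    return score / min(len(relevant), k)


def map_at_k(all_recommended, all_relevant, k):
    """mAP@k: mean of per-user AP@k over all evaluated users."""
    aps = [average_precision_at_k(recs, rel, k)
           for recs, rel in zip(all_recommended, all_relevant)]
    return sum(aps) / len(aps)


# Example: hits at ranks 2 and 5 out of a top-5 list, 2 relevant items:
# AP@5 = (1/2 + 2/5) / 2 = 0.45
print(average_precision_at_k([1, 2, 3, 4, 5], {2, 5}, 5))
```

Note that a user with the relevant item at rank 1 but nothing else in the list scores well under 1.0 whenever they have multiple held-out items, so averaging over many sparse users pulls the mean down quickly.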