[ASK] add new user
Description
Good afternoon!
Please help me understand this. Suppose I have trained a model on my data, and I need to recommend items to a new user who was not in the training data, but whose tastes I know. Can I do something like what the LightFM library allows?
model.predict(0, items, user_features=user_features_matrix)
Here, along with the user and the items, we also pass the user's features/products (roughly speaking, their tastes) and get recommendations back.
From the documentation, I understand that we can only call model.rank(), and that it accepts only indices of already existing users.
Is there a way to get around this? Thank you!
Other Comments
Thanks for your hard work, cool repository and thanks for the lessons!)
Hi! I am facing a similar issue while trying to employ a User Split strategy for train/val/test split.
This means that instead of having, say, 60% of the user-item interactions in the training set, 20% in the validation set, and 20% in the test set, I want 60% of the users in the training set, 20% in the validation set, and 20% in the test set.
Consequently, when making predictions for users in the validation or test set, those users are unseen during training.
I thought of splitting into training, validation, and test sets by keeping the whole set of user ids in each, but masking the interactions of a user if that user belongs to another set.
For instance, if I have 10 users and 20 items, all matrices - train, val, test - will have shape $(10\times 20)$, but:
- train will only have non-zero entries in the upper $(6\times 20)$ block
- val will only have non-zero entries in the middle $(2\times 20)$ block
- test will only have non-zero entries in the bottom $(2\times 20)$ block.
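The layout described in this list can be sketched in plain Python. This only illustrates the masking idea and is not Cornac code; the helper name `mask_by_user_block` and the random toy matrix are made up for the example.

```python
import random

def mask_by_user_block(matrix, start, stop):
    """Return a copy of `matrix` keeping only rows start..stop-1; all other rows are zeroed."""
    n_items = len(matrix[0])
    return [
        list(row) if start <= u < stop else [0] * n_items
        for u, row in enumerate(matrix)
    ]

rng = random.Random(0)
n_users, n_items = 10, 20
# Toy binary interaction matrix (hypothetical data).
interactions = [
    [int(rng.random() < 0.3) for _ in range(n_items)] for _ in range(n_users)
]

train = mask_by_user_block(interactions, 0, 6)   # upper 6 x 20 block
val = mask_by_user_block(interactions, 6, 8)     # middle 2 x 20 block
test = mask_by_user_block(interactions, 8, 10)   # bottom 2 x 20 block
```

All three matrices keep the full 10 x 20 shape, and every interaction ends up in exactly one of them.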
Since you have been very helpful in the past, @saghiles, I ask for your help :) Do you think this approach could work?
EDIT
To answer my own question: no, it could not. The reason is the following: assume a user is in the test set, i.e. the corresponding row in the training set is empty. It is true that the model does not train on that user, but during evaluation on the test set the user is seen as an empty user, and everything in the test set is treated as "target predictions". What I would like, instead, is for some of that user's interactions to be used as historical data to perform a "forward pass" during evaluation (but not during training!), and for the remaining ones to be used as targets... and I believe there is no way of doing that in Cornac 😢.
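The evaluation protocol wished for here could look roughly like the sketch below: each held-out user's interactions are divided into a history part, given to the model at evaluation time only, and a target part used for scoring. The function name `split_history_targets` and the 80/20 ratio are assumptions for this example, and nothing in it is a Cornac API.

```python
import random

def split_history_targets(item_ids, history_ratio=0.8, seed=0):
    """Split one held-out user's items into (history, targets).

    `history` is fed to the model at evaluation time only;
    `targets` are the items the model is scored on.
    """
    items = list(item_ids)
    random.Random(seed).shuffle(items)
    # Keep at least one item on each side when the user has two or more items.
    cut = max(1, min(len(items) - 1, round(len(items) * history_ratio)))
    return items[:cut], items[cut:]

# Hypothetical test user who interacted with these item indices.
test_user_items = [2, 5, 7, 11, 13, 17, 19, 3, 8, 14]
history, targets = split_history_targets(test_user_items)
```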
Please correct me if I am wrong, @saghiles, I very much hope I am, actually! 😛
Cheers, Marta
Hi, the current model implementations in Cornac do not support the use case you are describing, i.e., handling new users or items at inference time. This is because most models (e.g., matrix factorization) assume that the sets of users and items are static; to handle newcomers, at least one training iteration would be required to infer their embeddings/factors.
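To illustrate why a newcomer needs extra training in a matrix factorization model, here is a minimal sketch that learns factors for one new user with SGD over that user's ratings while the item factors stay frozen. The function names, hyperparameters, and toy factors are all assumptions for this example; it is not how any specific Cornac model is implemented.

```python
import random

def fit_new_user(item_factors, ratings, n_epochs=200, lr=0.05, reg=0.01, seed=0):
    """Learn a factor vector for one unseen user; item factors are kept frozen.

    item_factors: dict item_id -> list of floats (taken from the trained model)
    ratings: dict item_id -> observed rating of the new user
    """
    k = len(next(iter(item_factors.values())))
    rng = random.Random(seed)
    p = [rng.gauss(0.0, 0.1) for _ in range(k)]
    for _ in range(n_epochs):
        for item, r in ratings.items():
            q = item_factors[item]
            err = r - sum(pf * qf for pf, qf in zip(p, q))
            # Gradient step on squared error with L2 regularization, user side only.
            p = [pf + lr * (err * qf - reg * pf) for pf, qf in zip(p, q)]
    return p

def predict(p, q):
    """Dot product of a user vector and an item vector."""
    return sum(pf * qf for pf, qf in zip(p, q))

# Toy item factors standing in for a trained model (hypothetical values).
item_factors = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
new_user = fit_new_user(item_factors, {0: 4.0, 1: 1.0})
score_for_item_2 = predict(new_user, item_factors[2])
```

The point is that the new user's vector does not exist until some optimization has run on that user's interactions, which is the extra training step mentioned above.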
Some models, however, such as VAECF, can easily handle new users at test time (without further training). The only requirement is to have access to at least one interaction between every new user and the items seen during training. To achieve this with Cornac, you will need to slightly modify the score() function in recom_vaecf.py.
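The reason an autoencoder-style model can score unseen users is that its input is the user's interaction vector rather than a user index. The toy sketch below shows that idea with a tiny deterministic encoder and decoder. The weights and function names are hypothetical, and this is neither Cornac's VAECF code nor its score() signature.

```python
import math

def encode(x, enc_weights):
    """Map a binary interaction vector to a latent vector (toy one-layer encoder)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in enc_weights]

def decode(z, dec_weights):
    """Map a latent vector back to one score per item (toy linear decoder)."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in dec_weights]

def score_new_user(interacted_items, n_items, enc_weights, dec_weights):
    """Score all items for a user identified only by their interactions."""
    x = [1.0 if i in interacted_items else 0.0 for i in range(n_items)]
    return decode(encode(x, enc_weights), dec_weights)

# Hypothetical "trained" weights: 4 items, 2 latent dimensions.
enc_weights = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]      # shape (2, 4)
dec_weights = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # shape (4, 2)

# A new user whose only known interaction is with item 0.
scores = score_new_user({0}, 4, enc_weights, dec_weights)
```

With these toy weights, item 1 gets a higher score than items 2 and 3, because it shares a latent dimension with item 0.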
Hi @mmosc,
Your understanding is correct. And thank you for helping us to handle this issue.
To get something close to the user-based splitting you are describing with Cornac, you can rely on the stratified_split strategy. This approach performs train/val/test splitting for every user, either randomly or based on time. Note that with this approach, every user in the test set still appears in the training set as well.
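As a rough illustration of what per-user splitting means, the sketch below divides each user's interactions across train, validation, and test at random. It is a plain-Python stand-in written for this thread, not Cornac's implementation. The function name and ratios are assumptions, and the time-based variant is not shown.

```python
import random
from collections import defaultdict

def stratified_split_by_user(interactions, test_ratio=0.2, val_ratio=0.2, seed=0):
    """Split (user, item, rating) tuples so each user's data is divided across all three sets."""
    by_user = defaultdict(list)
    for triple in interactions:
        by_user[triple[0]].append(triple)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for rows in by_user.values():
        rng.shuffle(rows)
        n_test = int(len(rows) * test_ratio)
        n_val = int(len(rows) * val_ratio)
        test.extend(rows[:n_test])
        val.extend(rows[n_test:n_test + n_val])
        train.extend(rows[n_test + n_val:])
    return train, val, test

# Toy data: 3 users with 10 interactions each (hypothetical).
data = [(u, i, 1.0) for u in range(3) for i in range(10)]
train, val, test = stratified_split_by_user(data)
```

Every user who appears in the test set also appears in the training set, which is exactly why this strategy does not produce completely new test users.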
Currently, there is no splitting strategy in Cornac in which test users are completely new.