Feature Request: Weighted Averages in Ref2Vec
Describe your feature request
Currently, Ref2Vec-Centroid lets you represent an entity (say, an Article) as an aggregation of the vectors of the objects it references, that is, the components that make up the entity. The only aggregation method implemented so far is a plain average over all reference vectors. However, some references may need to count for more than others when building this representation, and a weighted average lends itself quite nicely to that.
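To make the request concrete, here is a minimal sketch of the aggregation I have in mind, in plain Python. It is not Weaviate's API; the function name `weighted_centroid` and the assumption that weights are non-negative numbers are mine. With equal weights it reduces to the plain average that Ref2Vec-Centroid computes today.

```python
from typing import List, Sequence


def weighted_centroid(
    vectors: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Weighted mean of equal-length reference vectors (hypothetical helper)."""
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need one weight per vector and at least one vector")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    dim = len(vectors[0])
    # Each component is the weight-scaled sum divided by the total weight.
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


# Equal weights give the plain centroid; unequal weights pull the result
# toward the more heavily weighted reference.
plain = weighted_centroid([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
skewed = weighted_centroid([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
```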
The weighted average would be very useful in the recommendation system scenarios that the Ref2Vec-Centroid module was first described for.
For example, if we are building a content-based movie recommendation system, we might want to construct a representation of a user's profile/tastes based on what they watch (by averaging over semantic representations of the movies they have watched). A weighted average would allow us to emphasize the features of movies they really liked by incorporating a rating-based weight for each movie they watch.
Similarly, for a project I worked on (a recommendation system that leveraged sales information from stores to recommend new products that they should stock), I wanted to characterize stores based on the products they sold. With some provision to account for the general saleability of each product (by comparison to sales of similar products across other stores), weighting the product vectors by their sales would show what each store is most popular for and help identify new items that would fit its brand.
Currently, this can only be implemented outside Ref2Vec: retrieve the entity's current vector, apply the custom weight to the vector of the new “reference”, combine the two, and write the new vector back.
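For reference, the combining step of that workaround looks roughly like the sketch below. It assumes we also store the running total of weights alongside the entity (otherwise the existing vector cannot be weighted correctly against the new one). Fetching and writing the vector are left out, and `add_weighted_reference` is a hypothetical helper name, not part of any Weaviate client.

```python
from typing import List, Sequence, Tuple


def add_weighted_reference(
    centroid: Sequence[float],
    total_weight: float,
    ref_vector: Sequence[float],
    ref_weight: float,
) -> Tuple[List[float], float]:
    """Fold one new weighted reference into an existing weighted centroid.

    Assumes `total_weight` is the sum of the weights already folded in.
    Returns the updated centroid and the updated weight total.
    """
    if len(centroid) != len(ref_vector):
        raise ValueError("vectors must have the same dimension")
    new_total = total_weight + ref_weight
    if new_total <= 0:
        raise ValueError("total weight must stay positive")
    updated = [
        (total_weight * c + ref_weight * r) / new_total
        for c, r in zip(centroid, ref_vector)
    ]
    return updated, new_total


# An entity built from one reference of weight 1 gains a reference of weight 3.
vector, weight_sum = add_weighted_reference([1.0, 0.0], 1.0, [0.0, 1.0], 3.0)
```

Having to keep that extra weight total in sync by hand is part of why I would like this built into the module.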
It would be most convenient to be able to specify either a property of a new reference that should be used as its weight, or a function/module that can compute this weight from the entity.
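To illustrate the two options, here is a hedged sketch of how a weight could be resolved either from a named property or from a user-supplied function. Everything here is hypothetical: `resolve_weight`, the property names `rating`, `unitsSold` and `categoryAverage`, and the choice to pass the reference's properties to the function are my assumptions, not an existing Weaviate configuration.

```python
from typing import Any, Callable, Mapping, Union

# Either the name of a numeric property, or a function of the properties.
WeightSource = Union[str, Callable[[Mapping[str, Any]], float]]


def resolve_weight(properties: Mapping[str, Any], source: WeightSource) -> float:
    """Return the weight for one reference (hypothetical helper)."""
    if callable(source):
        weight = float(source(properties))
    else:
        weight = float(properties[source])
    if weight < 0:
        raise ValueError("weights must be non-negative")
    return weight


# Option 1: use a property of the reference directly, e.g. a movie rating.
rating_weight = resolve_weight({"rating": 4}, "rating")

# Option 2: compute the weight, e.g. sales relative to similar products.
sales_weight = resolve_weight(
    {"unitsSold": 10, "categoryAverage": 5},
    lambda p: p["unitsSold"] / p["categoryAverage"],
)
```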
There’s also another, more speculative class of use cases that I wanted to bring up here. So far, the weights have all been determined by already-known properties of the references themselves. A weighting derived from semantic information in the references could be very useful as well. For example, a user on an ecommerce site may have just had a surprising reaction to a new product: perhaps they’ve only ever bought groceries, but it turns out they quite like to splurge on shoes, which is quite unlike the profile they’d built on the site so far. We may want to emphasize this information in our representation of them precisely because it is surprising (though this may be better suited to a more volatile type of recommendation, like posts or videos on a social media platform). Assigning each new product a user purchases a weight with a component that is inversely proportional to its similarity to their existing profile could help achieve this.
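A rough sketch of such a surprise-based weight follows. As a simplification I use `1 - cosine similarity` as the novelty term rather than a strict inverse proportion, since it stays bounded when the similarity is near zero. The names `novelty_weight`, `base` and `novelty_scale` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def novelty_weight(
    profile: Sequence[float],
    ref_vector: Sequence[float],
    base: float = 1.0,
    novelty_scale: float = 1.0,
) -> float:
    """Weight that grows as the reference looks less like the profile."""
    similarity = cosine_similarity(profile, ref_vector)
    return base + novelty_scale * (1.0 - similarity)


# A purchase matching the profile gets the base weight; an unrelated
# (orthogonal) one gets the base weight plus the full novelty bonus.
familiar = novelty_weight([1.0, 0.0], [1.0, 0.0])
surprising = novelty_weight([1.0, 0.0], [0.0, 1.0])
```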
Code of Conduct
- [X] I have read and agree to the Weaviate's Contributor Guide and Code of Conduct