beans Improving the Yelp Bean matching algorithm

We modified match_utils so that the meeting weight between two users is calculated from their attributes instead of being uniformly set to 1.
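
A minimal sketch of what an attribute-based weight could look like. The `User` fields, the `COMPARED_ATTRIBUTES` tuple, and the scoring formula are all assumptions for illustration; they are not taken from match_utils.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical attribute set; the real user table may differ.
    email: str
    location: str
    language: str
    manager_id: str


# Attributes compared when scoring a pair (assumed for this sketch).
COMPARED_ATTRIBUTES = ("location", "language", "manager_id")


def meeting_weight(a: User, b: User) -> int:
    """Return 1 plus the number of compared attributes on which a and b differ."""
    differences = sum(
        getattr(a, name) != getattr(b, name) for name in COMPARED_ATTRIBUTES
    )
    return 1 + differences
```

Under this formula, two users who share every compared attribute keep the old uniform weight of 1, and each differing attribute makes the pair more attractive to the matcher.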

Nov 08 '23 16:11 conancain

Incorporating more user attributes into the matching mechanism will require more columns in the Postgres user table. To start, we will manually alter and update the table to add attributes such as language, location, and manager id.

In the future, we will need a programmatic way to fill the user table: either by editing the current cron job (if its source has the fields we need), or by creating another cron job that pulls data from the coreAPI and updates each user record.
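
As a rough illustration of the manual step, the sketch below adds three columns and backfills one row. It uses an in-memory SQLite database as a stand-in for Postgres so that it runs anywhere; the table layout, column names, and values are assumptions, and a real migration against Postgres would go through the project's own tooling.

```python
import sqlite3

# In-memory SQLite stands in for Postgres in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (id INTEGER PRIMARY KEY, email TEXT)')
conn.execute('INSERT INTO "user" (email) VALUES (?)', ("alice@example.com",))

# Step 1: add the new attribute columns (hypothetical names).
for column in ("language", "location", "manager_id"):
    conn.execute(f'ALTER TABLE "user" ADD COLUMN {column} TEXT')

# Step 2: backfill one user by hand.
conn.execute(
    'UPDATE "user" SET language = ?, location = ?, manager_id = ? WHERE email = ?',
    ("en", "London", "m42", "alice@example.com"),
)
conn.commit()

row = conn.execute(
    'SELECT language, location, manager_id FROM "user" WHERE email = ?',
    ("alice@example.com",),
).fetchone()
```

A cron job would replace step 2 with a loop over records fetched from the upstream source.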

Nov 08 '23 18:11 jeanne1994

This is an interesting decision to talk through. My previous assumptions were that we were getting this level of matching / user segmentation by creating the right subscriptions to split folks up by say office, location, interests etc. With this change that all becomes murky and makes me think we are trying to move to a place where we have one or very few subscriptions. Is that accurate?

Nov 08 '23 21:11 ny2ko

We are not trying to change the number of subscriptions. The idea here is to avoid matching people who are in the same organization or have the same manager, since the point of Beans is to connect people across Yelp. It would be awkward to talk to your teammate through a Beans match when you already see and work with each other every day.

Nov 09 '23 16:11 conancain

Does it not work by applying rules? E.g. https://github.com/Yelp/beans/blob/master/api/yelp_beans/matching/pair_match.py#L23 can be used to avoid matching people in the same org

Nov 09 '23 16:11 ny2ko

Yes, rules can avoid matching people with the exact same attribute. However, this change aims to increase the "interesting-ness" of the pairs by maximizing the diversity within each pair.

IIUC, the current subscription mechanism is based on available meeting time and interest. I do see value in matching people who are more different within each subscription; this can spice up the conversation and enable more cross-background learning and discussion. This is how I imagine the feature working: during my ML bean time, I want to be matched with people who work in domains different from mine.
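
To make the distinction concrete, here is a small sketch contrasting a hard rule with a soft diversity score. The function names, the attribute names, and the sample users are hypothetical and only illustrate the idea.

```python
def violates_rule(a: dict, b: dict, attribute: str) -> bool:
    """Hard rule: a pair is forbidden when both users share the attribute value."""
    return a[attribute] == b[attribute]


def diversity_score(a: dict, b: dict, attributes: tuple) -> int:
    """Soft preference: count the attributes on which the two users differ."""
    return sum(a[name] != b[name] for name in attributes)


ana = {"org": "search", "domain": "ml", "location": "SF"}
ben = {"org": "ads", "domain": "ml", "location": "SF"}
cho = {"org": "infra", "domain": "storage", "location": "London"}

ATTRS = ("org", "domain", "location")
```

Both ana/ben and ana/cho pass a "not the same org" rule, so the rule alone cannot tell them apart. The score can: ana and cho differ on every attribute, while ana and ben differ only on org.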

Nov 09 '23 19:11 jeanne1994

Some additional context that could help here:

I set Beans up at Twitch (I have since left) and folks have been using it to meet each other. Many different subscriptions exist there: a company-wide one, ones for specific locations, ones for meetings within an org, and 1-on-1 setups within a team. Each of these has different expectations for the criteria to enforce on matches. E.g. in a location-wide subscription people don't want to be matched with someone on the same team, but in a within-team subscription that is exactly what folks want.

Is there a way to make these code changes work using the rules system, so we can preserve the flexibility it affords each meeting subscription?

Nov 09 '23 21:11 ny2ko

Oh, these code changes function alongside the existing rules and subscription set. The algorithm respects the existing matching rules and each subscription's user pool. We are only reshaping how pairs are created (currently completely at random) under each subscription. As an example, when we generate pairs for UK tea time, the high-level steps are:

1. get all the people who opt in for the week
2. create all possible pairs (itertools.combinations)
3. based on the rules of the subscription, remove pairs that cannot be matched (e.g. recently paired, same department, etc.)
4. create optimal pairs (what the code tries to do)
5. notify successful pairs

It's worth noting that the code is a marginal improvement on how users are matched; it is not trying to change the flow of the current match process.
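
The candidate-generation, filtering, and pairing steps listed above can be sketched roughly as follows. Everything here is an assumption for illustration: the `allowed` rules, the `weight` formula, and especially `best_matching`, which is a brute-force search standing in for whatever matching routine the real code uses. It is exponential in the pool size and only suitable for a tiny demo.

```python
from itertools import combinations


def allowed(a, b, recent_pairs):
    """Hypothetical rules: skip recently met pairs and same-department pairs."""
    if frozenset((a["email"], b["email"])) in recent_pairs:
        return False
    return a["department"] != b["department"]


def weight(a, b):
    """Assumed weight: 1, plus one point when locations differ."""
    return 1 + (a["location"] != b["location"])


def best_matching(emails, weights):
    """Exhaustively find disjoint pairs with the highest total weight."""
    if len(emails) < 2:
        return 0, []
    first, rest = emails[0], emails[1:]
    # Option 1: leave `first` unmatched this round.
    best_total, best_pairs = best_matching(rest, weights)
    # Option 2: pair `first` with each allowed partner.
    for i, partner in enumerate(rest):
        key = frozenset((first, partner))
        if key not in weights:
            continue
        total, pairs = best_matching(rest[:i] + rest[i + 1:], weights)
        total += weights[key]
        if total > best_total:
            best_total, best_pairs = total, [(first, partner)] + pairs
    return best_total, best_pairs


def generate_pairs(opted_in, recent_pairs):
    # Enumerate candidate pairs, then drop those the rules forbid.
    weights = {
        frozenset((a["email"], b["email"])): weight(a, b)
        for a, b in combinations(opted_in, 2)
        if allowed(a, b, recent_pairs)
    }
    # Choose the pairing with the highest total weight.
    emails = sorted(u["email"] for u in opted_in)
    _, pairs = best_matching(emails, weights)
    return pairs
```

Because the rules are applied before any weights are considered, a forbidden pair can never be selected no matter how diverse it is, which is the sense in which the weighting works alongside the existing rules.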

Nov 09 '23 21:11 jeanne1994