causalml: Finding meaningful user cohorts in the Uber ad Bidder example

Hello. I followed your example notebook and slides for Uber Targeting Optimization, which find the best users to target in order to minimize advertising cost. In the example, it seems you look at the gain chart, identify the 60% of users who are responsible for most of the gain, and then target them. What I did not understand is how you identify those users. In the notebook and slides you listed some user features such as avg_waiting_time, avg_meal_subtotal etc., but they were never used later to identify users. For example, I was hoping that in the end you would conclude that this 60% of users are those whose avg_waiting_time is ... and avg_meal_subtotal is ... and so on. Can you please explain how someone can do that step?

Jul 02 '22 17:07 abdollahpouri

The users are identified based on the predicted uplift. The proportion of users to be targeted can be based on external constraints (for example, you have a budget for targeting 60% of users) or by examining the uplift curve. Generally, you would want to see where the uplift curve flattens to see what proportion of users it might make sense to target.
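As a rough sketch of what "identified based on the predicted uplift" can mean in practice: rank users by their predicted uplift and keep the top fraction. The scores below are synthetic, and `select_top_fraction` is a hypothetical helper written for this illustration, not part of causalml.

```python
import random


def select_top_fraction(uplift_scores, fraction):
    """Return the indices of the users with the highest predicted uplift."""
    n_target = round(len(uplift_scores) * fraction)
    # Sort user indices from highest to lowest predicted uplift.
    ranked = sorted(
        range(len(uplift_scores)),
        key=lambda i: uplift_scores[i],
        reverse=True,
    )
    return ranked[:n_target]


random.seed(0)
# Synthetic stand-in for the per-user uplift predicted by a fitted model.
predicted_uplift = [random.gauss(0.05, 0.02) for _ in range(1000)]

# Target the 60% of users with the highest predicted uplift.
targeted = select_top_fraction(predicted_uplift, 0.6)
print(len(targeted), "users selected for targeting")
```

The 0.6 here is just the budget-style constraint from the example; a cutoff read off the uplift curve would be passed in the same way.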

It sounds to me like you are interested in understanding descriptive differences between populations with different predicted uplifts. To do this, you could simply split the population into those who have a high predicted uplift and those who have a low predicted uplift (using some arbitrary criterion for "high" and "low"), and you could plot the relevant variables you are interested in, such as avg_waiting_time in the example. If you want to evaluate the differences formally, you can use any standard statistical procedure like the t-test.
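A minimal sketch of that comparison, on synthetic data: split users at the 60% mark of predicted uplift, then compare avg_waiting_time between the two groups with Welch's t statistic. The data-generating numbers are made up for illustration, and `welch_t` is a hand-rolled helper so that the example needs only the standard library; `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic and also returns a p-value.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    standard_error = (
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    ) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


random.seed(0)
# Synthetic users: predicted uplift is made to depend on avg_waiting_time
# purely so that the two groups differ in this illustration.
users = []
for _ in range(1000):
    wait = random.uniform(5, 40)
    uplift = 0.002 * wait + random.gauss(0, 0.01)
    users.append({"avg_waiting_time": wait, "predicted_uplift": uplift})

# "High" = top 60% by predicted uplift, "low" = the remaining 40%.
ranked = sorted(users, key=lambda u: u["predicted_uplift"], reverse=True)
n_high = round(0.6 * len(ranked))
high, low = ranked[:n_high], ranked[n_high:]

high_wait = [u["avg_waiting_time"] for u in high]
low_wait = [u["avg_waiting_time"] for u in low]

mean_high = statistics.mean(high_wait)
mean_low = statistics.mean(low_wait)
t_stat = welch_t(high_wait, low_wait)

print(f"mean avg_waiting_time, high uplift: {mean_high:.1f}")
print(f"mean avg_waiting_time, low uplift:  {mean_low:.1f}")
print(f"Welch t statistic: {t_stat:.1f}")
```

The same split can be repeated for any other feature, such as avg_meal_subtotal, to build up a descriptive profile of each group.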

However, note that the differences you might see between the groups are not necessarily meaningful. This is because the features that are correlated with differences in predicted uplift are not necessarily causes of those differences. For example, imagine that avg_waiting_time turns out to be very different between the 60% with the highest predicted uplift and the remaining 40% of the users. It might be because avg_waiting_time causes the treatment effect to be higher, or it might be because avg_waiting_time is just correlated with some other variable, such as whether you live in a city or not, which is the real cause of the differences in the treatment effect. Unless you are willing to make assumptions about the underlying causal structure, there is no way to move from correlation to causation.

Jul 02 '22 20:07 t-tte

Thanks very much. That was incredibly helpful and yes you guessed my goal correctly--I am looking for meaningful user cohorts. The reason is I think the approach that is used in the notebook I mentioned is not generalizable to new users and it can be applied only to that 60% of users who were already in the system. Please let me know if that is correct.

Jul 02 '22 20:07 abdollahpouri

The model learns the features of the population that are predictive of the treatment effect, and if you are willing to assume that new users who come in are similar to the ones on which the model has been trained, the relevant features should remain the same. If you are only interested in predicting the conditional treatment effect, you can simply pass a new observation to the model's predict() method, and you don't need to worry about which features the model is using to predict the treatment effect.
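To illustrate the fit-then-predict pattern without depending on any particular library, here is a toy two-model (T-learner style) estimator on a single feature. `ToyTLearner` and `fit_line` are hypothetical names written for this sketch and are not the causalml API; the data is synthetic, with a true uplift of 0.02 * avg_waiting_time assumed only for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


class ToyTLearner:
    """Hypothetical two-model uplift estimator on one feature."""

    def fit(self, x, treatment, y):
        treated = [(xi, yi) for xi, ti, yi in zip(x, treatment, y) if ti == 1]
        control = [(xi, yi) for xi, ti, yi in zip(x, treatment, y) if ti == 0]
        # One outcome model per arm.
        self.treated_ = fit_line(*zip(*treated))
        self.control_ = fit_line(*zip(*control))
        return self

    def predict(self, x):
        (a1, b1), (a0, b0) = self.treated_, self.control_
        # Predicted uplift = treated-arm prediction minus control-arm prediction.
        return [(a1 + b1 * xi) - (a0 + b0 * xi) for xi in x]


random.seed(0)
# Synthetic training users from a randomized experiment.
x = [random.uniform(5, 40) for _ in range(2000)]
treatment = [random.randint(0, 1) for _ in x]
y = [
    1.0 + 0.05 * xi + ti * 0.02 * xi + random.gauss(0, 0.1)
    for xi, ti in zip(x, treatment)
]

model = ToyTLearner().fit(x, treatment, y)

# New users who were never in the training data: only their features are needed.
new_users_waiting_time = [10.0, 30.0]
new_uplift = model.predict(new_users_waiting_time)
print(new_uplift)
```

The point of the sketch is only that scoring new users requires their features and nothing else, provided they resemble the training population.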

Aug 30 '22 00:08 t-tte