Gain plots for multiple treatments
I am a bit confused about the way that the gain plot is created in the case of multiple treatments in the example depicted here.
Why aren't we calculating the gain in the same way as in the single-treatment case?
In the example below, the code closely follows the source code of plot_gain, where best_uplift is the maximum uplift across all treatments.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

model_names = ['Model'] + [f'Rand_{i}' for i in range(1, 11)]

# Cumulative lift when ranking observations by the model's best uplift.
# .copy() avoids pandas' SettingWithCopyWarning when adding columns below.
df_preds = temp_test_df[['best_uplift', 'Orig_Recipe', 'BIND_CNT']].copy()
# Rows with recipe_1 are marked as control (0), all other recipes as treated (1).
df_preds['is_treated'] = np.where(df_preds['Orig_Recipe'] == 'recipe_1', 0, 1)
df_preds = df_preds.sort_values('best_uplift', ascending=False).reset_index(drop=True)
df_preds.index = df_preds.index + 1
df_preds['cumsum_tr'] = df_preds['is_treated'].cumsum()
df_preds['cumsum_ct'] = df_preds.index.values - df_preds['cumsum_tr']
df_preds['cumsum_y_tr'] = (df_preds['BIND_CNT'] * df_preds['is_treated']).cumsum()
df_preds['cumsum_y_ct'] = (df_preds['BIND_CNT'] * (1 - df_preds['is_treated'])).cumsum()
df_preds['lift'] = df_preds['cumsum_y_tr'] / df_preds['cumsum_tr'] - df_preds['cumsum_y_ct'] / df_preds['cumsum_ct']
lift = []
lift.append(df_preds['cumsum_y_tr'] / df_preds['cumsum_tr'] - df_preds['cumsum_y_ct'] / df_preds['cumsum_ct'])

# Same computation for 10 random rankings, used as the baseline.
for i in range(10):
    df_preds = temp_test_df[['best_uplift', 'Orig_Recipe', 'BIND_CNT']].copy()
    df_preds['best_uplift'] = np.random.rand(df_preds.shape[0])
    df_preds['is_treated'] = np.where(df_preds['Orig_Recipe'] == 'recipe_1', 0, 1)
    df_preds = df_preds.sort_values('best_uplift', ascending=False).reset_index(drop=True)
    df_preds.index = df_preds.index + 1
    df_preds['cumsum_tr'] = df_preds['is_treated'].cumsum()
    df_preds['cumsum_ct'] = df_preds.index.values - df_preds['cumsum_tr']
    df_preds['cumsum_y_tr'] = (df_preds['BIND_CNT'] * df_preds['is_treated']).cumsum()
    df_preds['cumsum_y_ct'] = (df_preds['BIND_CNT'] * (1 - df_preds['is_treated'])).cumsum()
    df_preds['lift'] = df_preds['cumsum_y_tr'] / df_preds['cumsum_tr'] - df_preds['cumsum_y_ct'] / df_preds['cumsum_ct']
    lift.append(df_preds['cumsum_y_tr'] / df_preds['cumsum_tr'] - df_preds['cumsum_y_ct'] / df_preds['cumsum_ct'])

# Combine the curves, start each at 0, and fill undefined values by interpolation.
lift = pd.concat(lift, join='inner', axis=1)
lift.loc[0] = np.zeros((lift.shape[1], ))
lift = lift.sort_index().interpolate()

# Average the 10 random curves into a single RANDOM baseline.
lift.columns = model_names
lift['RANDOM'] = lift[model_names[1:]].mean(axis=1)
lift.drop(model_names[1:], axis=1, inplace=True)

# Gain = cumulative lift * population size, normalized by the final value.
gain = lift.mul(lift.index.values, axis=0)
gain = gain.div(np.abs(gain.iloc[-1, :]), axis=1)
print('Model AUUC: ', gain['Model'].sum() / gain['Model'].shape[0])
print('Random AUUC: ', gain['RANDOM'].sum() / gain['RANDOM'].shape[0])

plt.figure(figsize=(7, 6))
pp = plt.plot(gain)
plt.xlabel('Population')
plt.ylabel('Gain')
plt.legend([pp[0], pp[1]], ['Model', 'Random'])
plt.show()
Hi @soodimilanlouei, the link you posted points to the uplift tree notebook, so I am a bit confused about what you are asking here. Could you please elaborate? Why can't you use the plot_gain function here instead of reimplementing it yourself?
Hi,
Apologies for not being clear enough.
What I'm trying to do here is to create an uplift curve where there are two treatments.
I am essentially using the plot_gain function; the only difference is in how I create the input table for it (the auuc_metrics table in the uplift tree notebook versus the df_preds table in my code).
In the uplift tree notebook, you filter the table down to observations where the treatment indicator is either control or the treatment recommended by the model. Based on the other examples in your repo, you do not apply such a filter when there is only one treatment. I'm trying to understand why the input table to plot_gain is not built the same way as in the single-treatment case.
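To make the comparison concrete, here is a minimal sketch of the filtering step as I understand it, not the notebook's actual code. The helper build_gain_input, the toy table, and the column names (group, t1, t2, y) are all hypothetical, and it assumes the per-treatment uplift columns are named after the treatment labels. Note that with a single treatment the recommended treatment is always that one treatment, so the same filter would keep every row.

```python
import numpy as np
import pandas as pd


def build_gain_input(df, uplift_cols, group_col="group", outcome_col="y",
                     control_name="control"):
    """Keep rows whose actual group is control or the recommended treatment.

    Hypothetical helper: assumes each column in uplift_cols is named after
    the treatment label it predicts the uplift for.
    """
    # Treatment with the highest predicted uplift, and that uplift value.
    best_treatment = df[uplift_cols].idxmax(axis=1)
    best_uplift = df[uplift_cols].max(axis=1)

    is_control = df[group_col] == control_name
    got_recommended = df[group_col] == best_treatment
    keep = is_control | got_recommended

    out = pd.DataFrame({
        "best_uplift": best_uplift[keep],
        "is_treated": (~is_control[keep]).astype(int),
        outcome_col: df.loc[keep, outcome_col],
    })
    return out.reset_index(drop=True)


# Toy data (made up): two treatments plus control.
toy = pd.DataFrame({
    "group": ["control", "t1", "t2", "t1", "t2", "control"],
    "t1": [0.10, 0.30, 0.20, 0.05, 0.01, 0.40],
    "t2": [0.20, 0.10, 0.25, 0.15, 0.02, 0.10],
    "y": [0, 1, 1, 0, 1, 0],
})

gain_input = build_gain_input(toy, uplift_cols=["t1", "t2"])
print(gain_input)
```

In this toy table, the fourth row received t1 while t2 had the higher predicted uplift, so it is dropped. My df_preds code above keeps such rows and marks them as treated, which is exactly the difference I am asking about.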