
Unable to generate counterfactuals for certain instances

Open grtwrrn opened this issue 4 years ago • 8 comments

Hi,

I've trained DiCE with a k-NN classifier and want to generate counterfactuals for a test set from the same dataset. It works fine for most instances in the test set, but for others it stays stuck indefinitely (see screenshot below). I'm wondering why this is and how it can be fixed. Thanks!

(screenshot not included)

grtwrrn, Sep 13 '21 12:09

@grtwrrn, sorry you ran into this issue. Could you provide the sample code and the dataset so that I can reproduce the issue on my end? Which version of dice-ml are you running? The latest is 0.7.1; could you try with 0.7.1 and see whether you still run into the issue?

Regards,

gaugup, Sep 13 '21 16:09
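
To answer the version question without guessing, the installed dice-ml version can be read from the package metadata. This is a minimal sketch using only the standard library's importlib.metadata (Python 3.8+); `installed_version` is a hypothetical helper name, and "dice-ml" is the distribution name pip installs.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("dice-ml:", installed_version("dice-ml") or "not installed")
```

If the reported version is older than the latest release, `pip install --upgrade dice-ml` upgrades it.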

Hi @gaugup, thanks for your response! I've updated to 0.7.1 now, but I'm still running into the same issue. I'm not sure if I have permission to share the dataset, so I won't share it here, but this is my code (k_value = 1):

(three code screenshots not included)

As you can see, DiCE works fine for the first two cases but loops infinitely on the third. This happens throughout the test set: some instances work, but the majority of the ones I've tried seem to loop. I'm wondering if the issue is related to the one in #46? The continuous features in my dataset are already z-scored, so the changes may be quite small. However, I'm using the model-agnostic method rather than the gradient-based one.

EDIT: I've just run the same code, this time with the raw data instead of the scaled data, and it seems to work fine, so I'm guessing that was the cause. However, for the sake of standardising across different methods, I want to use my preprocessed, z-scored data. Is there a way of doing this? Thanks again!

grtwrrn, Sep 14 '21 16:09
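
The raw-data run in the EDIT suggests one possible way to keep standardisation (a suggested workaround, not a fix confirmed in this thread): give DiCE the raw features and do the z-scoring inside the model, so counterfactuals are searched in raw units while the classifier still sees z-scores. With scikit-learn this would usually be a Pipeline of StandardScaler and the classifier, which should work with dice_ml.Model(..., backend="sklearn"). The standard-library sketch below shows the same idea without depending on those libraries; ZScorer and RawInputModel are hypothetical names.

```python
from statistics import fmean, pstdev
from typing import Callable, List, Sequence

Rows = Sequence[Sequence[float]]


class ZScorer:
    """Column-wise z-scoring that keeps the fitted means/stds so values can be
    mapped back. Uses the population standard deviation (ddof=0), as
    scikit-learn's StandardScaler does."""

    def fit(self, rows: Rows) -> "ZScorer":
        columns = list(zip(*rows))
        self.means = [fmean(col) for col in columns]
        # A zero-variance column would divide by zero; fall back to 1.0.
        self.stds = [pstdev(col) or 1.0 for col in columns]
        return self

    def transform(self, rows: Rows) -> List[List[float]]:
        return [[(v - m) / s for v, m, s in zip(row, self.means, self.stds)] for row in rows]

    def inverse_transform(self, rows: Rows) -> List[List[float]]:
        return [[z * s + m for z, m, s in zip(row, self.means, self.stds)] for row in rows]


class RawInputModel:
    """Hypothetical wrapper: accepts raw rows, z-scores them, then calls a
    predict function that was trained on z-scored data."""

    def __init__(self, scaler: ZScorer, predict_z: Callable[[List[List[float]]], List[int]]):
        self.scaler = scaler
        self.predict_z = predict_z

    def predict(self, raw_rows: Rows) -> List[int]:
        return self.predict_z(self.scaler.transform(raw_rows))
```

Counterfactuals found in raw units can then be expressed as z-scores with `scaler.transform` (or z-scored values mapped back with `scaler.inverse_transform`) when comparing against other methods on the standardised scale.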

It's unclear why CF generation succeeds for the raw data but not for the z-scored data. Is it possible to share a minimal working example so we can debug? Perhaps simulated data close to yours?

amit-sharma, Oct 10 '21 11:10

Hi, I'm getting the same problem. Here is the synthetic data I am generating:

Synthetic data

  import numpy as np
  import pandas as pd

  SIZE = 100
  loc = 0.0
  scale = 0.5
  n, p = 10, .5
  a = [0, 1, 2]
  np.random.seed(seed=0)

  x1 = np.random.randint(2, size=SIZE)
  x2 = np.random.randint(2, size=SIZE)
  x3 = np.random.normal(loc=loc, scale=scale, size=SIZE)
  x4 = np.random.choice(a=a, size=SIZE)
  x5 = (np.logical_xor(x1, x2)).astype(int)
  x6 = (np.logical_not(x2)).astype(int)
  x7 = np.random.binomial(n, p, size=SIZE)
  x8 = np.sin(x7 / 2.)

  y = x1 - x2 + x3 - x4 + x5 - x6 + x7 - x8

  df = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3, 'x4': x4,
                     'x5': x5, 'x6': x6, 'x7': x7, 'x8': x8})
  cut = np.mean(y)
  df['y'] = np.where(y > cut, 1, 0)  # binary label: above / below the mean of y

DiCE

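  import dice_ml
  import pandas as pd
  from dice_ml import Dice

  output = "y"  # outcome column of the synthetic DataFrame above

  # Every feature column, including the binary ones, is declared continuous here.
  d = dice_ml.Data(dataframe=df, continuous_features=list(x_test.columns), outcome_name=output)
  m = dice_ml.Model(model=model, backend="sklearn")

  exp = Dice(d, m, method="random")
  query_instance = x_test  # all test rows are used as queries
  e1 = exp.generate_counterfactuals(query_instance, total_CFs=10, desired_range=None,
                                    desired_class="opposite",
                                    permitted_range=None, features_to_vary="all")

  imp = exp.local_feature_importance(query_instance, posthoc_sparsity_param=None)
  dicecontrib = pd.DataFrame.from_dict(imp.local_importance)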
  dicecontrib=pd.DataFrame.from_dict(imp.local_importance)

I am running this code for different values of loc, scale and p.

  loc = np.arange(0.0, 5.0, 0.1)
  scale = np.arange(0.0, 1.0, 0.1)
  p = np.arange(0.0, 1.0, 0.1)

cwayad, Oct 27 '22 21:10
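
The sweep above includes scale = 0.0 and p = 0.0. At those grid points x3 (normal with zero scale) is exactly loc, and x7 (binomial with p = 0), and therefore x8, are all zero, so some columns declared in continuous_features have zero range. Whether this is what trips DiCE up is an unconfirmed hypothesis, not something established in this thread, but it is cheap to rule out. The sketch below assumes numpy and uses its default_rng generator instead of the global np.random.seed from the original, so the samples differ, although whether a column is constant does not depend on them. constant_columns and degenerate_settings are hypothetical helper names.

```python
import itertools
from typing import Dict, Iterable, Iterator, List, Tuple

import numpy as np


def constant_columns(columns: Dict[str, np.ndarray]) -> List[str]:
    """Names of columns whose values are all identical (zero range)."""
    return [name for name, values in columns.items() if np.max(values) == np.min(values)]


def degenerate_settings(
    locs: Iterable[float],
    scales: Iterable[float],
    ps: Iterable[float],
    size: int = 100,
    n: int = 10,
    seed: int = 0,
) -> Iterator[Tuple[float, float, float, List[str]]]:
    """Yield (loc, scale, p, constant_cols) for grid points that make one of the
    synthetic continuous features (x3, x7, x8) constant."""
    for loc, scale, p in itertools.product(locs, scales, ps):
        rng = np.random.default_rng(seed)
        x3 = rng.normal(loc=loc, scale=scale, size=size)
        x7 = rng.binomial(n, p, size=size)
        x8 = np.sin(x7 / 2.0)
        const = constant_columns({"x3": x3, "x7": x7, "x8": x8})
        if const:
            yield loc, scale, p, const


if __name__ == "__main__":
    grid = (np.arange(0.0, 5.0, 0.1), np.arange(0.0, 1.0, 0.1), np.arange(0.0, 1.0, 0.1))
    bad = list(degenerate_settings(*grid))
    print(f"{len(bad)} grid points produce at least one constant feature")
```

If the failing combinations coincide with these points, starting the scale and p ranges at 0.1 instead of 0.0 would be a quick way to test the hypothesis.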

@cwayad are you using the latest version of dice (v0.9)?

Also, to reproduce your example, I need the model training code. Can you provide it so that this can be debugged?

amit-sharma, Oct 30 '22 15:10

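@amit-sharma Yes, I am using the latest version (v0.9). I use a random forest model:

  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import train_test_split

  # sklearn returns (X_train, X_test, y_train, y_test); feature_to_predict is the outcome column ("y")
  x_train, x_test, y_train, y_test = train_test_split(
      df.drop(columns=feature_to_predict), df[feature_to_predict], train_size=0.8, random_state=25)
  model = RandomForestClassifier(n_estimators=10)
  model.fit(x_train, y_train)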

Please note that I run this code inside three nested for loops over loc, scale and p, so it may work for some combinations of loc, scale and p and not for others.

I have already tested it with other datasets, such as "cervical cancer", and it worked, but it doesn't work for this specific synthetic dataset.

cwayad, Oct 30 '22 17:10

Hello @amit-sharma. This issue still persists. DiCE gets stuck indefinitely for some instances with the genetic method, and no counterfactuals are found for other instances with methods such as random or KD-tree. Is there some way to determine which instances will get stuck like this? Thank you.

PMK1991, Sep 14 '23 14:09
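
No way to predict in advance which instances will get stuck is shown in this thread, but they can at least be detected. A minimal sketch, assuming nothing about DiCE internals: run each instance's generation call with a wall-clock budget and record which ones finish, raise, or run past the budget. find_slow_instances is a hypothetical helper; it uses a daemon thread because Python threads cannot be killed, so a timed-out call keeps running in the background until the process exits.

```python
import threading
from typing import Any, Callable, Dict, List, Sequence


def find_slow_instances(
    generate_one: Callable[[Any], Any],
    instances: Sequence[Any],
    time_budget_s: float,
) -> Dict[str, List[int]]:
    """Call generate_one on each instance with a wall-clock budget and sort the
    instance indices into 'ok', 'failed' (raised an exception) and 'timed_out'."""
    report: Dict[str, List[int]] = {"ok": [], "failed": [], "timed_out": []}
    for idx, instance in enumerate(instances):
        outcome: Dict[str, Any] = {}

        def target(inst: Any = instance, out: Dict[str, Any] = outcome) -> None:
            try:
                out["result"] = generate_one(inst)
            except Exception as exc:  # any exception raised by generate_one is recorded
                out["error"] = exc

        # Daemon thread: a stuck call cannot be stopped, but it will not keep
        # the interpreter alive at exit.
        worker = threading.Thread(target=target, daemon=True)
        worker.start()
        worker.join(timeout=time_budget_s)
        if worker.is_alive():
            report["timed_out"].append(idx)
        elif "error" in outcome:
            report["failed"].append(idx)
        else:
            report["ok"].append(idx)
    return report
```

With the code earlier in the thread, `generate_one` could be something like `lambda row: exp.generate_counterfactuals(row, total_CFs=10, desired_class="opposite")` applied to one-row frames `[x_test.iloc[[i]] for i in range(len(x_test))]`; that wiring is an assumption. To actually stop a stuck call rather than only flag it, each call would need to run in a separate process (for example via multiprocessing) that can be terminated.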

Hello Amit,

I am facing a similar issue in the latest version (0.10) too. Is there any solution for this?

baji-loreal, Sep 22 '23 13:09