AIF360 Report: memory issue

When I repeatedly run 'for i in range(100000): a = BinaryLabelDatasetMetric()', memory usage increases continuously. How can I fix it?
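For anyone trying to reproduce this, here is a minimal sketch of how the growth could be measured with the standard-library `tracemalloc` module. `LeakyMetric` and `traced_growth` are hypothetical names used only for illustration. `LeakyMetric` is a stand-in that deliberately keeps every instance alive and is not the AIF360 class; to test the real case, you would pass a factory that builds the actual metric object instead.

```python
import gc
import tracemalloc


class LeakyMetric:
    """Hypothetical stand-in for a metric object (NOT the AIF360 class)."""

    _cache = []  # class-level list that keeps every instance alive

    def __init__(self, n_rows=1000):
        self.data = [0.0] * n_rows
        LeakyMetric._cache.append(self)  # simulated leak


def traced_growth(factory, iterations=200):
    """Return the bytes still allocated after calling factory() repeatedly."""
    gc.collect()
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    a = None
    for _ in range(iterations):
        a = factory()  # same pattern as the loop in the report
    del a
    gc.collect()  # rule out garbage that is merely waiting for collection
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    print("leaky factory:", traced_growth(LeakyMetric), "bytes retained")
    print("plain factory:", traced_growth(lambda: [0.0] * 1000), "bytes retained")
```

If the retained bytes scale with `iterations`, something is still holding references to the objects created in the loop.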

Feb 12 '21 09:02 jinyilun718

Related to #74 and #85?

Feb 12 '21 17:02 nrkarthikeyan

@nrkarthikeyan Really appreciate your help!

Feb 15 '21 04:02 jinyilun718

@nrkarthikeyan I adopted the method you recommended, but repeated runs still consume a large amount of memory. I found that the problem is caused by the function 'ClassificationMetric()'. Is there a possible solution? Looking forward to your reply.

Feb 15 '21 09:02 jinyilun718

I've solved the problem. It was caused by the lines 'self.dataset = dataset' and 'self.classified_dataset = classified_dataset'. Replace the original initialization code with the following: ` def __init__(self, X_train, Y_train, pred, cloums_name, protected_value, unprivileged_groups=None, privileged_groups=None):

    # Align row indices so the column-wise concatenations line up.
    X_train = X_train.reset_index(drop=True)
    Y_train = Y_train.reset_index(drop=True)
    pred = pred.reset_index(drop=True)

    # Dataset holding the true labels.
    x = pd.concat([X_train, Y_train], axis=1)
    dataset = BinaryLabelDataset(
        df=x, label_names=list(Y_train.columns),
        protected_attribute_names=[list(X_train.columns)[cloums_name]],
        unprivileged_protected_attributes=protected_value,
        privileged_protected_attributes=abs(1 - protected_value))
    del x

    # Dataset holding the predicted labels.
    x = pd.concat([X_train, pred], axis=1)
    classified_dataset = BinaryLabelDataset(
        df=x, label_names=list(Y_train.columns),
        protected_attribute_names=[list(X_train.columns)[cloums_name]],
        unprivileged_protected_attributes=protected_value,
        privileged_protected_attributes=abs(1 - protected_value))
    del x

    self.classified_dataset = classified_dataset
    self.dataset = dataset
    self.unprivileged_groups=None
    self.privileged_groups=None

    if isinstance(classified_dataset, (BinaryLabelDataset, MulticlassLabelDataset)):
        self.classified_dataset = classified_dataset
    else:
        raise TypeError("'classified_dataset' should be a "
                        "BinaryLabelDataset or a MulticlassLabelDataset.")

    with self.dataset.temporarily_ignore('labels', 'scores'):
        if self.dataset != self.classified_dataset:
            raise ValueError("The two datasets are expected to differ only "
                             "in 'labels' or 'scores'.")`
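To check whether a stored reference really is what keeps an object alive, a weak reference can help. The sketch below is only an illustration: `Dataset`, `Metric`, `CachedMetric`, and `is_released` are hypothetical stand-ins, not AIF360 classes, and the class-level registry is an assumed example of a lingering reference rather than a confirmed description of the library's behavior.

```python
import gc
import weakref


class Dataset:
    """Hypothetical stand-in for a dataset object."""


class Metric:
    """Holds its dataset; nothing else refers to the instance."""

    def __init__(self, dataset):
        self.dataset = dataset


class CachedMetric(Metric):
    """Same as Metric, but a class-level registry keeps instances alive."""

    _registry = []

    def __init__(self, dataset):
        super().__init__(dataset)
        CachedMetric._registry.append(self)  # assumed lingering reference


def is_released(make_obj):
    """Return True if the object is freed once the caller drops it."""
    obj = make_obj()
    ref = weakref.ref(obj)  # a weak reference does not keep obj alive
    del obj
    gc.collect()  # also clears reference cycles
    return ref() is None


if __name__ == "__main__":
    print("Metric released:", is_released(lambda: Metric(Dataset())))
    print("CachedMetric released:", is_released(lambda: CachedMetric(Dataset())))
```

Running the same check against the real metric object before and after a change shows whether the change actually lets the object be freed.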

Feb 15 '21 11:02 jinyilun718