
Partial dependence plots

RobertSamoilescu opened this pull request 3 years ago • 8 comments

Implementation of partial dependence (PD) and individual conditional expectation (ICE) plots, leveraging the sklearn implementation. Functionality includes:

  • PD and ICE for numerical features
  • PD and ICE for categorical features
  • PD and ICE for combinations of numerical and/or categorical features
  • Plots for all the above cases
  • Custom grids
  • Usage of any black-box model (i.e. not only restricted to sklearn estimators)

TODOs:

  • [x] Method description notebook
  • [x] Example usage notebook

RobertSamoilescu avatar Jul 21 '22 13:07 RobertSamoilescu


@RobertSamoilescu I've attempted to fix the problem with the docs CI failing. It was due to the $f_S(x_{S})$ equation in PartialDependence.ipynb. There were a number of minor issues:

  • The use of $$ surrounding the \begin{align}. Our combination of nbsphinx and myst-parser means amsmath environments such as align are automatically interpreted as latex math. If you also add $$ the html docs build will be OK, but it will screw with our latex docs build.
  • The use of |...| for abs. Sphinx uses |...| for some sort of substitution functionality (see here), so with |S| it was treating S as a substitution. Solution is to use \lvert and \rvert instead.
  • Indents have a specific meaning to myst-parser. We need to make sure we don't indent the content in math environments.

With the above changes I think the equation should render correctly for html and pdf builds, but probably good to check!
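For reference, a minimal sketch of markup that should satisfy all three points (the actual equation in PartialDependence.ipynb may differ; the formula below is only illustrative):

% no $$ around the amsmath environment, no indentation inside it,
% and \lvert ... \rvert instead of |...| (e.g. \lvert S \rvert for a set size)
\begin{align}
f_S(x_S) = \mathbb{E}_{X_C}\left[ f(x_S, X_C) \right] \approx \frac{1}{n} \sum_{i=1}^{n} f(x_S, x_C^{(i)})
\end{align}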

ascillitoe avatar Jul 27 '22 09:07 ascillitoe

P.s. It seems there is still an issue with the hyperlink format I advised you to use. I'll look into this now.

ascillitoe avatar Jul 27 '22 09:07 ascillitoe


@RobertSamoilescu you are doing everything correctly here, i.e. creating a header anchor in overview/high_level.md:

(partial-dependence)=
#### Partial Dependence

(note you don't have to do this for heading levels 1 to 3 since our config setting myst_heading_anchors = 3 means anchors are generated automatically for these). You should be able to reference this like you are doing in PartialDependence.ipynb:

[Partial Dependence](../overview/high_level.md#partial-dependence) 

But... I'm afraid this will not work due to our combination of nbsphinx for parsing .ipynb files and then myst-parser for doing the final rendering. Long story short, myst-parser looks for ../overview/high_level.md#partial-dependence, but nbsphinx has already converted it to ../overview/high_level.html#partial-dependence. This would be fixed if we transition from nbsphinx to myst-nb one day, or, better yet, stop writing methods docs as jupyter notebooks (my strong preference!).

ascillitoe avatar Jul 27 '22 10:07 ascillitoe

I agree with moving entirely towards the .md format with myst syntax for docs. It would need a boring PR that replaces all the .ipynb docs files with their .md + myst equivalents, but I'm not sure the effort is justified at the moment.

jklaise avatar Jul 27 '22 13:07 jklaise

@RobertSamoilescu btw are the 3rd and 4th files here redundant? [screenshot of changed files]

jklaise avatar Aug 03 '22 16:08 jklaise

@RobertSamoilescu should we try to add a tqdm progress bar around the for loop over features_list? Can't remember if there were some issues in certain Python environments with this...
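For reference, a minimal sketch of what this might look like (names such as features_list and compute_pd are placeholders here, not the PR's actual internals):

import time
from tqdm import tqdm

features_list = [(0,), (1,), (0, 1)]  # feature (combinations) to compute PD for

def compute_pd(features):
    time.sleep(0.1)  # stand-in for the actual brute-force PD computation
    return features

# wrap the per-feature loop in a tqdm progress bar
results = [compute_pd(f) for f in tqdm(features_list, desc="Computing PD")]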

jklaise avatar Aug 03 '22 16:08 jklaise

@RobertSamoilescu I think it would be great if we could have another example, preferably a classification one to show off interpretation of multiple targets, and a black-box one to show that it works in the same way as for sklearn models. This could be a follow-up PR.

jklaise avatar Aug 04 '22 08:08 jklaise

Codecov Report

Merging #721 (79d2c43) into master (7c5e48c) will decrease coverage by 1.33%. The diff coverage is 60.05%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #721      +/-   ##
==========================================
- Coverage   80.99%   79.65%   -1.34%     
==========================================
  Files         105      107       +2     
  Lines       11869    12657     +788     
==========================================
+ Hits         9613    10082     +469     
- Misses       2256     2575     +319     
| Impacted Files | Coverage Δ |
|---|---|
| alibi/utils/visualization.py | 18.98% <14.28%> (-5.55%) ⬇️ |
| alibi/explainers/partial_dependence.py | 45.88% <45.88%> (ø) |
| alibi/explainers/tests/conftest.py | 92.30% <78.57%> (-1.19%) ⬇️ |
| alibi/api/defaults.py | 100.00% <100.00%> (ø) |
| alibi/explainers/__init__.py | 100.00% <100.00%> (ø) |
| alibi/explainers/tests/test_partial_dependence.py | 100.00% <100.00%> (ø) |
| alibi/datasets/default.py | 69.56% <0.00%> (-1.03%) ⬇️ |
| alibi/explainers/anchors/anchor_tabular.py | 89.57% <0.00%> (-0.71%) ⬇️ |

codecov[bot] avatar Aug 22 '22 17:08 codecov[bot]

After an offline discussion we decided to copy the private methods from sklearn that compute PD using the brute approach and use them directly on black-box models, without wrapping them in a sklearn-like wrapper. This allows us to get rid of the confusing predictor_kw kwarg, making the interface easier and more transparent for the end user. It also has the added benefit of us being more in control, since we no longer rely on private sklearn functions.

On the flip side, this means that for now we only support numpy arrays for both black-box and sklearn models (contrast with the sklearn implementation, which allows estimators to be fitted on pandas dataframes, sparse matrices, etc.). This is in line with our current approach of mainly supporting models fitted on numpy arrays.
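As a rough illustration of the brute-force approach on plain numpy arrays (the function and variable names below are illustrative, not the ones used in this PR):

import numpy as np

def brute_pd(predict_fn, X, feature, grid):
    """Average the black-box predictions after overwriting one feature with each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value                         # set the feature of interest to the grid value
        averages.append(predict_fn(X_mod).mean(axis=0))   # average prediction over the dataset
    return np.array(averages)

# toy black-box predictor operating on numpy arrays
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
predict_fn = lambda x: 2 * x[:, 0] + x[:, 1] ** 2
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), num=10)
pd_values = brute_pd(predict_fn, X, feature=0, grid=grid)  # one averaged prediction per grid value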

jklaise avatar Sep 02 '22 08:09 jklaise

Decided to remove the option response_method='auto'. This is because for a binary classifier the number of output targets can change based on the other parameters (e.g., method='recursion' and kind='average' vs method='brute' and kind='both'). If method='recursion' and kind='average' are set, then the decision function is used to compute the PD, which means that for a binary classifier the output will have just one column. On the other hand, if method='brute' and kind='both', then predict_proba will be used instead, which in our implementation results in two output columns. The problem arises when plotting, since the plots use the target_names specified in the constructor. For example, if target_names=['output'], then a plotting error will arise when calling the explain method with the second pair of parameters, since in that case the output has 2 columns but the user specified only one target name. Another failure can arise if target_names=['class_0', 'class_1']: if the first pair of parameters is used, then the PD corresponding to the decision score will be labeled with 'class_0', which is not correct.

Also, I decided to move the parameter response_method into __init__ since it fully specifies the function to be used for the PD computation. Thus, we avoid the plotting errors above. If the user wants to use another function, they will have to create a new explainer object.
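A small illustration of the shape mismatch described above for a binary sklearn classifier:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)

print(clf.decision_function(X).shape)  # (200,)   -> a single output column
print(clf.predict_proba(X).shape)      # (200, 2) -> two output columns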

RobertSamoilescu avatar Sep 02 '22 19:09 RobertSamoilescu

Also decided to remove the method='auto' option. The sklearn logic is the following:

if method == Method.AUTO:
    if isinstance(self.predictor, BaseGradientBoosting) and self.predictor.init is None:
        method = Method.RECURSION.value
    elif isinstance(self.predictor, (BaseHistGradientBoosting, DecisionTreeRegressor, RandomForestRegressor)):
        method = Method.RECURSION.value
    else:
        method = Method.BRUTE.value

if method == Method.RECURSION:
    if not isinstance(self.predictor, (BaseGradientBoosting, BaseHistGradientBoosting, DecisionTreeRegressor,
                                       RandomForestRegressor)):
        supported_classes_recursion = (
            "GradientBoostingClassifier",
            "GradientBoostingRegressor",
            "HistGradientBoostingClassifier",
            "HistGradientBoostingRegressor",
            "HistGradientBoostingRegressor",
            "DecisionTreeRegressor",
            "RandomForestRegressor",
        )
        raise ValueError(f"Only the following estimators support the 'recursion' "
                         f"method: {supported_classes_recursion}. Try using method='{Method.BRUTE.value}'.")

    if response_method == ResponseMethod.AUTO:
        response_method = ResponseMethod.DECISION_FUNCTION.value

    if response_method != ResponseMethod.DECISION_FUNCTION:
        raise ValueError(f"With the '{method.RECURSION.value}' method, the response_method must be "
                         f"'{response_method.DECISION_FUNCTION.value}'. Got {response_method}.")

Removing the method='auto' option is a consequence of removing response_method='auto'.

Consider a model that supports both decision_function and predict_proba. Furthermore, assume it is a model that supports the recursion method. Then we have at least the following cases:

  1. kind='average', response_method='decision_function', method='auto'. In this case, method will become method='recursion' because the model supports the recursion option (first if block). Everything works well.

  2. kind='average', response_method='predict_proba', method='auto'. In this case, method will also become method='recursion' because the model supports recursion (first if block). But after that, a ValueError will be thrown because response_method was not set to 'decision_function'. The error would not have been thrown with response_method='auto', because in that case response_method would have become 'decision_function'. But since that option is no longer available, IMO it doesn't make sense to keep the method='auto' option.

RobertSamoilescu avatar Sep 02 '22 20:09 RobertSamoilescu

Following an offline discussion it was decided to simplify the implementation and user interface by splitting it into two distinct classes:

  • PartialDependence for use with black-box models calculating PD using a brute-force approach
  • TreePartialDependence for use with white-box models (currently only a small selection of sklearn estimators) that support a recursive algorithm for calculating PD, which is faster than the brute-force approach

This allows us to remove all of the slightly confusing arguments discussed previously.
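A rough sketch of how the split might look from the user's side (the exact constructor and explain signatures here are illustrative and may differ from the final API):

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from alibi.explainers import PartialDependence, TreePartialDependence

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
reg = GradientBoostingRegressor().fit(X, y)

# black-box route: brute-force PD on any prediction function over numpy arrays
pd_bb = PartialDependence(predictor=reg.predict)
exp_bb = pd_bb.explain(X, features=[0, 1])

# white-box route: faster recursive PD for the supported sklearn estimators
pd_tree = TreePartialDependence(predictor=reg)
exp_tree = pd_tree.explain(X, features=[0, 1])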

Also, @RobertSamoilescu checked that the recursive algorithm returns slightly different values than performing brute-force PD on the same estimators, which further justifies splitting the implementation into two public classes (similar to KernelShap and TreeShap).

jklaise avatar Sep 08 '22 15:09 jklaise

@jklaise, check the note at the end of this section, which confirms that the two methods differ in the values they return.

RobertSamoilescu avatar Sep 08 '22 15:09 RobertSamoilescu