Causal trees update
Proposed changes
This PR is mainly about causal trees support.
- The architecture of the causal tree implementation was moved to a more modular approach:
  - `BaseCausalDecisionTree` inherits everything it needs from scikit-learn's `BaseDecisionTree` and modifies the `fit()` method so that it keeps only the checks appropriate for causal trees.
  - `CausalTreeRegressor` now has `RegressorMixin` and `BaseCausalDecisionTree` as parent classes, which makes it fully compatible with scikit-learn.
  - The split criterion was moved to a separate `criterion.pyx`, where `CausalRegressionCriterion` inherits methods from scikit-learn's `RegressionCriterion` and implements `node_value()` to save the average treatment effect for each node. `CausalMSE` is now a concrete class with impurity computations for causal trees. I also added a `StandardMSE` concrete class, which is the standard MSE criterion from scikit-learn with a modified `node_value()` method. This makes it easy to add new criteria and see how each criterion influences a causal tree fit.
- Details about causal trees:
  - ATE bootstrap confidence interval calculation in `CausalTreeRegressor` now has multiprocessing support.
  - You can now plot `CausalTreeRegressor` with the standard scikit-learn function.
  - For deeper research, `CausalTreeRegressor` can calculate the number of treatment and control observations in each leaf (the low-level `_leaves_groups_cnt` attribute). Additionally, the `plot_dist_tree_leaves_values` function shows the distribution of ATE across the tree's leaves.
  - `CausalRandomForestRegressor` is based on scikit-learn, with `CausalTreeRegressor` as `base_estimator`.
  - The `calculate_error` method in `CausalRandomForestRegressor` calculates unbiased sampling variance. Source.
  - New Jupyter notebook `causal_trees_with_synthetic_data.ipynb` with `CausalTreeRegressor` and `CausalRandomForestRegressor` models.
- Tests:
  - Additional tests: `test_causal_trees.py`
  - `Makefile` contains install, build, test, and clean targets. Now you can simply type `make test`; Cython code compilation happens under the hood.
  - The `setup()` function in `setup.py` now knows about the `requirements-test.txt` dependencies thanks to the `tests_require` parameter. No need to install them manually.
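To make the leaf-level quantities in the list above concrete, here is a minimal, standard-library-only sketch of three ideas: the ATE as a difference in group means, per-leaf treatment/control counts, and a percentile bootstrap confidence interval whose resamples are evaluated in parallel. It is not the code from this PR. All function names (`ate`, `leaf_group_counts`, `bootstrap_ate_ci`) and parameters are hypothetical, and a thread pool is used only to keep the example self-contained, whereas the PR describes multiprocessing.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def ate(outcomes, treatment):
    """Difference in mean outcome between treated (1) and control (0) units."""
    treated = [y for y, w in zip(outcomes, treatment) if w == 1]
    control = [y for y, w in zip(outcomes, treatment) if w == 0]
    return mean(treated) - mean(control)


def leaf_group_counts(leaf_ids, treatment):
    """Map each leaf id to a (n_control, n_treated) pair (hypothetical helper)."""
    counts = Counter(zip(leaf_ids, treatment))
    return {leaf: (counts[(leaf, 0)], counts[(leaf, 1)])
            for leaf in sorted(set(leaf_ids))}


def _one_bootstrap_ate(task):
    """Compute the ATE on one resample drawn with replacement."""
    outcomes, treatment, seed = task
    rng = random.Random(seed)  # one generator per task keeps runs reproducible
    n = len(outcomes)
    while True:
        idx = [rng.randrange(n) for _ in range(n)]
        w = [treatment[i] for i in idx]
        if 0 in w and 1 in w:  # redraw if either group is missing
            return ate([outcomes[i] for i in idx], w)


def bootstrap_ate_ci(outcomes, treatment, n_bootstraps=200, alpha=0.05,
                     n_jobs=4, seed=0):
    """Percentile bootstrap interval for the ATE, resamples run in a pool."""
    tasks = [(outcomes, treatment, seed + b) for b in range(n_bootstraps)]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        estimates = sorted(pool.map(_one_bootstrap_ate, tasks))
    lower = estimates[int((alpha / 2) * (n_bootstraps - 1))]
    upper = estimates[int((1 - alpha / 2) * (n_bootstraps - 1))]
    return lower, upper


# Toy data, invented for illustration only.
y = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
w = [0, 0, 0, 1, 1, 1]
print(ate(y, w))
print(leaf_group_counts([1, 1, 2, 2, 2, 1], [0, 1, 0, 1, 1, 0]))
print(bootstrap_ate_ci(y, w))
```

Because every resample is independent, the work splits cleanly across workers. In CPython a thread pool will not speed up pure-Python arithmetic like this, so a process pool would be the natural swap for real workloads.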
Types of changes
What types of changes does your code introduce to CausalML?
Put an x in the boxes that apply
- [ ] Bugfix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [x] Documentation Update (if none of the other choices apply)
Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.
- [x] I have read the CONTRIBUTING doc
- [x] I have signed the CLA
- [x] Lint and unit tests pass locally with my changes
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have added necessary documentation (if appropriate)
- [ ] Any dependent changes have been merged and published in downstream modules
Further comments
If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc. This PR template is adopted from appium.
Thanks @alexander-pv for your much-needed contribution. We will review the PR soon.
Hi, @alexander-pv,
Sorry for my late review. I thought I had left comments earlier, but it turned out that I never submitted them and my review was left pending. The code looks good, and thanks for the sample notebook and test code, which are very comprehensive. One ask: let's remove `min_impurity_split` from the code so that we can use the latest scikit-learn. Currently it raises "TypeError: __init__() got an unexpected keyword argument 'min_impurity_split'".
Thanks!
Hi, @jeongyoonlee,
Thanks for the review, I pushed the necessary changes.
I ran into a problem: the removal of `min_impurity_split` in the latest scikit-learn breaks tree building. It turned out that the new version of `DepthFirstTreeBuilder` prevents the causal tree from growing new nodes.
See differences: old condition vs new condition.
Since the `EPSILON` constant in the original Cython file is widely used and can't be adjusted, I suggest creating `builder.pyx` with `DepthFirstCausalTreeBuilder` as a modification of `DepthFirstTreeBuilder` (see the latest commits). This file can also serve as an example of how to create custom tree builders by subclassing scikit-learn's `TreeBuilder`.
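For readers without the linked diffs, here is a rough Python sketch of how such a change in the leaf condition could stop a tree from growing. It assumes the old builder compared node impurity with a user-supplied `min_impurity_split` and the new one compares it with a fixed `EPSILON`. It further assumes, purely for illustration, a causal criterion whose impurity can be negative. The function names and values are hypothetical and this is not the scikit-learn source.

```python
import sys

# Stand-in for the builder's fixed constant (assumed here to be machine epsilon).
EPSILON = sys.float_info.epsilon


def is_leaf_old(impurity, min_impurity_split):
    """Assumed older check: stop splitting at or below a user-set threshold."""
    return impurity <= min_impurity_split


def is_leaf_new(impurity):
    """Assumed newer check: stop splitting when impurity is effectively zero."""
    return impurity <= EPSILON


# Hypothetical node whose causal impurity is negative.
impurity = -0.5

# With a threshold of -inf the old check never forces a leaf on impurity alone.
print(is_leaf_old(impurity, float("-inf")))  # False

# The fixed-constant check marks the same node as a leaf, so it is never split.
print(is_leaf_new(impurity))  # True
```

Under these assumptions the threshold can no longer be lowered from the estimator's side, which is consistent with the suggestion to ship a separate builder instead of relying on the stock one.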
Hi @alexander-pv, the Py 3.7 test failed with the following error: https://github.com/uber/causalml/runs/7923168250?check_suite_focus=true#step:6:492
Could you please check? If it's something that needs more time, we can merge this PR first and investigate it in a separate thread.
Hi, @jeongyoonlee, it seems that everything is fixed now :ok_hand:.
Merged! 👍🏻