Causal trees update

Open alexander-pv opened this issue 3 years ago • 3 comments

Proposed changes

This PR is mainly about causal trees support.

  • The architecture of a causal tree implementation was moved to a more modular approach:
  1. BaseCausalDecisionTree inherits everything it needs from scikit-learn's BaseDecisionTree and modifies the fit() method so that it keeps only the checks appropriate for causal trees.
  2. CausalTreeRegressor now has RegressorMixin and BaseCausalDecisionTree as parent classes, which makes it fully compatible with scikit-learn.
  3. The split criterion was moved to a separate criterion.pyx, where CausalRegressionCriterion inherits methods from scikit-learn's RegressionCriterion and implements node_value() to store the average treatment effect for each node. CausalMSE is now a concrete class with the impurity computations for causal trees. I also added a StandardMSE concrete class, which is the standard MSE criterion from scikit-learn with a modified node_value() method. This makes it easy to add new criteria and to see the influence of each criterion on a causal tree fit.
  • Details about causal trees:
  1. The ATE bootstrap confidence interval calculation in CausalTreeRegressor now supports multiprocessing.
  2. CausalTreeRegressor can now be plotted with the standard scikit-learn function.
  3. For deeper analysis, CausalTreeRegressor can calculate the number of treatment and control observations in each leaf, exposed through the low-level _leaves_groups_cnt attribute. Additionally, the plot_dist_tree_leaves_values function shows the distribution of ATE across a tree's leaves.
  4. Added CausalRandomForestRegressor, based on scikit-learn, with CausalTreeRegressor as base_estimator.
  5. The calculate_error method in CausalRandomForestRegressor calculates the unbiased sampling variance. Source.
  6. New Jupyter notebook causal_trees_with_synthetic_data.ipynb with CausalTreeRegressor and CausalRandomForestRegressor models.
  • Tests:
  1. Additional tests: test_causal_trees.py
  2. The Makefile contains install, build, test, and clean targets. Now you can simply type make test; Cython compilation happens under the hood.
  3. The setup() function in setup.py now picks up the requirements-test.txt dependencies through the tests_require parameter, so there is no need to install them manually.
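
For reviewers who want a quick mental model of the quantities mentioned above, here is a minimal pure-Python sketch. It is not the Cython code from this PR: it assumes the node value is a plain difference in mean outcome between treated and control rows, it runs the bootstrap serially rather than with multiprocessing, and the function names (node_effect, leaf_group_counts, bootstrap_ate_ci) are hypothetical.

```python
import random
from statistics import mean


def node_effect(y, w):
    """Difference in mean outcome between treated (w == 1) and control (w == 0) rows.

    Assumption: this stands in for the per-node average treatment effect.
    """
    treated = [yi for yi, wi in zip(y, w) if wi == 1]
    control = [yi for yi, wi in zip(y, w) if wi == 0]
    return mean(treated) - mean(control)


def leaf_group_counts(leaf_ids, w):
    """Count control and treatment observations that fall into each leaf."""
    counts = {}
    for leaf, wi in zip(leaf_ids, w):
        groups = counts.setdefault(leaf, {"control": 0, "treatment": 0})
        groups["treatment" if wi == 1 else "control"] += 1
    return counts


def bootstrap_ate_ci(y, w, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the difference in means (serial version)."""
    rng = random.Random(seed)
    n = len(y)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ws = [w[i] for i in idx]
        # Skip resamples that lost one of the two groups entirely.
        if 0 < sum(ws) < n:
            estimates.append(node_effect([y[i] for i in idx], ws))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data, invented for illustration only.
y = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
w = [0, 0, 0, 1, 1, 1]
leaf_ids = [0, 0, 1, 1, 1, 0]

effect = node_effect(y, w)
counts = leaf_group_counts(leaf_ids, w)
ci_lower, ci_upper = bootstrap_ate_ci(y, w)
```

Because each bootstrap replicate is independent of the others, the loop is the natural place to parallelize, which is presumably why multiprocessing helps there.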

Types of changes

What types of changes does your code introduce to CausalML? Put an x in the boxes that apply

  • [ ] Bugfix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [x] Documentation Update (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • [x] I have read the CONTRIBUTING doc
  • [x] I have signed the CLA
  • [x] Lint and unit tests pass locally with my changes
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] I have added necessary documentation (if appropriate)
  • [ ] Any dependent changes have been merged and published in downstream modules

Further comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc. This PR template is adopted from appium.

alexander-pv avatar Jun 24 '22 00:06 alexander-pv

Thanks @alexander-pv for your much-needed contribution. We will review the PR soon.

jeongyoonlee avatar Jun 24 '22 00:06 jeongyoonlee

Hi, @alexander-pv,

Sorry for my late review. I thought that I had left comments earlier, but it turned out that I didn't submit them and my review was somehow left pending. The code looks good, and thanks for the sample notebook and test code, which are very comprehensive. One ask: let's remove min_impurity_split from the code so that we can use the latest scikit-learn. Currently it raises "TypeError: __init__() got an unexpected keyword argument 'min_impurity_split'".
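
To illustrate the failure mode, here is a small self-contained sketch. TreeStandIn is a hypothetical class, not a scikit-learn estimator; it only mimics a constructor that no longer accepts min_impurity_split. The supported_kwargs helper is likewise an illustration and not the fix requested here, which is simply to stop passing the argument.

```python
import inspect


class TreeStandIn:
    """Hypothetical stand-in for an estimator whose constructor dropped min_impurity_split."""

    def __init__(self, max_depth=None, min_impurity_decrease=0.0):
        self.max_depth = max_depth
        self.min_impurity_decrease = min_impurity_decrease


def supported_kwargs(cls, kwargs):
    """Keep only the keyword arguments that cls.__init__ actually declares."""
    accepted = inspect.signature(cls.__init__).parameters
    return {name: value for name, value in kwargs.items() if name in accepted}


legacy_kwargs = {"max_depth": 3, "min_impurity_split": 1e-7}

# Passing the removed argument reproduces the kind of TypeError quoted above.
try:
    TreeStandIn(**legacy_kwargs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Dropping the unsupported argument lets construction succeed.
tree = TreeStandIn(**supported_kwargs(TreeStandIn, legacy_kwargs))
```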

Thanks!

jeongyoonlee avatar Aug 12 '22 18:08 jeongyoonlee

Hi, @jeongyoonlee ,

Thanks for the review, I pushed the necessary changes.

I found that the removal of min_impurity_split in the latest scikit-learn breaks tree building. It turned out that the new version of DepthFirstTreeBuilder prevents a causal tree from growing new nodes. See the differences: old condition vs new condition. Since the EPSILON constant in the original Cython file is widely used and can't be adjusted, I suggest creating builder.pyx with DepthFirstCausalTreeBuilder as a modification of DepthFirstTreeBuilder (see the latest commits). This file can also serve as an example of how to create custom tree builders by subclassing scikit-learn's TreeBuilder.
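
For readers without the linked diffs, here is a minimal sketch of why a fixed threshold can stop growth. It rests on assumptions not confirmed in this thread: that the older builder compared node impurity against the adjustable min_impurity_split, that the newer one compares it against a fixed EPSILON, and that a causal criterion can report a non-positive impurity. The function names are hypothetical, not scikit-learn API, and the impurity values are invented.

```python
import sys

# Assumption: stand-in for the fixed EPSILON constant in the tree builder.
EPSILON = sys.float_info.epsilon


def is_leaf_old(impurity, min_impurity_split):
    """Assumed older rule: stop splitting when impurity is at or below a user-set threshold."""
    return impurity <= min_impurity_split


def is_leaf_new(impurity):
    """Assumed newer rule: stop splitting when impurity is at or below a fixed EPSILON."""
    return impurity <= EPSILON


# Hypothetical impurity from a causal criterion that can go negative.
causal_impurity = -0.25

# With an adjustable threshold, the caller could push it low enough to keep splitting.
old_stops = is_leaf_old(causal_impurity, min_impurity_split=float("-inf"))

# With a fixed EPSILON, any non-positive impurity marks the node as a leaf immediately.
new_stops = is_leaf_new(causal_impurity)

# An ordinary positive impurity is not stopped by the fixed threshold.
positive_stops = is_leaf_new(0.3)
```

Under these assumptions the root node itself would be declared a leaf, which matches the symptom described above, and a custom builder is one way to regain control over that check.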

alexander-pv avatar Aug 13 '22 20:08 alexander-pv

Hi @alexander-pv, the Py 3.7 test failed with the following error: https://github.com/uber/causalml/runs/7923168250?check_suite_focus=true#step:6:492

Could you please check? If it's something that needs more time, we can merge this PR first and investigate it in a separate thread.

jeongyoonlee avatar Aug 19 '22 18:08 jeongyoonlee

Hi, @jeongyoonlee, it seems that everything is fixed now 👌.

alexander-pv avatar Aug 21 '22 00:08 alexander-pv

Merged! 👍🏻

jeongyoonlee avatar Aug 21 '22 00:08 jeongyoonlee