Add synthetic control as analysis method
As described by @david26694 we want to: "Add a wrapper to a synthetic control implementation that gives p-values, this should allow us to treat synthetic control as just another analysis method and check if it has higher power than simpler things"
This PR adds a new analysis class, `SyntheticControlAnalysis`, analogous to the existing analysis classes.
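One common way to attach a p-value to a synthetic control estimate is a placebo-in-space permutation test. The synthetic control is refitted once per donor, treating each donor as if it were the treated cluster, and the real effect is then compared against these placebo effects. The sketch below assumes the per-cluster effects have already been estimated. The function name `placebo_p_value` is hypothetical, and using the absolute average effect as the test statistic is an assumption (implementations often use a post/pre RMSPE ratio instead). It is not necessarily how `SyntheticControlAnalysis` computes its p-value.

```python
from typing import Mapping


def placebo_p_value(effects: Mapping[str, float], treated_cluster: str) -> float:
    """Placebo-in-space p-value for a synthetic control estimate (hypothetical helper).

    `effects` maps every cluster to the effect estimated when that cluster is
    treated as the treated unit: the real treated cluster plus one placebo fit
    per donor. The p-value is the share of clusters whose absolute effect is at
    least as large as the real treated cluster's absolute effect.
    """
    if treated_cluster not in effects:
        raise ValueError(f"{treated_cluster!r} not found in effects")
    treated_effect = abs(effects[treated_cluster])
    # The treated cluster counts itself, so the smallest attainable p-value is 1 / n_clusters.
    n_as_extreme = sum(abs(effect) >= treated_effect for effect in effects.values())
    return n_as_extreme / len(effects)


# Usage: the treated cluster "A" has the largest absolute effect among 4 clusters.
effects = {"A": 5.0, "B": 1.0, "C": -2.0, "D": 0.5}
print(placebo_p_value(effects, "A"))  # 0.25
```

With only a handful of donors the p-value is coarse (here the minimum is 1/4), which is one reason comparing its power against simpler methods is worthwhile.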
To perform the analysis correctly, the pre-experiment data must be part of the fit, because it is used to find the synthetic control weights (`fit_synthetic`). In `main`, the `PowerAnalysis` class accepts a `pre_experiment` parameter, but it is only used for CUPED (where a column is added to the dataframe). Therefore, I created a new class, `PowerAnalysisWithPreExperimentData`, in which the pre-experiment dataframe is also available.
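To illustrate why the pre-experiment data has to be available at fit time, here is a minimal sketch of fitting synthetic control weights: non-negative donor weights that sum to one and minimise the squared pre-experiment gap between the treated cluster and the weighted donors. It uses `scipy.optimize.minimize` with SLSQP. The function names and the array layout (periods as rows, donors as columns) are hypothetical, and this is not the PR's `fit_synthetic`.

```python
import numpy as np
from scipy.optimize import minimize


def fit_synthetic_weights(pre_treated: np.ndarray, pre_donors: np.ndarray) -> np.ndarray:
    """Hypothetical helper: fit synthetic control weights on pre-experiment data.

    pre_treated: shape (n_periods,), outcome of the treated cluster.
    pre_donors: shape (n_periods, n_donors), outcomes of the donor clusters.
    Returns non-negative weights that sum to one.
    """
    n_donors = pre_donors.shape[1]

    def squared_error(weights: np.ndarray) -> float:
        # Distance between the treated series and the weighted combination of donors.
        return float(np.sum((pre_treated - pre_donors @ weights) ** 2))

    result = minimize(
        squared_error,
        x0=np.full(n_donors, 1.0 / n_donors),  # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_donors,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        options={"ftol": 1e-10, "maxiter": 500},
    )
    return result.x


def synthetic_effect(post_treated: np.ndarray, post_donors: np.ndarray, weights: np.ndarray) -> float:
    """Hypothetical helper: average post-period gap between the treated cluster and its synthetic control."""
    return float(np.mean(post_treated - post_donors @ weights))


# Usage: before the experiment, the treated cluster is exactly 0.7 * donor_0 + 0.3 * donor_1.
periods = np.arange(20)
donors = np.column_stack([periods, 5 * np.sin(periods) + 10, 4 * (periods % 3)])
treated = donors @ np.array([0.7, 0.3, 0.0])
weights = fit_synthetic_weights(treated[:15], donors[:15])
# Add a +2 lift to the treated cluster in the last 5 (post-experiment) periods.
effect = synthetic_effect(treated[15:] + 2.0, donors[15:], weights)
print(weights.round(3), round(effect, 3))
```

The simplex constraint (weights in [0, 1] summing to one) keeps the synthetic control an interpolation of the donors rather than an extrapolation, which is the usual choice in synthetic control.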
Furthermore, I created another splitter, `PredefinedTreatmentClustersSplitter`, because we want only one cluster to be assigned to treatment and the rest to control. This simplifies the logic and is more consistent with how synthetic control is usually applied.
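The assignment rule of such a splitter can be sketched as a plain pandas function. This is not the `PredefinedTreatmentClustersSplitter` API: the function name, its parameters, the `treatment` column name, and the `"A"` (control) / `"B"` (treatment) labels are assumptions for illustration.

```python
import pandas as pd


def assign_predefined_treatment(
    df: pd.DataFrame,
    cluster_col: str,
    treatment_cluster: str,
    treatment_col: str = "treatment",
) -> pd.DataFrame:
    """Hypothetical helper: one predefined cluster gets "B" (treatment), all others get "A" (control)."""
    if treatment_cluster not in set(df[cluster_col]):
        raise ValueError(f"{treatment_cluster!r} is not a value of {cluster_col!r}")
    out = df.copy()  # do not mutate the caller's dataframe
    is_treated = out[cluster_col] == treatment_cluster
    out[treatment_col] = is_treated.map({True: "B", False: "A"})
    return out


# Usage: only rows of cluster "y" are treated.
df = pd.DataFrame({"city": ["x", "x", "y", "z"], "target": [1.0, 2.0, 3.0, 4.0]})
print(assign_predefined_treatment(df, "city", "y")["treatment"].tolist())
# ['A', 'A', 'B', 'A']
```

Because the treated cluster is fixed rather than drawn at random, every remaining cluster is available as a donor.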
The following will not be implemented in this PR:
- Power analysis with more than one treatment cluster
- Running it from the power config
- Graphs of synthetic controls and donors
- Parallel execution of the p-value calculation
Codecov Report
Attention: Patch coverage is 95.14563% with 5 lines in your changes missing coverage. Please review.
Project coverage is 96.77%. Comparing base (d5a4977) to head (e0b4c1c). Report is 15 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| cluster_experiments/experiment_analysis.py | 93.22% | 4 missing |
| cluster_experiments/power_analysis.py | 92.85% | 1 missing |
Coverage diff:

| | main | #168 | +/- |
|---|---|---|---|
| Coverage | 96.93% | 96.77% | -0.17% |
| Files | 9 | 10 | +1 |
| Lines | 1078 | 1179 | +101 |
| Hits | 1045 | 1141 | +96 |
| Misses | 33 | 38 | +5 |
In the notebook, I think it'd be cool to compare the power lines and point-estimate distributions of clusteredOLS and synthetic control.
(These are the last suggestions, then we merge.)