Varying adaptive weighting function
Hi! I am trying to apply causal forest to a panel data setting. I have two questions in bold.
**Is there a way to modify the adaptive weighting function ($\alpha_i(x)$ from equation 2 of Athey et al. 2019) such that it varies by categorical attributes in the data?** I'm interested in estimating the CATE for $Y_{i}$ (relative to the pre-treatment period $g$) in multiple time periods ($Y_{i,s} - Y_{i,g}, Y_{i,s+1} - Y_{i,g}, \ldots, Y_{i,s+k} - Y_{i,g}$). When the CATE $\delta_{s-g}(x)$, one of my parameters of interest, is calculated, I want to ensure that only observations with $t = s$, not $t = s+1$, have non-zero weights (according to the adaptive weighting function).
This is to ensure that when comparing the treatment and control, I'm only comparing observations with the same $t-g$ value.
Please let me know if anything is unclear! I had originally thought of using a separate causal forest for each $Y_{i,t} - Y_{i,g}$ for $t \in \{s, \ldots, s+k\}$, but since I want to cluster SEs at the individual level, inference is difficult under that procedure. **Is there a way to combine each iteration of a bootstrapped CATE from multiple forests (assuming bootstrap seeds are the same in all forests) for the purposes of inference?**
Hi @liaochris, the causal forest setup is a cross section of covariates. If you had only two time periods you could collapse this to a standard cross section by looking at the first-differenced outcomes. It is typically easier to try and translate your problem to a standard cross section than to modify the underlying weight function, which would involve creating a custom grf forest from scratch. This paper might be useful: https://arxiv.org/abs/1905.11622 (there's also some applied work in various outlets applying causal forests to first-differenced outcomes).
Thanks @erikcs, the reference is helpful. I have a follow-up question based on your answer. Suppose we want to estimate conditional average treatment effects (CATEs) in an event-study setting.
Let $Y_{i,t}$ be the outcome of unit $i$ at time $t$, and let the treatment occur at event time $k=0$. Define the event–time relative outcome as
$$ \tilde{Y}_{i,k} = Y_{i,t_i + k} - Y_{i,t_i - 1}, $$
where $t_i$ is the treatment time for unit $i$. Thus, all outcomes are expressed relative to the pre-treatment period $k=-1$.
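To make the definition concrete, here is a minimal sketch of computing $\tilde{Y}_{i,k}$ from a long-format panel. The toy data, the `treat_time` mapping, and the helper `relative_outcome` are all hypothetical and only illustrate the differencing step; the sketch covers only units that have a defined $t_i$.

```python
# Toy long-format panel: (unit, calendar time) -> outcome. All values are made up.
panel = {
    ("a", 1): 10.0, ("a", 2): 11.0, ("a", 3): 13.0, ("a", 4): 16.0,
    ("b", 1): 5.0, ("b", 2): 5.5, ("b", 3): 6.0, ("b", 4): 8.0,
}

# Hypothetical treatment times t_i for each unit.
treat_time = {"a": 2, "b": 3}


def relative_outcome(panel, treat_time, unit, k):
    """Return Y_{i, t_i + k} - Y_{i, t_i - 1}, or None if a period is unobserved."""
    t_i = treat_time[unit]
    y_post = panel.get((unit, t_i + k))
    y_base = panel.get((unit, t_i - 1))
    if y_post is None or y_base is None:
        return None
    return y_post - y_base


# Differenced outcomes for horizons k = 1, 2, keyed by (unit, k).
y_tilde = {
    (unit, k): relative_outcome(panel, treat_time, unit, k)
    for unit in treat_time
    for k in (1, 2)
}
```

Unit `b` has no observation at $t_i + 2$, so its horizon-2 entry is `None`; how to handle such unbalanced horizons is a separate choice this sketch does not make.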
Suppose I am interested in the CATEs at different post-treatment horizons $k = 1, 2, 3, \ldots$, i.e.
$$ \tau_k(x) = \mathbb{E}[\tilde{Y}_{i,k}(1) - \tilde{Y}_{i,k}(0) \mid X_i = x], $$
where $X_i$ are covariates.
Which of the following two approaches is recommended?
1. Single forest with stacked outcomes
- Stack the differenced outcomes $\tilde{Y}_{i,k}$ for all horizons and use them as the outcome in a single causal forest.
- Calculate CATEs conditional on both $X_i$ and $k$, where $k$ is event time.
$$ \hat{\tau}(x, k) = \mathbb{E}[\tilde{Y}_{i,k}(1) - \tilde{Y}_{i,k}(0) \mid X_i = x, K = k]. $$
2. Separate forests by horizon
- For each horizon $k \in \{1,2,3,\ldots\}$, run a separate causal forest with outcome $\tilde{Y}_{i,k}$.
- Obtain horizon-specific CATEs:
$$ \hat{\tau}_k(x) = \mathbb{E}[\tilde{Y}_{i,k}(1) - \tilde{Y}_{i,k}(0) \mid X_i = x]. $$
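To illustrate how the two approaches differ in data layout, here is a rough sketch with made-up data. The `units` dictionary and both helper functions are hypothetical. In the stacked layout each unit contributes several rows, so a unit identifier is kept alongside them; in grf this could be passed through the `clusters` argument of `causal_forest`.

```python
# Hypothetical per-unit data: covariates X_i, treatment indicator W_i, and
# differenced outcomes keyed by horizon k. All values are made up.
units = {
    "a": {"x": [0.2, 1.0], "w": 1, "y_tilde": {1: 3.0, 2: 6.0}},
    "b": {"x": [0.7, 0.0], "w": 0, "y_tilde": {1: 0.5, 2: 1.0}},
    "c": {"x": [0.4, 1.0], "w": 1, "y_tilde": {1: 2.5}},  # horizon 2 unobserved
}
horizons = [1, 2]


def stacked_design(units, horizons):
    """Approach 1: one row per (unit, horizon), with k appended as a feature."""
    features, outcomes, treatment, clusters = [], [], [], []
    for unit_id, d in units.items():
        for k in horizons:
            if k not in d["y_tilde"]:
                continue  # skip horizons that are not observed for this unit
            features.append(d["x"] + [k])
            outcomes.append(d["y_tilde"][k])
            treatment.append(d["w"])
            clusters.append(unit_id)  # repeated rows share the unit's cluster id
    return features, outcomes, treatment, clusters


def separate_designs(units, horizons):
    """Approach 2: one cross section per horizon, one row per observed unit."""
    designs = {}
    for k in horizons:
        rows = [(uid, d) for uid, d in units.items() if k in d["y_tilde"]]
        designs[k] = {
            "features": [d["x"] for _, d in rows],
            "outcomes": [d["y_tilde"][k] for _, d in rows],
            "treatment": [d["w"] for _, d in rows],
            "units": [uid for uid, _ in rows],
        }
    return designs


features, outcomes, treatment, clusters = stacked_design(units, horizons)
designs = separate_designs(units, horizons)
```

The stacked layout yields a single design matrix with event time as an extra column, while the separate layout yields one standard cross section per horizon, each of which could be fed to its own forest.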