
Varying adaptive weighting function

Open • liaochris opened this issue 1 year ago • 1 comment

Hi! I am trying to apply causal forest to a panel data setting. I have two questions in bold.

**Is there a way to modify the adaptive weighting function ($\alpha_i(x)$ from equation 2 of Athey et al. 2019) so that it varies by categorical attributes in the data?** The reason is that I'm interested in estimating the CATE for $Y_i$ (relative to the pre-treatment period $g$) in multiple time periods ($Y_{i,s} - Y_{i,g}, Y_{i,s+1} - Y_{i,g}, \ldots, Y_{i,s+k} - Y_{i,g}$), and I want to ensure that when the CATE $\delta_{s-g}(x)$, one of my parameters of interest, is calculated, only observations with $t = s$ (not $t = s+1$) receive non-zero weight from the adaptive weighting function.

This is to ensure that, when comparing treated and control observations, I'm only comparing observations with the same $t-g$ value.
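To make the request concrete: in GRF the weight $\alpha_i(x)$ is, for each tree, an indicator that training point $i$ shares a leaf with $x$, divided by that leaf's size, averaged over trees. The sketch below computes these weights from leaf assignments and then applies the restriction described above: zero out observations from other horizons and renormalize. It assumes leaf assignments are already available as plain lists; `forest_weights` and `restrict_to_horizon` are hypothetical helpers, not part of grf.

```python
from typing import List, Sequence


def forest_weights(train_leaves: Sequence[Sequence[int]],
                   test_leaves: Sequence[int]) -> List[float]:
    """alpha_i(x) = (1/B) * sum_b 1{X_i in L_b(x)} / |L_b(x)|.

    train_leaves[b][i] is the leaf id of training point i in tree b;
    test_leaves[b] is the leaf id of the target point x in tree b.
    """
    n_trees = len(train_leaves)
    n = len(train_leaves[0])
    alpha = [0.0] * n
    for b in range(n_trees):
        # Training points that share a leaf with x in tree b.
        members = [i for i in range(n) if train_leaves[b][i] == test_leaves[b]]
        for i in members:
            alpha[i] += 1.0 / (len(members) * n_trees)
    return alpha


def restrict_to_horizon(alpha: Sequence[float], horizon: Sequence[int],
                        target: int) -> List[float]:
    """Hypothetical modification: keep weight only on the target horizon, renormalized."""
    masked = [a if h == target else 0.0 for a, h in zip(alpha, horizon)]
    total = sum(masked)
    if total == 0:
        raise ValueError("no forest weight falls on the target horizon")
    return [m / total for m in masked]


# Two toy trees, four training points observed at horizons 1, 2, 1, 2.
alpha = forest_weights([[0, 0, 1, 1], [0, 1, 1, 0]], [0, 1])
print(alpha)                                           # [0.25, 0.5, 0.25, 0.0]
print(restrict_to_horizon(alpha, [1, 2, 1, 2], target=1))  # [0.5, 0.0, 0.5, 0.0]
```

Masking after the forest is grown only changes which neighbours are averaged; the tree splits themselves were still chosen using observations from all horizons.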

I had originally thought of using a separate causal forest for each $Y_{i,t} - Y_{i,g}$ with $t \in \{s, \ldots, s+k\}$, but since I want to cluster SEs at the individual level, inference is difficult under that procedure. **Is there a way to combine each iteration of a bootstrapped CATE from multiple forests (assuming bootstrap seeds are the same in all forests) for the purposes of inference?** Please let me know if anything is unclear!
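One way to read the second question: if every per-horizon forest is refit on the same cluster-bootstrap draws (resampling individuals, not rows), then replicate $b$ of each horizon's estimate comes from the same resampled individuals, and the $B \times K$ matrix of replicates can be treated as draws from the joint sampling distribution across horizons. The sketch below assumes exactly that outer bootstrap wrapped around the forests (it is not grf's internal variance estimate); `draw_clusters`, `joint_bootstrap_cov`, and `se_of_difference` are hypothetical helpers, and the refitting step itself is not shown.

```python
import math
import random
import statistics
from typing import Hashable, List, Sequence


def draw_clusters(cluster_ids: Sequence[Hashable], n_reps: int,
                  seed: int) -> List[List[Hashable]]:
    """Resample whole individuals with replacement, once per replicate.

    The same draws are reused for every horizon's forest, so replicate b
    uses identical individuals across horizons. When refitting, duplicated
    individuals should receive distinct cluster labels.
    """
    rng = random.Random(seed)
    unique = sorted(set(cluster_ids))
    return [[rng.choice(unique) for _ in unique] for _ in range(n_reps)]


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with an n - 1 denominator."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def joint_bootstrap_cov(reps: List[List[float]]) -> List[List[float]]:
    """reps[b][k] is the horizon-k estimate in replicate b; returns the K x K covariance."""
    cols = list(zip(*reps))
    return [[_cov(cj, ck) for ck in cols] for cj in cols]


def se_of_difference(cov: List[List[float]], j: int, k: int) -> float:
    """Bootstrap SE of (estimate_k - estimate_j); needs the cross-horizon covariance."""
    return math.sqrt(cov[j][j] + cov[k][k] - 2 * cov[j][k])


cov = joint_bootstrap_cov([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(cov)                          # [[1.0, 2.0], [2.0, 4.0]]
print(se_of_difference(cov, 0, 1))  # 1.0
```

The off-diagonal terms are only meaningful because each row shares a cluster draw; with independent draws per forest, the cross-horizon covariance would not be recovered.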

liaochris • Jan 18 '25

Hi @liaochris, the causal forest setup assumes a cross section of covariates. If you had only two time periods, you could collapse this to a standard cross section by looking at the first-differenced outcomes. It is typically easier to translate your problem into a standard cross section than to modify the underlying weight function, which would involve writing a custom grf forest from scratch. This paper might be useful: https://arxiv.org/abs/1905.11622 (there is also some applied work in various outlets applying causal forests to first-differenced outcomes).
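For the two-period case, collapsing the panel amounts to one row per unit with covariates $X_i$, treatment $W_i$, and outcome $Y_{i,\text{post}} - Y_{i,\text{pre}}$. A minimal sketch, assuming a long-format panel of dict records with hypothetical field names `unit`, `period`, `y`, `w`, `x`; the resulting lists are the kind of `X`, `Y`, `W` inputs one would pass to grf's `causal_forest` in R.

```python
from typing import Dict, List, Tuple


def first_difference(rows: List[Dict], pre: int,
                     post: int) -> Tuple[List[list], List[float], List[int]]:
    """Collapse a two-period panel into a cross section of first differences."""
    by_unit: Dict[str, Dict[int, Dict]] = {}
    for r in rows:
        by_unit.setdefault(r["unit"], {})[r["period"]] = r
    X, Y, W = [], [], []
    for unit in sorted(by_unit):
        periods = by_unit[unit]
        if pre not in periods or post not in periods:
            continue  # drop units not observed in both periods
        X.append(periods[pre]["x"])                       # covariates from the pre period
        Y.append(periods[post]["y"] - periods[pre]["y"])  # first-differenced outcome
        W.append(periods[post]["w"])                      # treatment status in the post period
    return X, Y, W


rows = [
    {"unit": "a", "period": 0, "y": 1.0, "w": 0, "x": [0.5]},
    {"unit": "a", "period": 1, "y": 3.0, "w": 1, "x": [0.5]},
    {"unit": "b", "period": 0, "y": 2.0, "w": 0, "x": [1.5]},
    {"unit": "b", "period": 1, "y": 2.5, "w": 0, "x": [1.5]},
    {"unit": "c", "period": 1, "y": 9.0, "w": 1, "x": [2.0]},  # no pre period: dropped
]
print(first_difference(rows, pre=0, post=1))  # ([[0.5], [1.5]], [2.0, 0.5], [1, 0])
```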

erikcs • Mar 10 '25

Thanks @erikcs, the reference is helpful. I have a follow-up question based on your answer. Suppose we want to estimate conditional average treatment effects (CATEs) in an event-study setting.

Let $Y_{i,t}$ be the outcome of unit $i$ at time $t$, and let treatment occur at event time $k=0$. Define the event-time relative outcome as

$$ \tilde{Y}_{i,k} = Y_{i,t_i + k} - Y_{i,t_i - 1}, $$

where $t_i$ is the treatment time for unit $i$. Thus, all outcomes are expressed relative to the pre-treatment period $k=-1$.
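The definition of $\tilde{Y}_{i,k}$ can be computed directly from a long panel. A minimal sketch, assuming outcomes are stored in a dict keyed by `(unit, calendar_time)` and reference times in a dict `t_treat` (hypothetical names); never-treated controls would need an assigned reference time, which the sketch takes as given, and horizons whose period is unobserved are skipped rather than imputed.

```python
from typing import Dict, Hashable, Iterable, Tuple


def event_time_outcomes(y: Dict[Tuple[Hashable, int], float],
                        t_treat: Dict[Hashable, int],
                        horizons: Iterable[int]) -> Dict[Tuple[Hashable, int], float]:
    """Return {(i, k): Y[i, t_i + k] - Y[i, t_i - 1]} for each requested horizon k."""
    hs = list(horizons)
    out = {}
    for i, t_i in t_treat.items():
        base = y.get((i, t_i - 1))
        if base is None:
            continue  # pre-treatment period k = -1 not observed
        for k in hs:
            if (i, t_i + k) in y:
                out[(i, k)] = y[(i, t_i + k)] - base
    return out


y = {("a", 1): 1.0, ("a", 2): 2.0, ("a", 3): 4.0, ("a", 4): 7.0,
     ("b", 2): 5.0, ("b", 4): 6.0}
print(event_time_outcomes(y, {"a": 2, "b": 3}, horizons=[1, 2]))
# {('a', 1): 3.0, ('a', 2): 6.0, ('b', 1): 1.0}
```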

Suppose I am interested in the CATEs at different post-treatment horizons $k = 1, 2, 3, \ldots$, i.e.

$$ \tau_k(x) = \mathbb{E}[\tilde{Y}_{i,k}(1) - \tilde{Y}_{i,k}(0) \mid X_i = x], $$

where $X_i$ are covariates.


Which of the following two approaches is recommended?

1. Single forest with stacked outcomes

  • Stack the differenced outcomes $\tilde{Y}_{i,k}$ for all horizons (one row per unit-horizon pair) and use them as the outcome when estimating one causal forest.
  • Calculate CATEs conditional on both $X_i$ and $k$, where $k$ is event time.

$$ \hat{\tau}(x, k) = \mathbb{E}[\tilde{Y}_{i,k}(1) - \tilde{Y}_{i,k}(0) \mid X_i = x, K=k]. $$

2. Separate forests by horizon

  • For each horizon $k \in \{1,2,3,\ldots\}$, run a separate causal forest with outcome $\tilde{Y}_{i,k}$.
  • Obtain horizon-specific CATEs:

$$ \hat{\tau}_k(x) = \mathbb{E}[\tilde{Y}_{i,k}(1) - \tilde{Y}_{i,k}(0) \mid X_i = x]. $$
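The practical difference between the two approaches is mostly in how the training data are laid out. A minimal sketch, assuming `tilde_y` maps `(unit, k)` to $\tilde{Y}_{i,k}$ and `x`, `w` map units to covariates and treatment (hypothetical names): approach 1 stacks one row per unit-horizon pair with $k$ appended as a covariate and keeps the unit id as a cluster label (grf's `causal_forest` accepts a `clusters` argument for such labels); approach 2 builds one cross section per horizon.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def stacked_design(tilde_y: Dict[Tuple[Hashable, int], float],
                   x: Dict[Hashable, Sequence[float]],
                   w: Dict[Hashable, int]):
    """Approach 1: one row per (unit, horizon); k becomes an extra covariate."""
    X: List[list] = []
    Y: List[float] = []
    W: List[int] = []
    clusters: List[Hashable] = []
    for (i, k) in sorted(tilde_y):
        X.append(list(x[i]) + [k])
        Y.append(tilde_y[(i, k)])
        W.append(w[i])
        clusters.append(i)  # same individual -> same cluster label
    return X, Y, W, clusters


def per_horizon_designs(tilde_y: Dict[Tuple[Hashable, int], float],
                        x: Dict[Hashable, Sequence[float]],
                        w: Dict[Hashable, int]):
    """Approach 2: a separate (X, Y, W) cross section for each horizon k."""
    designs: Dict[int, Tuple[list, list, list]] = {}
    for (i, k) in sorted(tilde_y):
        X, Y, W = designs.setdefault(k, ([], [], []))
        X.append(list(x[i]))
        Y.append(tilde_y[(i, k)])
        W.append(w[i])
    return designs


tilde_y = {("a", 1): 3.0, ("a", 2): 6.0, ("b", 1): 1.0}
x, w = {"a": [0.5], "b": [1.5]}, {"a": 1, "b": 0}
print(stacked_design(tilde_y, x, w))
# ([[0.5, 1], [0.5, 2], [1.5, 1]], [3.0, 6.0, 1.0], [1, 1, 0], ['a', 'a', 'b'])
```

In the stacked layout the forest may or may not split on $k$, so observations from other horizons can receive non-zero weight; in the per-horizon layout they cannot.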

liaochris • Sep 10 '25