clarify icon indicating copy to clipboard operation
clarify copied to clipboard

sim_ame() not working inside a loop

Open brianfeld-esd opened this issue 1 year ago • 3 comments

I am new with R so I may not be good at explaining the issue I am facing. I am using clarify to estimate the average treatment effect on the treated after matching. I am estimating the impact of a program on several outcomes, which is why I am using clarify inside a loop.

I am following the steps outlined here: https://iqss.github.io/clarify/articles/Zelig.html#estimating-the-att-after-matching

  1. I extract the matched dataset and fit a model outside clarify.
  2. I simulate the model coefficients using clarify::sim().
  3. I use sim_ame()] to request the average marginal effect of treat within the subset of treated units.

Whenever the outcome variable I am using has any NAs, I get the following error in step 3:

Error in clarify::sim_ame(): ! No finite estimates were produced.

However, if I follow these steps with the same outcome variable outside a loop, I do not get any error.

I can paste the exact code I am using, but unfortunately, I cannot include any data since my agency prohibits it. If you could help me understand what is happening, I would greatly appreciate it.

brianfeld-esd avatar Mar 07 '24 15:03 brianfeld-esd

I need to see your code and the error message you get because I have no problems when running a loop over outcomes when one of them has missing data. See my code below for example.

library("clarify")
data("lalonde", package = "MatchIt")
set.seed(100)

#Add outcome with missingness
lalonde$re78_2 <- lalonde$re78
is.na(lalonde$re78_2)[sample(1:614, 50, TRUE)] <- TRUE

m <- MatchIt::matchit(treat ~ age + educ + race + married + re74,
                      data = lalonde, method = "quick")

md <- MatchIt::match.data(m)

f <- "~ treat * (age + educ + race + married + re74)"

est <- lapply(c("re78", "re78_2"), function(i) {
  fi <- as.formula(paste(i, f))
  fit <- lm(fi, data = md, weights = weights)
  
  s <- sim(fit, vcov = ~subclass, n = 200)
  summary(sim_ame(s, var = "treat", subset = treat == 1,
                  contrast = "diff", verbose = FALSE))
})

est
#> [[1]]
#>         Estimate 2.5 % 97.5 %
#> E[Y(0)]     4934  3828   6337
#> E[Y(1)]     6349  5351   7222
#> Diff        1416  -359   3048
#> 
#> [[2]]
#>         Estimate 2.5 % 97.5 %
#> E[Y(0)]     4900  3602   6169
#> E[Y(1)]     6455  5315   7663
#> Diff        1555  -270   3548

Created on 2024-03-08 with reprex v2.1.0

Note that it is statistically invalid to have missing outcomes in your analysis; those observations are dropped which means any balance you achieved by matching is likely destroyed in the sample that remains. But let's focus on figuring out the bug.

ngreifer avatar Mar 08 '24 20:03 ngreifer

The code I used is below, but as I said, I cannot include the data since it is confidential. I have issues when estimating the ATT for the variables "earnings_1Qt_post_enroll" up to "hourly_wage_4Qt_post_enroll", which are missing for those who are not employed. Instead all the variables that end with "_unc" are not conditional on employment, so the NAs are replaced by zeroes.

col_names <- c("employed_1Qt_post_enroll", "employed_2Qt_post_enroll", "employed_3Qt_post_enroll",
               "employed_4Qt_post_enroll", "earnings_1Qt_post_enroll", "earnings_2Qt_post_enroll",
               "earnings_3Qt_post_enroll", "earnings_4Qt_post_enroll", "hours_1Qt_post_enroll",
               "hours_2Qt_post_enroll", "hours_3Qt_post_enroll", "hours_4Qt_post_enroll",
               "hourly_wage_1Qt_post_enroll", "hourly_wage_2Qt_post_enroll", "hourly_wage_3Qt_post_enroll",
               "hourly_wage_4Qt_post_enroll", 
               "earnings_1Qt_post_enroll_unc", "earnings_2Qt_post_enroll_unc",
               "earnings_3Qt_post_enroll_unc", "earnings_4Qt_post_enroll_unc", "hours_1Qt_post_enroll_unc",
               "hours_2Qt_post_enroll_unc", "hours_3Qt_post_enroll_unc", "hours_4Qt_post_enroll_unc",
               "hourly_wage_1Qt_post_enroll_unc", "hourly_wage_2Qt_post_enroll_unc", 
               "hourly_wage_3Qt_post_enroll_unc", "hourly_wage_4Qt_post_enroll_unc", 
               "career_services", "service_total")

for (name in col_names) {
  
  model_var <- paste0("model_",name)
  est_var <- paste0("est_",name)
  sum_var <- paste0("sum_", name)

  print(model_var)
  
  assign(model_var,lm(formula = get(name) ~ secsa + male + age_at_enrollment + white2 + black2 +
                                    asian2 + aian2 + nhpi2 + #two_more_races + race_unknown +
                                    hispanic + educ_yrs + english_proficiency + veteran +
                                    homeless + disability_status + previous_enrollment +
                                    coenrolled + employed_2Qt_pre_enroll + employed_1Qt_pre_enroll +
                                    employed_0Qt_pre_enroll, data = matching_sample, weights = weights))
  
  assign(model_var, clarify::sim(get(model_var), vcov = ~subclass))

  assign(est_var, clarify::sim_ame(get(model_var), var = "secsa", subset = secsa == 1,
                                             contrast = "diff", verbose = FALSE))
    
  assign(sum_var, summary(get(est_var), null = 0))
  
}

Thanks, Brian

brianfeld-esd avatar Mar 08 '24 20:03 brianfeld-esd

Hi Brian,

I am so sorry but I can't figure out what is going on. In all my tests with missing values in the outcome, I cannot induce that error. Can you include the matchit() code you used as well? And perhaps any summary of the data that might help me figure out if there is something special about the variables you are using? Let's just restrict it to two ofd your outcomes variables since you said you were having problems with at least the first two. Show me their summary() in the matched.data set. Thanks.

If we can't figure it out, you should probably just use marginaleffects as recommended in the MatchIt documentation. For linear models it is actually better to use marginaleffects than clarify.

ngreifer avatar Mar 08 '24 21:03 ngreifer