
add_ci.lmer chokes on "big data"

Open jthaman opened this issue 7 years ago • 0 comments

I'm finding that we cannot use add_ci.lmer for "big data". I tried an example from the mermod vignette with 200,000 observations (10 groups of 20,000), and R could not allocate enough memory to build the new data frame. Here's the example I tried:

## linear example

library(dplyr)    # %>%, select, sample_n, bind_rows
library(lme4)     # lmer
library(ciTools)  # add_ci

x_gen_mermod <- function(ng = 8, nw = 5){
  n <- ng * nw
  x2 <- runif(n)
  group <- rep(as.character(1:ng), each = nw)
  return(tibble::tibble(x2 = x2,
                        group = group))
}

mm_pipe <- function(tb, ...){
  model.matrix(data = tb, ...)
}

get_validation_set <- function(tb, sigma, sigmaG, beta, includeRanef, groupIntercepts){
  vm <- sample_n(tb, 5, replace = FALSE)[rep(1:5, each = 100), ]
  vf <- bind_rows(vm, tb) %>%
    select(-group) %>%
    mm_pipe(~.*.)
  vf <- vf[1:500, ]
  vGroups <- if(!includeRanef) rnorm(500, 0, sigmaG) else groupIntercepts[as.numeric(vm$group)]
  vm[["y"]] <- vf %*% beta + vGroups + rnorm(500, mean = 0, sd = sigma)
  vm
}

y_gen_mermod <- function(tb, sigma = 1, sigmaG = 1, delta = 1, includeRanef = FALSE, validationPoints = FALSE){
  groupIntercepts <- rnorm(length(unique(tb$group)), 0, sigmaG)
  tf <- tb %>%
    dplyr::select(-group) %>%
    mm_pipe(~.*.)
  beta <- rep(delta, ncol(tf))
  if(validationPoints)  {
    vm <- get_validation_set(tb, sigma, sigmaG, beta, includeRanef, groupIntercepts)
  }
  tb[["y"]] <- tf %*% beta + groupIntercepts[as.numeric(tb$group)] + rnorm(nrow(tb), mean = 0, sd = sigma)
  tb[["truth"]] <- tf %*% beta + groupIntercepts[as.numeric(tb$group)] * (includeRanef)
  if(validationPoints) return(list(tb = tb, vm = vm)) else return(tb)
}


tb <- x_gen_mermod(10, 20000) %>%
    y_gen_mermod()

fit2 <- lmer(y ~ x2 + (1|group) , data = tb)

tb %>% add_ci(fit2, type = "parametric", includeRanef = TRUE, names = c("LCB", "UCB"))

lmer works just fine on an example data set this large, but add_ci chokes and spits out

Error: cannot allocate vector of size 298.0 Gb
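
For scale, the requested size matches a dense 200,000 x 200,000 matrix of 8-byte doubles. That hints that an n-by-n object is being built somewhere along the way, though this is an inference from the number alone and not something confirmed against the ciTools source. A quick back-of-the-envelope check:

```r
# Size of a dense n x n matrix of 8-byte doubles.
# R reports "Gb" in allocation errors as multiples of 1024^3 bytes.
n <- 10 * 20000          # ng * nw from the example above
bytes <- n^2 * 8         # one double per cell
gib <- bytes / 1024^3
round(gib, 1)            # 298
```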

We need to re-examine how we are storing things in memory and see if we can do something more efficient. I'm not sure if this bug affects the other methods as well.
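
One possible direction, sketched under a loud assumption: that the bottleneck is a product of the form X V X^T, of which only the diagonal is needed for pointwise intervals. Whether add_ci.lmer actually does this has not been checked here. If it does, the diagonal can be computed without ever forming the full n x n matrix. The objects X and V below are stand-ins generated for illustration, not anything taken from ciTools or from a fitted model.

```r
set.seed(1)
n <- 2000; p <- 3
X <- cbind(1, matrix(runif(n * (p - 1)), n))  # stand-in n x p design matrix
A <- matrix(rnorm(p * p), p)
V <- crossprod(A)                             # stand-in p x p covariance matrix

# Naive: builds an n x n matrix just to read off its diagonal, O(n^2) memory
var_naive <- diag(X %*% V %*% t(X))

# Diagonal only: row-wise sums of (X V) * X, O(n * p) memory
var_fast <- rowSums((X %*% V) * X)

all.equal(var_naive, var_fast)
se_fast <- sqrt(var_fast)                     # pointwise standard errors
```

With n = 200,000 and a handful of columns, the second form needs a few megabytes where the first needs roughly 298 Gb.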

jthaman, Mar 09 '18 17:03