loo icon indicating copy to clipboard operation
loo copied to clipboard

Error with `ppc_loo_interval`: missing value where TRUE/FALSE needed

Open jburos opened this issue 8 years ago • 3 comments

Getting the following error when running ppc_loo_interval on a model estimated using CmdStan. For context, I've read the samples into an R stanfit object, and both y_rep and log_lik were included in the generated quantities block (unfortunately can't share the model nor the data).

Here is the command:

# estimate psisloo, plot loo_intervals
ppmat <- rstan::extract(fit,  'y_rep')[['y_rep']]
lw <- try(loo::extract_log_lik(fit, parameter_name = 'log_lik'))
psis <- loo::psislw(lw)
bayesplot::ppc_loo_intervals(y = true_vals, yrep = ppmat, lw = psis$lw_smooth)

Here is the error:

Error in if (all(w == w[1])) return(quantile(x, probs = probs, names = FALSE)) :
  missing value where TRUE/FALSE needed

Seems to be due to this being in the input:

>  any(is.na(psis$lw_smooth))
[1] TRUE
> which(is.na(psis$lw_smooth))
 [1]   6831   9815  10118  12118  13815  14879  16879  18879  20831  35594
[11]  37594  39594  46118  62521  78118  80831  83248  85248  88521  90521
[21]  97594 100831 103594 105594 114831 116803 122521 152831 162780 164831
[31] 166831 175594 176095 197594 237553 239553 255594 256010

However, this is not derived from the log-lik input:

> any(is.na(lw))
[1] FALSE

And, I don't see anything odd about these log-lik values vs the others:

> summary(lw[!is.na(psis$lw_smooth)])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -59634   -8560   -6848   -9261   -5771   -3354
> summary(lw[is.na(psis$lw_smooth)])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -19224   -4574   -4278   -4658   -3701   -3417

These are not clustered within certain observations:

> apply(psis$lw_smooth, FUN = function(x) sum(is.na(x)), MARGIN = 2)
  [1] 0 0 0 1 1 1 2 1 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
 [38] 0 0 1 1 1 1 0 1 1 0 0 1 0 1 1 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
 [75] 0 0 1 0 0 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
[112] 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 0

Seems these are caused by values of psis$lw_smooth that are -Inf

Here is the relevant debug context:

debugging in: E_fun(x[, i], w[, i], probs)
debug: {
    stopifnot(all(probs > 0 & probs < 1))
    if (all(w == w[1]))
        return(quantile(x, probs = probs, names = FALSE))
    ord <- order(x)
    x <- x[ord]
    w <- w[ord]
    ww <- cumsum(w)
    ww <- ww/ww[length(ww)]
    qq <- numeric(length(probs))
    for (j in seq_along(probs)) {
        ids <- which(ww >= probs[j])
        wi <- min(ids)
        if (wi == 1) {
            qq[j] <- x[1]
        }
        else {
            w1 <- ww[wi - 1]
            x1 <- x[wi - 1]
            qq[j] <- x1 + (x[wi] - x1) * (probs[j] - w1)/(ww[wi] -
                w1)
        }
    }
    return(qq)
}
Browse[2]> all(w == w[1])
[1] NA
Browse[2]> any(is.na(w))
[1] TRUE
Browse[2]> w[is.na(w)]
[1] NaN

Looking within the loo::psis function:

Browse[2]>
debug: if (cores == 1) {
    out <- lapply(X = 1:N, FUN = .psis_loop)
} else {
    if (.Platform$OS.type != "windows") {
        out <- mclapply(X = 1:N, FUN = .psis_loop, mc.cores = cores)
    }
    else {
        cl <- makePSOCKcluster(cores)
        on.exit(stopCluster(cl))
        out <- parLapply(cl, X = 1:N, fun = .psis_loop)
    }
}
Browse[2]>
debug: out <- mclapply(X = 1:N, FUN = .psis_loop, mc.cores = cores)
Browse[2]> str(out[[128]]$lw_new)
 num [1:2000] -Inf -Inf -Inf -Inf -Inf ...

For now, I'm pretty sure this is a terrible model fit (the loo-checks are part of an evaluation pipeline), but posting here in case it's helpful for others.

jburos avatar Oct 11 '17 19:10 jburos

Can you share just the log_lik values?

avehtari avatar Oct 17 '17 19:10 avehtari

Yep. I just need to find the one that yielded the error.

jburos avatar Oct 17 '17 21:10 jburos

Here is an Rds containing the log-lik values from one of the fits yielding that error. I have found several examples -- not sure which is the one I was inspecting above. LMK if it would be helpful to see others.

As I mentioned, these tend to sample terribly poorly; I've since (mostly) fixed the model. Nonetheless may be useful to track down the issue. Thanks for taking a look.

jburos avatar Oct 19 '17 20:10 jburos