posterior icon indicating copy to clipboard operation
posterior copied to clipboard

HDI?

Open ASKurz opened this issue 4 years ago • 7 comments

Would it be in the scope of posterior to provide support for HDI's in contrast with quantile-based intervals? For example, say I wanted to quickly summarize my draws by their modes and 95% intervals.

example_draws() %>% 
  summarise_draws(mode = Mode, ~ quantile(.x, probs = c(.025, .975)))
# A tibble: 10 × 4
   variable  mode  `2.5%` `97.5%`
   <chr>    <dbl>   <dbl>   <dbl>
 1 mu        5.91  -2.16     10.2
 2 tau       1.74   0.174    14.6
 3 theta[1]  5.67  -2.81     22.7
 4 theta[2]  5.44  -3.24     14.9
 5 theta[3]  5.52 -15.0      13.2
 6 theta[4]  5.94  -4.64     14.8
 7 theta[5]  4.26  -8.68     12.3
 8 theta[6]  3.20  -6.80     12.8
 9 theta[7]  4.46  -2.57     18.1
10 theta[8]  4.22  -6.53     14.8

There's great support for percentile-based intervals. Is there support for HDI's?

ASKurz avatar Nov 19 '21 02:11 ASKurz

There is no support for HDIs yet but it might be a reasonable addition. The problem is the HDI rely on density estimates which require many more draws in order to be reliable enough.

paul-buerkner avatar Jan 05 '22 09:01 paul-buerkner

Great! I acknowledge the limitations of HDIs, but it would be very nice if they were an option.

ASKurz avatar Jan 05 '22 19:01 ASKurz

I'd be curious if anyone out there knows anything about the literature on HDI estimation methods? In {ggdist} I've been using the two methods from {HDInterval} and the more I look at them the less satisfied I am. Even with a large sample (> 10,000) there still seems to be a lot of variance. I think if we're going to implement something here it would be worth a bit of a dive into the literature to pick the best method first (then I can also update ggdist to use that method via posterior).

mjskay avatar Jan 05 '22 22:01 mjskay

https://twitter.com/SolomonKurz/status/1478864812394397696

ASKurz avatar Jan 06 '22 00:01 ASKurz

Even with a large sample (> 10,000) there still seems to be a lot of variance.

I would have assumed there is not much more variance than for quantiles, and that the most of the big variance comes from using quite extreme quantiles.

Andrew et al has a paper "Simulation-efficient Shortest Probability Intervals" http://www.stat.columbia.edu/~gelman/research/published/spin.pdf and CRAN R package SPIn https://cran.r-project.org/web/packages/SPIn/index.html SPIn reduces the variability a bit, and the paper illustrates the challenge of reducing the variance without additional assumptions about the target distribution. After the paper had been published, Andrew started to think that HDI is not that useful, and thus hasn't advocated the paper and the package. In that paper the focus was on 95% HDI, and since then Andrew has advocated, e.g. 20% and 80% quantiles as they have less variance.

Anyway, as the code for SPIn is available and based on quite well made paper, it would be "easy" to add. SPIn package has been last updated in 2013, so maybe it would better to just copy the relevant code to the posterior package?

avehtari avatar Mar 21 '22 17:03 avehtari

FWIW, the bayestestR package has a recent PR that adds support for the posterior package (https://github.com/easystats/bayestestR/pull/518), i.e. all functions like hdi(), eti(), point_estimate() etc. now work on objects of class draws. There's probably some overlap between functions in bayestestR and posterior, but the scope of the packages is different and they rather complement, I'd say.

strengejacke avatar Apr 02 '22 07:04 strengejacke

The bayestestR package documentation for hdi() doesn't tell which algorithm is used for HDI. Anyway, I think it's great if bayestestR will support posterior objects.

avehtari avatar Apr 03 '22 15:04 avehtari