problems using loo in clogit models
In a rstanarm branch, I introduced a stan_clogit function which is similar to the clogit function in the survival package but is actually described in more detail by Stata ( http://www.stata.com/manuals13/rclogit.pdf ). So, you will be able to do
post <- stan_clogit(case ~ spontaneous + induced, strata = stratum,
data = infert[order(infert$stratum, !infert$case),], QR = TRUE)
The problem is that some groups have the same values on the dummy variables spontaneous and induced, such as
> infert[infert$stratum == 3,]
education age parity induced case spontaneous stratum pooled.stratum
3 0-5yrs 39 6 2 1 0 3 4
86 0-5yrs 39 6 2 0 0 3 4
168 0-5yrs 39 6 2 0 0 3 4
Thus, no matter what are the posterior realizations of the coefficients on spontaneous and induced, the likelihood for group 3 is the same.
I am pretty sure the correct concept for a clogit model is to imagine leaving out one group rather than one observation, but when I call loo::loo.matrix, the third column of the input is a constant, which causes the Pareto k estimate to be infinite. Since the third group could be omitted and only change the log-likelihood by a constant, this seems to be unreasonable.
The question becomes, what function should be catching this? We could have stan_clogit drop groups that have only one unique row in the design matri{x,ces}. We could have loo.stanreg omit such groups. Or loo::loo could check which Pareto k estimates are infinite and change them to some number when the log likelihoods are finite but constant. Thoughts @avehtari ?
I have been waiting this to happen.
loo should check (almost) constant columns, skip Pareto fit, and return equal weights.
Isn't it possible to get almost constant columns because of problems with model/sampling? In that case how would loo know if it was ok to ignore (e.g. clogit case) or if it's an actual problem?
On Wed, Mar 29, 2017 at 3:05 PM Aki Vehtari [email protected] wrote:
I have been waiting this to happen.
loo should check (almost) constant columns, skip Pareto fit, and return equal weights.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/loo/issues/44#issuecomment-290193206, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q6yom_e_0NZkN1iFaiIHMW1G6g9iks5rqqtygaJpZM4MtXmA .
I think if the chains never move, then the n_eff will be low and the user should not be calling loo() in the first place.
On Wed, Mar 29, 2017 at 3:50 PM, Jonah Gabry [email protected] wrote:
Isn't it possible to get almost constant columns because of problems with model/sampling? In that case how would loo know if it was ok to ignore (e.g. clogit case) or if it's an actual problem?
On Wed, Mar 29, 2017 at 3:05 PM Aki Vehtari [email protected] wrote:
I have been waiting this to happen.
loo should check (almost) constant columns, skip Pareto fit, and return equal weights.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/loo/issues/44#issuecomment-290193206, or mute the thread <https://github.com/notifications/unsubscribe-auth/AHb4Q6yom_e_ 0NZkN1iFaiIHMW1G6g9iks5rqqtygaJpZM4MtXmA>
.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/loo/issues/44#issuecomment-290204939, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqnyzHr975_DCJ-Vl9NoQvcQn70Saks5rqrX3gaJpZM4MtXmA .
True. And that would trigger warnings from rstanarm.
On Wed, Mar 29, 2017 at 4:13 PM bgoodri [email protected] wrote:
I think if the chains never move, then the n_eff will be low and the user should not be calling loo() in the first place.
On Wed, Mar 29, 2017 at 3:50 PM, Jonah Gabry [email protected] wrote:
Isn't it possible to get almost constant columns because of problems with model/sampling? In that case how would loo know if it was ok to ignore (e.g. clogit case) or if it's an actual problem?
On Wed, Mar 29, 2017 at 3:05 PM Aki Vehtari [email protected] wrote:
I have been waiting this to happen.
loo should check (almost) constant columns, skip Pareto fit, and return equal weights.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/loo/issues/44#issuecomment-290193206, or mute the thread <https://github.com/notifications/unsubscribe-auth/AHb4Q6yom_e_ 0NZkN1iFaiIHMW1G6g9iks5rqqtygaJpZM4MtXmA>
.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/loo/issues/44#issuecomment-290204939, or mute the thread < https://github.com/notifications/unsubscribe-auth/ADOrqnyzHr975_DCJ-Vl9NoQvcQn70Saks5rqrX3gaJpZM4MtXmA
.
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/stan-dev/loo/issues/44#issuecomment-290211442, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q6AVyc1jk1WQjnsFpG0MFwu_9K1uks5rqrtlgaJpZM4MtXmA .
Another thing is that in clogit and some other models it would be better to leave one group out, which requires a bit of additional thinking how to implement that.
We have a workaround for this in the clogit case in a branch of rstanarm but it would be better to fix this for the next loo release.
Is the only thing that needs to be fixed the false positive Pareto k warning?
That is, is the only change to skip constant columns when doing the Pareto smoothing?
Yeah, what Aki said.
Ok that should be pretty straightforward to take care of
On Fri, Aug 4, 2017 at 12:30 AM bgoodri [email protected] wrote:
Yeah, what Aki said.
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/stan-dev/loo/issues/44#issuecomment-320153619, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4QxoAqmk4Rm7XychHBoct6By7Qtryks5sUp5mgaJpZM4MtXmA .