`normal` logpdf should return -Inf when variance is out of bounds
This would allow Gaussian drift proposals on a variance: such proposals may generate negative values, and it is more helpful to reject those samples than to raise an error.
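Concretely, the proposed behavior might look like the following sketch (written in Python rather than Julia; `normal_logpdf` is an illustrative stand-in for Gen's `logpdf(normal, ...)`, not the actual implementation):

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log-density of Normal(mu, sigma); returns -inf instead of raising
    when the scale parameter is out of bounds (sigma <= 0)."""
    if sigma <= 0:
        return -math.inf  # invalid parameter: treat as zero density
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2 * math.pi))

# A drift proposal that wanders negative now yields a -inf log accept
# ratio, so MH rejects the move instead of crashing:
log_ratio = normal_logpdf(1.0, 0.0, -0.5) - normal_logpdf(1.0, 0.0, 1.0)
```

With this convention, the MH accept probability for a move to a negative variance is simply 0.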
What is the mathematical meaning of a Gaussian drift proposal on a variance? The prior and the proposal would have different support in this case.
It's all right for the proposal of a Metropolis-Hastings kernel to have broader support than the posterior; these samples are just rejected with probability 1.
@alex-lew I agree this would be helpful, but it seems important to have a systematic approach to deciding whether to return an error or to return -Inf. It seems strange to treat certain invalid arguments as -Inf and others as errors. Also, we should understand how such a policy would generalize to generative functions, e.g. by allowing an "invalid trace" to be returned by generate or update, with associated -Inf weight.
One approach that doesn't involve defining a general policy for distinguishing between invalid arguments that should result in errors and invalid arguments that should result in -Inf weights/scores would be for every generative function and distribution to throw a custom exception whenever the arguments are invalid (for whatever reason -- wrong type, out of bounds, etc.) and then to modify the MH procedure and other inference procedures to catch these and handle them however they see fit. MH could always reject when it catches this exception.
This leaves open the option for writing "safer" inference procedures (including e.g. extending MH with a flag to decide whether to error vs reject) that are able to be more stringent if that's what the user wants by not catching the exception.
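A sketch of how this exception-based design might look (Python rather than Julia; `InvalidArgsError`, `mh_step`, and the `strict` flag are hypothetical names for illustration, not existing Gen API):

```python
import math
import random

class InvalidArgsError(Exception):
    """Hypothetical exception a distribution or generative function would
    throw when its arguments are invalid (wrong type, out of bounds, ...)."""

def normal_logpdf(x, mu, sigma):
    if sigma <= 0:
        raise InvalidArgsError(f"sigma must be positive, got {sigma}")
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2 * math.pi))

def mh_step(x, log_target, propose, strict=False):
    """One Metropolis step with a symmetric proposal. In lenient mode an
    InvalidArgsError is caught and the move is simply rejected; a 'safer'
    inference procedure can pass strict=True to let the error propagate."""
    x_new = propose(x)
    try:
        log_alpha = log_target(x_new) - log_target(x)
    except InvalidArgsError:
        if strict:
            raise
        return x  # reject: keep the old value in the chain
    if math.log(random.random()) < log_alpha:
        return x_new
    return x
```

Here `log_target(v)` might score a variance `v` under the model; a drift proposal that goes negative triggers the exception, and the lenient kernel rejects while the strict one errors.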
@alex-lew, is that the same thing as saying "if this change is made, then using normal as a proposal distribution on a variance will really have semantics of a truncated normal proposal via rejection sampling"?
On a separate thread, I think
integral from -infty to infty of logpdf(normal, x, mu, sigma) dx [EDIT: should have said pdf, not logpdf]
should not take a non-1 value when sigma is negative.
> @alex-lew, is that the same thing as saying "if this change is made, then using `normal` as a proposal distribution on a variance will really have semantics of a truncated normal proposal via rejection sampling"?
No, they're different procedures. Rejecting a sample repeats the old sample in your Markov chain; a truncated normal proposal is an asymmetric proposal with a different MH accept/reject ratio.
> integral from -infty to infty of logpdf(normal, x, mu, sigma) dx
Do you mean logpdf or pdf? If pdf, the proposed change would make this integral 0.
> One approach that doesn't involve defining a general policy for distinguishing between invalid arguments that should result in errors and invalid arguments that should result in -Inf weights/scores would be for every generative function and distribution to throw a custom exception whenever the arguments are invalid (for whatever reason -- wrong type, out of bounds, etc.) and then to modify the MH procedure and other inference procedures to catch these and handle them however they see fit. MH could always reject when it catches this exception.
@marcoct Yes, this approach also works! And seems nicer to make it configurable.
> Do you mean logpdf or pdf? If pdf, the proposed change would make this integral 0.
Whoops yes, I meant pdf. Same comment applies.
> No, they're different procedures. Rejecting a sample repeats the old sample in your Markov chain; a truncated normal proposal is an asymmetric proposal with a different MH accept/reject ratio.
Right ok, so why would the former algorithm (rejection version) be correct?
> > No, they're different procedures. Rejecting a sample repeats the old sample in your Markov chain; a truncated normal proposal is an asymmetric proposal with a different MH accept/reject ratio.
>
> Right ok, so why would the former algorithm (rejection version) be correct?
I'm not sure I follow -- both algorithms are correct; they are Metropolis-Hastings algorithms with different proposals (a Gaussian, or a truncated Gaussian). As with MH in general, when you choose a different proposal, you get a different algorithm, but they both target the same posterior.
In the "truncated Gaussian" case, when you 'reject' a negative sample, the rejection is 'silent' and not recorded in the chain. This means that on the surface, you are using a 'better' algorithm, in that the samples may be less autocorrelated (fewer repeated values). However, your accept/reject ratio must take into account the asymmetry of the proposal distribution.
In the "reject negatives" case, when you reject a negative sample, it's a real rejection, and the old value of x is repeated in the chain. If you want to, e.g., estimate an expectation by averaging your samples, you need to include the various repeated values of x, which is not the case with the truncated proposal. But the upside is that you have a simpler accept/reject probability, which is just min(1, p(x')/p(x)) (where the numerator may be 0).
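The two variants can be sketched side by side (in Python rather than Julia; the target here is a stand-in Exponential(1) density on x > 0 playing the role of a variance posterior, and all names are illustrative):

```python
import math
import random

def log_target(x):
    """Stand-in unnormalized log-density supported on x > 0."""
    return -x if x > 0 else -math.inf

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mh_plain(x, s=1.0):
    """Plain Gaussian drift proposal. A negative proposal has target
    density 0, so log_alpha = -inf and the old x is repeated (a real
    rejection, recorded in the chain)."""
    x_new = random.gauss(x, s)
    log_alpha = log_target(x_new) - log_target(x)
    if math.log(random.random()) < log_alpha:
        return x_new
    return x

def mh_truncated(x, s=1.0):
    """Truncated-Gaussian proposal on (0, inf). Negative draws are
    silently resampled, but the accept ratio must then include the
    asymmetric proposal normalizers Phi(x/s) / Phi(x_new/s)."""
    x_new = random.gauss(x, s)
    while x_new <= 0:
        x_new = random.gauss(x, s)
    log_alpha = (log_target(x_new) - log_target(x)
                 + math.log(std_normal_cdf(x / s))      # q's normalizer at x
                 - math.log(std_normal_cdf(x_new / s)))  # q's normalizer at x_new
    if math.log(random.random()) < log_alpha:
        return x_new
    return x
```

Both chains target the same distribution; the plain version just spends some iterations repeating the previous state, while the truncated version pays for its silent resampling with the extra ratio of normalizers.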
> > Do you mean logpdf or pdf? If pdf, the proposed change would make this integral 0.
>
> Whoops yes, I meant pdf. Same comment applies.
Ah, I missed the "not" :-)
Mathematically, I think it makes sense to define normal(mu, sigma) to be the zero measure when sigma < 0. If the integral were 1, I'd expect us to have defined some other semantics for what distribution normal(mu, sigma) denoted in this case, and I think any other choice of 'default' is a bit strange. FWIW, it's common in PPL semantics to treat program terms that error, e.g. by dividing by 0, as denoting the zero measure. Then programs that might error denote subprobability measures [you are creating a mixture distribution where one component is a zero measure]. Nontermination is sometimes handled the same way.
But I think Marco is right that a "Gen-ic" solution would be to make this configurable by the inference program, e.g. using exceptions.
FWIW, `logpdf` for a number of other distributions currently returns -Inf for values that are out of support (e.g. categorical, beta, and others discussed in #203 and #206). As noted in #203, most distributions that are backed by Distributions.jl currently return -Inf for values outside the support. This discussion is slightly different, though, since it seems like we're talking about the parameters being out of bounds, rather than the values being out of bounds, though of course the two are related.
I don't have a strong opinion on whether to use exceptions to handle out-of-bounds parameters or to return -Inf, but I just wanted to make sure we treat it as separate from the related issue of out-of-bounds values. If we do go the exception route, we should probably try to support exception handling within importance_sampling and SMC as well.
@marcoct @ztangent I agree they're definitely separate issues, but it's worth noting that they raise similar questions in terms of what they mean for the GFI. That is, we already have a convention of returning 'impossible' traces with -Inf scores sometimes, from generate and update.
One question is whether we should have a ZeroTrace or ImpossibleTrace or similar that GFI methods could return instead of having to create a full trace. (It could contain custom debugging information.) This would allow generate, update, etc. to return early when they first encounter a zero-probability choice. Such an optimization could also help us avoid solving the "negative variance" question (at least for now): we'd first evaluate the proposed negative variance under its prior, which would return -Inf, and then we could return an ImpossibleTrace without needing to run the rest of the program.
This would also ensure that users could think of the 'possible code paths' through their GF as being those that can actually be sampled, rather than also having to consider what happens when impossible values are proposed.
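One way such a sentinel might look (a hypothetical sketch in Python; `ImpossibleTrace` and this toy `generate` are illustrations of the early-return idea, not Gen's actual API):

```python
import math

class ImpossibleTrace:
    """Hypothetical lightweight sentinel with score -inf, returned instead
    of a full trace and carrying debugging information."""
    score = -math.inf

    def __init__(self, address, reason):
        self.address = address  # first address at which the score hit -inf
        self.reason = reason

def generate(choices, constraints):
    """Toy 'generate': score constrained choices in order and return early
    with an ImpossibleTrace at the first zero-probability choice, without
    running the rest of the program."""
    trace, total = {}, 0.0
    for addr, logpdf_fn in choices:
        value = constraints[addr]
        score = logpdf_fn(value)
        if score == -math.inf:
            return ImpossibleTrace(addr, f"zero-probability value {value!r}")
        trace[addr] = value
        total += score
    return trace, total
```

For instance, with a positivity check at address "sigma", a proposed negative variance scores -inf under its prior and `generate` returns an `ImpossibleTrace` immediately, never evaluating the remaining choices.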
> But I think Marco is right that a "Gen-ic" solution would be to make this configurable by the inference program, e.g. using exceptions.
My intuition is that making this configurable would become very confusing. I don't have a super-cogent explanation of this intuition, but to caricature, imagine if Julia allowed users to toggle a global flag that determined whether arrays were 0-indexed or 1-indexed (or 3-indexed, ...). That would wreak havoc for any library that expects the flag to be set the wrong way.
> `ImpossibleTrace`
Hm this is interesting! Related but not exactly the same, I've recently been thinking a bit about whether there's any hope of distinguishing between "doubly impossible" traces, etc., such that two impossible traces with different multiplicities can still be compared for which one is more likely (the one with the lower impossibility level).
> but to caricature, imagine if Julia allowed users to toggle a global flag that determined whether arrays were 0-indexed or 1-indexed (or 3-indexed, ...). That would wreak havoc for any library that expects the flag to be set the wrong way.
For a library that's intending to be low-level, extensible, and usable in a variety of different types of workflows, I think it's important to be able to turn on or off these types of stringency checks. Note that this isn't an option that changes between two different program behaviors that may be hard to detect, like with the indexing example -- it's an option that switches between two very specific behaviors, crashing and non-crashing, which is a standard motif in interface design. Also, I'm not suggesting that there should be a global (and therefore less discoverable) Gen option for this -- it would e.g. be a flag that gets passed to inference primitives like Gen.mh.
> Related but not exactly the same, I've recently been thinking a bit about whether there's any hope of distinguishing between "doubly impossible" traces, etc., such that two impossible traces with different multiplicities can still be compared for which one is more likely (the one with the lower impossibility level).
What is the use case you have in mind for this?
I think Vikash has mentioned coding up this "different multiplicities of impossible"; I think the goal was to let you use MH to go from a "very impossible" initial state to a possible one, in stages? (E.g. imagine trying to sample uniformly from the solutions to a SAT problem, using proposals that flip one or more binary variables. Fewer violated clauses is better, even though any violated clauses is "impossible.")
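A toy version of this lexicographic "graded impossibility" idea, using the SAT example (all names are hypothetical, and the acceptance rule is a deterministic caricature rather than a full MH kernel):

```python
def violated_count(clauses, assignment):
    """Number of unsatisfied clauses; each clause is a list of
    (variable, required_value) literals, satisfied if any literal holds."""
    return sum(1 for clause in clauses
               if not any(assignment[var] == val for var, val in clause))

def graded_score(clauses, assignment):
    """Hypothetical graded score: (impossibility level, log-weight).
    Any violation makes the state 'impossible' (ordinary density 0), but
    fewer violated clauses counts as less impossible; the log-weight here
    is uniform among equally-violated states."""
    return (violated_count(clauses, assignment), 0.0)

def prefer(clauses, current, proposed):
    """Caricature of an accept rule on graded scores: accept iff the
    proposal is at most as impossible as the current state (Python tuples
    compare lexicographically, so level dominates weight)."""
    return graded_score(clauses, proposed) <= graded_score(clauses, current)
```

For example, (a or b) and (not a) would be `[[("a", True), ("b", True)], [("a", False)]]`; flipping `a` from True to False takes the chain from one violated clause to zero, so the move is preferred even though both endpoints of the first step are "impossible."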
My motivation was roughly, thinking about whether particle filtering is completely sunk as soon as the particles have weight -Inf, or whether there is a way to "look locally" at regions of the address space where the Gen.projects of the particle traces are finite and still have some reasonable notion of resampling the particles. Not that I'd necessarily want or need to do that -- was just wondering if people in the lab had thought about this. Certainly it seems that NaNs arising as Inf / Inf are a very common source of issues in e.g. TensorFlow.