docs Von mises distribution support

Summary:

I went to rewrite the docs for von_mises to define its support as (-pi, pi) and add a note that things outside that range are handled periodically.

Then I realized von_mises_rng generates random numbers on the range (mod(mu, 2 pi) - pi, mod(mu, 2 pi) + pi), which doesn't really make that much sense even if you allow the support to move around.

I'd assume either (-pi, pi) or (mu - pi, mu + pi) would have been the two obvious choices.

The support of the rng just seems wrong. Even if we allow von_mises_lpdf to accept numbers out of its support, I think we should fix the rng to work only on (-pi, pi). This will also lead to changes in the math library if we decide to do this. Just putting the discussion here.

https://mc-stan.org/docs/2_22/functions-reference/von-mises-distribution.html

@bob-carpenter @pgree

Current Version:

v2.18.0

Apr 16 '20 17:04 bbbales2

I agree that the RNG should be changed to generate in the (-pi, pi) range. This isn't even breaking backward compatibility, it's just fixing something that's broken.

The question is then whether we take mu modulo pi?

This is a whole new kind of centering!

Apr 16 '20 18:04 bob-carpenter

I'd assume either (-pi, pi) or (mu - pi, mu + pi) would have been the two obvious choices.

The question is then whether we take mu modulo pi?

I would vote for the rng to generate random numbers on (mu - pi, mu + pi). As far as I can tell, this seems most consistent with how the pdf will be used.

Apr 20 '20 14:04 pgree

That might make sense if mu is fixed. But in posterior predictive quantities, mu will vary across posterior draws, and thus so will the generated range of (mu - pi, mu + pi), which could lead to RNGs in a greater than 2 * pi range.

How is the CDF going to be defined? Does that just run -pi to pi?

Apr 20 '20 14:04 bob-carpenter

Yeah that'd what I'd like.

Apr 20 '20 14:04 bbbales2

That might make sense if mu is fixed. But in posterior predictive quantities, mu will vary across posterior draws, and thus so will the generated range of (mu - pi, mu + pi), which could lead to RNGs in a greater than 2 * pi range.

How is the CDF going to be defined? Does that just run -pi to pi?

The ambiguity here is related to the fact that the von Mises distribution over the circle does not have a well defined probability density function, nor does it have a well-defined cumulative distribution function.

The von Mises probability density function as typically presented is the local density function within a local chart. On a circle no chart is able to cover the entire space — there’s always one point missing between the interval endpoints. For example the range of the von Mises density in Stan is not the inclusive, closed interval [-pi, pi] but rather the exclusive, open interval (-pi, pi).

Consequently in order to define a local von Mises density function one first has to define a chart, which is why sources like Wikipedia will say confusing things like “the domain of the von Mises distribution is any interval on the circle”.

Methods that rely on only point estimates often take advantage of this freedom to continuously change the chart, and implicitly the definition of the density function, to center the chart around the center of the density such that the density in the excluded point is negligible. The current RNG is probably motivated by such algorithms.

One can’t change the chart between iterations in MCMC, however, and recover well-defined MCMC estimators. Consequently MCMC requires fixing a chart which is done implicitly in the current von Mises implementation.

That said, most of the functions one might take an expectation of within a chart do not correspond to well-defined expectations on the actual circle. In particular there is no unique mean or median or other quantiles because the circle can’t be ordered. The points within a chart can be ordered which admits locally-defined moments and quantiles, but because the chart is arbitrary so too is the ordering, and so too are these expectations.

For consistency one can fix the RNG to match the fixed chart implied by the current von Mises (local) probability density function implementation, but ultimately the ambiguity arises because there really shouldn’t be a von Mises density function in Stan given how MCMC estimation is currently assumed to be on the real numbers or the integers and not on topologically nontrivial spaces like circles.

Apr 30 '20 15:04 betanalpha

Thanks, @betanalpha, that's really helpful.

My understanding of @bbbales2's suggestion is that we just fix the domain to be (-pi, pi). Doesn't that fix what you're calling an "ordering" and resolve the definitions of means, medians, and CDFs?

The closest we get to a circular topology is the unit_vector type. It's overparameterized, but the projections down to the circle should be identified.

What problems remain?

Apr 30 '20 16:04 bob-carpenter

My understanding of @bbbales2 https://github.com/bbbales2's suggestion is that we just fix the domain to be (-pi, pi). Doesn't that fix what you're calling an "ordering" and resolve the definitions of means, medians, and CDFs.

If introduces an ordering but an arbitrary one. Consequently all of the resulting structure won’t have anything to do with the von Mises distribution on the circle.

One way of seeing this is to recognize that any choice of domain (-pi + delta, pi, + delta) defines a valid ordering (formally it’s just defining a chart) and means, medians, CDFs, etc, within the context of that chart. When you compare any of those objects between different choices however, you get conflicting results. This conflict means that the objects are not inherently defined by the distribution but rather by the choice of arbitrary interval.

Equivalently you can think about rotating the circle and the von Mises distribution along with it before considering the interval (-pi, pi). The rotation don’t change the distribution but they do change the CDF, density, moments, quantiles, etc, within that interval. If choosing a coordinate system changes the answer then you’re not getting a well-defined answer.

This is a general problem with manifolds like circles, torii, and the like. For circles the problem is considered in the field of directional statistics. See for example https://en.wikipedia.org/wiki/Directional_statistics#Moments https://en.wikipedia.org/wiki/Directional_statistics#Moments for one way to define moments on circles using complex numbers.

The closest we get to a circular topology is the unit_vector type. It's overparameterized, but the projections down to the circle should be identified.

The overparameterization indicates that the unit_vector is not modeling a sphere but rather the embedding of a sphere S^{N} into R^{N + 1}. The same ambiguities arise here as above because of the ambiguity in how there sphere can be rotated before its embedded. Consequently moments defined in the embedding space are not inherent to the sphere but rather defined by the particular embedding that was chosen.

Concepts like moments and quantiles on non-Euclidean spaces are really subtle and definitely outside of the scope of Stan’s current approach to MCMC estimation. Technically HMC can be modified to work with these spaces but we’d have to change the back end to handle varying charts and the I/O and estimator construction to handle coordinating these charts relative to each other and well-defining the functions whose expectation values are estimated.

Apr 30 '20 16:04 betanalpha

If choosing a coordinate system changes the answer then you’re not getting a well-defined answer.

Here it seems like the answers are all well-defined, just not rotationally invariant.

Technically HMC can be modified to work with these spaces

I've seen the spherical case, and assume other ones would look similar.

the unit_vector is not modeling a sphere but rather the embedding of a sphere S^{N} into R^{N + 1}.

Nevertheless, the implied distribution in the unconstrained space is proper and the transform to constrained space yields a uniform distribution over unit vectors. What more do we need? I'm afraid I don't know what "inherent to the sphere" means here. I'll take a look at the directional statistics page.

@pgree Do you have an opinion on any of this? Sorry to throw you into a landmine on this one! I don't think the CDF for von Mises is that high priority and I think @betanalpha's arguing that we shouldn't be trying to do it at all.

Apr 30 '20 16:04 bob-carpenter

I think a lot of these choices depend heavily on how the von Mises distribution would be used by Stan users. You all have a much better sense of this than I do.

Here it seems like the answers are all well-defined, just not rotationally invariant.

I agree. As long as it’s clear to the user which branch cut is being used, then I don’t think there should be a problem. If you want to sample from the von Mises distribution, then you can’t do better than fixing a branch cut and sampling on that branch cut. All samples correspond to a unique point on the circle and all points on the circle correspond to a unique point on that branch cut.

Apr 30 '20 22:04 pgree

Here it seems like the answers are all well-defined, just not rotationally invariant.

They are not.

Firstly nothing is rotationally invariant if the distribution is not rotationally invariant. The problem is that the mapping of the circle into the real line introduces an ambiguity. Because CDFs, densities, means, quantiles, etc are defined on the real line they too are plagued by that ambiguity.

For example consider the mapping into (-pi, pi) and define the mean as the expectation of the identify function in this image. If the von Mises distribution concentrates around points near 0 in this mapping then the this mean will concentrate around 0 as one might expect. But if the von Mises distribution concentrates on the other side of the circle then the density will concentrate towards -pi and pi and the mean will be 0 once again, not the expected value near the boundaries. Change the mapping to (0, 2pi) and you now everything flips.

This is not a controversial opinion. The mean is defined as the expectation value of an embedding of the sample space into the real line. When the embedding is unique (identify function for reals into reals, isometric embedding for integers into reals) then the mean, and higher moments, are well-defined. Then the embedding is not unique, as the case for non-Euclidean spaces, then the mean is no longer well-defined until additional constraints are imposed, as is done for the E[ z = e^{i \theta} ] in directional statistics.

the unit_vector is not modeling a sphere but rather the embedding of a sphere S^{N} into R^{N + 1}.

Nevertheless, the implied distribution in the unconstrained space is proper and the transform to constrained space yields a uniform distribution over unit vectors.

Not really. The pushforward/marginalization of the joint distribution on R^{N + 1} along the mapping from R^{N + 1} to S^{N} is the uniform distribution over the sphere, and that happens only implicitly. At no point do we actually parameterize the sphere directly (because that is mathematically impossible).

Moreover, the problem of ill-defined moments, CDFs, etc remains.

May 01 '20 01:05 betanalpha

I agree. As long as it’s clear to the user which branch cut is being used, then I don’t think there should be a problem. If you want to sample from the von Mises distribution, then you can’t do better than fixing a branch cut and sampling on that branch cut.

Not branch cuts but charts. No complex analysis here.

All samples correspond to a unique point on the circle and all points on the circle correspond to a unique point on that branch cut.

Samples will fall within a chart with probability 1, but averages of the chart coordinates will not correspond to the MCMC estimator of well-defined expectation values.

May 01 '20 01:05 betanalpha

The pushforward/marginalization of the joint distribution on R^{N + 1} along the mapping from R^{N + 1} to S^{N} is the uniform distribution over the sphere, and that happens only implicitly. At no point do we actually parameterize the sphere directly

I understand that's what's going on. But what do we lose in not parameterizing the sphere directly?

May 01 '20 02:05 bob-carpenter

@betanalpha --- thanks for continuing to explain things here. I'm afraid it's a little abstract for me. I understand that we can choose any interval we want, but I don't see what's wrong with fixing a single interval and going with it. More specifically, what's going wrong in everyone's model that's using our von Mises distribution now?

May 01 '20 02:05 bob-carpenter

@betanalpha https://github.com/betanalpha --- thanks for continuing to explain things here. I'm afraid it's a little abstract for me. I understand that we can choose any interval we want, but I don't see what's wrong with fixing a single interval and going with it. More specifically, what's going wrong in everyone's model that's using our von Mises distribution now?

The MCMC estimators don’t correspond to any meaningful expectation values. Stan will run fine and produce default estimators but they won’t mean anything mathematically. Not a great outcome.

May 01 '20 02:05 betanalpha

I'm not sure what you mean by "meaningful" here. It sounds to me like fixing an interval defines them in a mathematical sense. So they have a value. Why isn't that meaningful? I'm really missing the thread of the argument here. If we allow the circle (or sphere or hypersphere) to rotate, then means don't make any sense at all and there's no way to locate anything. I don't even see how we could connect data.

May 01 '20 14:05 bob-carpenter