Use `⊙` instead of `*` for creating weighted measures?
@keorn pointed out that in Distributions, * and + behave like this:
julia> 3 * Dists.Normal()
Distributions.LocationScale{Float64, Distributions.Continuous, Distributions.Normal{Float64}}(
μ: 0.0
σ: 3.0
ρ: Distributions.Normal{Float64}(μ=0.0, σ=1.0)
)
julia> 3 + Dists.Normal()
Distributions.LocationScale{Float64, Distributions.Continuous, Distributions.Normal{Float64}}(
μ: 3.0
σ: 1.0
ρ: Distributions.Normal{Float64}(μ=0.0, σ=1.0)
)
This is very different from MeasureTheory, where
julia> density(3 * Normal(), 2.4) / density(Normal(), 2.4)
3.0
This issue is to consider making some changes to this, to minimize confusion for those coming from Distributions.
We currently allow ⊙ for a "likelihood operating on a measure". We could let a scalar work in a similar way, acting like a likelihood that always returns the given value.
Notes / Concerns
Currently for any constant k and measure μ we have
density(k * μ) = k * density(μ)
Under this change, this would become
density(k ⊙ μ) = k * density(μ)
Despite its common use in Distributions, it's a little strange from a type perspective to expect this to work. It feels a little like having a function f and wanting k * f to return a new function x -> k * f(x).
I would say that an additional argument against using * for WeightedMeasure is that scaling a density is not commonly written with the * operator. Overloading this operator, which is very common in numerical computing, to perform a somewhat niche operation can cause confusion when reading code. I could see someone else being just as confused about its semantics here as we are about its semantics in Distributions.jl.
Yep, I agree. In general, I just want to be sure to think through the semantics and any potential implications.
If we think of "lifting" the constants, we can treat them as constant (log-)likelihoods. So k ⊙ μ would give a new measure with
density(k ⊙ μ) = k * density(μ)
logdensity(k ⊙ μ) = log(k) + logdensity(μ)
Then * and + would give affine transformations. All of that seems fine.
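To make the proposed semantics concrete, here is a minimal self-contained sketch. It does not use MeasureTheory.jl; `StdNormal`, `Weighted`, and the `density`/`logdensity` functions defined here are hypothetical stand-ins, and the weight is assumed to be a positive constant stored on the log scale.

```julia
# Hypothetical stand-ins for illustration; not the MeasureTheory.jl types.
struct StdNormal end

# Log-density of the standard normal with respect to Lebesgue measure.
logdensity(::StdNormal, x) = -(x^2 + log(2π)) / 2

# A measure reweighted by a positive constant, stored as a log-weight.
struct Weighted{M}
    logweight::Float64
    base::M
end

# k ⊙ μ scales the density of μ by the constant k (assumed k > 0).
⊙(k::Real, μ) = Weighted(Float64(log(k)), μ)

# The constant acts like a likelihood that always returns k.
logdensity(w::Weighted, x) = w.logweight + logdensity(w.base, x)
density(μ, x) = exp(logdensity(μ, x))

μ = StdNormal()
ν = 3 ⊙ μ

ratio = density(ν, 2.4) / density(μ, 2.4)              # approximately 3
logdiff = logdensity(ν, 2.4) - logdensity(μ, 2.4)      # approximately log(3)
```

With this definition the weight never touches the argument `x`, which is what separates it from an affine transformation.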
I think the natural next question is, what if k is an array? If we allow this for scalars, the analogous thing for arrays seems natural.
Also, between two measures μ and ν, we currently have μ * ν for the product measure, and μ + ν for the superposition. So for example, k * μ would be very different from Dirac(k) * μ. But this is already very different, it would just be different in a different way :)
Yeah, other usages of * and + are another can of worms. At PlantingSpace we prefer ⊗ for product measure and ⊕ for what you call superposition to make things more distinct.
Also, I do not think there is a natural extension to arrays: an array currently has no measure-theoretic semantics beyond being a possible support.
I think we do not want to conflate multiplication of random variables, c*X, with multiplication of densities, so we perhaps won't follow Distributions.jl. But to avoid confusion I think you are right that ⊙ makes sense.
> At PlantingSpace we prefer ⊗ for product measure and ⊕ for what you call superposition to make things more distinct.
I've considered this. I think product measure and superposition are the category-theoretic product and coproduct, in which case ⊗ and ⊕ make a lot of sense. The biggest concern I can see is that people also use ⊗ for the Kronecker product.
The use for the Kronecker product seems unproblematic, given that both uses can be thought of as instances of a tensor product.
@mschauer what do you think of transitioning to use ⊗ and ⊕ in this way? Maybe we should talk to the Catlab folks about getting a core interface with some categorical primitives. From a categorical perspective it's pretty standardized, so getting common ground should be easy. That would help avoid future name collisions.
https://julialang.zulipchat.com/#narrow/stream/230248-catlab.2Ejl/topic/Category.20interface.20package.3F
I'm not yet too familiar with this package, but I feel that scalar * measure should keep its current meaning, which is the most natural one in the context of general measures. Whatever choice you make, some users will be confused. Why not choose the path most consistent with the "measures" vision? This is what I would expect from the name of the package, anyhow.
I agree ⊗ makes sense for product measures, but prefer + for adding measures (as in (m1 + m2)(A) = m1(A) + m2(A)). Is this what is meant by "superposition" above? Sorry I'm not familiar with the term in context of measures.
Probably I misunderstand, but I would have thought the coproduct of two measure spaces is their disjoint union, which would make the coproduct of two measures different from their sum in the sense just mentioned. For the coproduct measure on a disjoint union, ⊕ makes sense to me.
When I started this package, it was to address some aspects of the Distributions design that were making things difficult for my work in Soss. So certainly there's no inherent requirement to follow that design in any way.
When Distributions uses + and * as a + b * Normal(), it's described as an affine transform of a random variable, but of course Normal() isn't a random variable. So in reality, it's lifting these operations, roughly like the pseudo-notation
a::Real + d::Distribution = (a + x for x in d)
This kind of silent conversion is very non-Julian, but unfortunately I think that ship has sailed. I've suggested this should really be written as broadcasting, but there seem to be at least implicit assumptions that broadcasting is over a finite set of values.
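For contrast, the lifting described above can be sketched in a self-contained way. This is not how Distributions implements it; `StdNormal` and `Affine` are hypothetical types, and the density follows from the usual change-of-variables formula for x -> shift + scale * x with nonzero scale.

```julia
# Hypothetical stand-ins for illustration; not the Distributions.jl types.
struct StdNormal end
logdensity(::StdNormal, x) = -(x^2 + log(2π)) / 2

# Law of shift + scale * X, where X is distributed according to `base`.
struct Affine{D}
    shift::Float64
    scale::Float64
    base::D
end

# Lifted arithmetic: the operators act on the values of X, not on the density.
Base.:+(a::Real, d::StdNormal) = Affine(Float64(a), 1.0, d)
Base.:*(b::Real, d::StdNormal) = Affine(0.0, Float64(b), d)

# Change of variables: undo the map, then correct for the stretch by |scale|.
function logdensity(t::Affine, x)
    z = (x - t.shift) / t.scale
    return logdensity(t.base, z) - log(abs(t.scale))
end

density(d, x) = exp(logdensity(d, x))

scaled = 3 * StdNormal()
shifted = 3 + StdNormal()

# Under this reading the density ratio at 2.4 is not the constant 3.
ratio = density(scaled, 2.4) / density(StdNormal(), 2.4)
```

Here `3 * StdNormal()` stretches the support, so the density at a point depends on where that point maps back to, rather than being multiplied by 3 everywhere.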
Anyway, I haven't had much luck influencing the design of Distributions, so I try not to spend much time on it. At the same time, we want people to use the package, and most new users will already be familiar with Distributions. And certainly any conflicts with Distributions should have good reasons behind them. It can be very difficult to strike the right balance.
In some cases, a compromise can be in the best interest of the package and its future users. I think this has been the case for DensityInterface.jl. Here, the challenge has been to allow an in-road for users without measure-theoretic experience who use "density" in a much looser sense. This leads to some complications in our design that aren't really ideal, but are almost certainly better than "going it alone". As a result of this, there's some potential for MeasureBase to become a dependency for Distributions. That's very much a WIP, but it seems it could help with tying the ecosystem together more cleanly.
Ok I'm rambling now. Sorry, I'll get back to the points you made.
> I agree ⊗ makes sense for product measures, but prefer + for adding measures (as in (m1 + m2)(A) = m1(A) + m2(A)). Is this what is meant by "superposition" above?
Yes, that's right.
> Probably I misunderstand, but I would have thought the coproduct of two measure spaces is their disjoint union, which would make the coproduct of two measures different from their sum in the sense just mentioned. For the coproduct measure on a disjoint union, ⊕ makes sense to me.
Great point. I agree it would make sense for ⊗ and ⊕ to be used for the category-theoretic product and coproduct. I think you're right that superposition is not the coproduct (coproduct is always defined, but superposition requires two measures on the same space). But there's still an algebra. So any operator names we commit to need to be consistent with this algebra. Also, it's interesting that superposition can be considered as disjoint union (of two measures on a common space) followed by "forgetting" which component was chosen.
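The relationship between superposition and the disjoint union can be checked on a toy example. The sketch below assumes finite discrete measures encoded as dictionaries from points to masses; the helper names `measure`, `superpose`, `coproduct`, and `forget` are hypothetical and exist only for this illustration.

```julia
# Toy encoding: a finite discrete measure is a Dict from points to masses.

# Measure of a set A: add up the mass of the points that lie in A.
measure(m::Dict, A) = sum((w for (x, w) in m if x in A); init = 0.0)

# Superposition: both measures live on the same space, and masses add pointwise.
superpose(m1::Dict, m2::Dict) = mergewith(+, m1, m2)

# Coproduct: tag each point with its component, giving a measure on the disjoint union.
coproduct(m1::Dict, m2::Dict) =
    merge(Dict((1, x) => w for (x, w) in m1), Dict((2, x) => w for (x, w) in m2))

# Forget the tag, adding up the mass that lands on the same underlying point.
function forget(m::Dict)
    out = Dict{Any,Float64}()
    for ((_, x), w) in m
        out[x] = get(out, x, 0.0) + w
    end
    return out
end

m1 = Dict(:a => 0.5, :b => 0.5)
m2 = Dict(:b => 1.0, :c => 2.0)
A = [:a, :b]

s = superpose(m1, m2)
c = coproduct(m1, m2)
```

In this encoding `c` keeps `(1, :b)` and `(2, :b)` as separate points, while `forget(c)` merges them and agrees with `s`.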
> At the same time, we want people to use the package, and most new users will already be familiar with Distributions. And certainly any conflicts with Distributions should have good reasons behind them. It can be very difficult to strike the right balance.
Thanks for the response, @cscherrer. Yes, striking this balance is difficult. I would say you are in a much better position than I to do this. I just wanted to give you another point on the graph.