Missing Inverse Wishart - redefine Wishart?
This seems like an odd distribution to be missing, especially given that it is the standard conjugate prior for the covariance matrix of multivariate normals.
Technically, it can be instantiated as a TransformedDistribution with Wishart as the base and the bijector chain Invert(CholeskyOuterProduct), CholeskyToInvCholesky, CholeskyOuterProduct, but this incurs the cost of a Cholesky factorization and cannot make use of the Wishart's input_output_cholesky flag (which is itself a bit inconsistent with the rest of TFP?).
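For concreteness, a minimal sketch of that workaround (untested; df and scale are placeholder values, and this assumes the tfd.Wishart and tfb.CholeskyToInvCholesky APIs as currently exposed):

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

# Chain applies right-to-left: X -> chol(X) -> chol(X^-1) -> X^-1.
inverse_wishart = tfd.TransformedDistribution(
    distribution=tfd.Wishart(df=5., scale=tf.eye(3)),
    bijector=tfb.Chain([
        tfb.CholeskyOuterProduct(),              # chol(X^-1) -> X^-1
        tfb.CholeskyToInvCholesky(),             # chol(X) -> chol(X^-1)
        tfb.Invert(tfb.CholeskyOuterProduct()),  # X -> chol(X)
    ]))
```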
I think the cleanest object-oriented design choice would be to remove the existing Wisharts and define a new WishartCholesky distribution, since its density is fairly simple and analytic (see the Bartlett decomposition). I believe that is what TFP already uses to sample Wisharts, so it's not a big jump. The full Wishart and the inverse Wishart could then easily be parameterized using bijectors.
The Wishart Cholesky factor is a lower-triangular matrix whose below-diagonal elements follow one distribution and whose diagonal elements follow another. Is there any way to specify that using TransformedDistribution and bijectors? It seems this would require transformed densities over multiple variables, concatenation bijectors, and/or better triangle-filling bijectors.
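To make that structure concrete, here is a hedged NumPy sketch of the Bartlett decomposition for the identity scale matrix (the function name is mine):

```python
import numpy as np

def sample_wishart_cholesky(df, dim, rng=None):
    """Samples the Cholesky factor of a Wishart(df, I) draw (Bartlett)."""
    rng = rng or np.random.default_rng()
    a = np.zeros((dim, dim))
    # Diagonal: square roots of chi-squared variates with df, df-1, ... dof.
    a[np.diag_indices(dim)] = np.sqrt(rng.chisquare(df - np.arange(dim)))
    # Strict lower triangle: independent standard normals.
    a[np.tril_indices(dim, k=-1)] = rng.standard_normal(dim * (dim - 1) // 2)
    # For a general scale S = L @ L.T, the factor would be L @ a.
    return a
```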
Definitely agreed the inverse Wishart would be good to add, and this is a good path to making it happen. Are you interested in taking it on @Bonnevie? We would certainly appreciate the contribution :)
Slightly stupid question: is the proposed WishartCholesky just identically the same as Wishart(input_output_cholesky=True) with a log-prob correction? I imagine the correction would be tfb.CholeskyOuterProduct().inverse_log_det_jacobian(x, event_ndims=2) (though I'm just pattern-matching here, so somebody should check me). If this is true, then it should be pretty easy to arrange all the pieces nicely and provide Wishart and InverseWishart, and maybe also expose WishartCholesky (which perhaps should be called CholeskyWishart for consistency).
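If that is right, the correction might look something like the following sketch (unverified; in particular the sign convention should be checked, since the forward log-det-Jacobian at the Cholesky factor equals minus the inverse log-det-Jacobian at the full matrix, and it's unclear whether input_output_cholesky=True already folds in any Jacobian term):

```python
import tensorflow_probability as tfp

tfb = tfp.bijectors

def cholesky_wishart_log_prob(wishart, chol_x):
    # Assumes `wishart` was built with input_output_cholesky=True, so
    # wishart.log_prob(chol_x) is the density of X = chol_x @ chol_x^T.
    # The density of the factor itself then picks up the log-det-Jacobian
    # of the map L -> L @ L^T (i.e. CholeskyOuterProduct).
    cop = tfb.CholeskyOuterProduct()
    return (wishart.log_prob(chol_x)
            + cop.forward_log_det_jacobian(chol_x, event_ndims=2))
```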
I might try to make a PR. You are right that it could probably be implemented by adjusting the existing Wishart density with a log-det-Jacobian correction. With input_output_cholesky=True the density is over matrix variables while the samples are Cholesky factors, so you are right that the appropriate correction would be that of an inverted tfb.CholeskyOuterProduct().
Thanks!
I agree it would be nice to have this distribution. That said, the approach used here: https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Bayesian_Gaussian_Mixture_Model.ipynb is more efficient, since matmul is faster than a Cholesky factorization. (Basically, the trick is to redefine the MVN, not the Wishart.) This is a very clever trick: even if matmul and Cholesky were asymptotically equivalent, the constants are always worse for Cholesky, as is its (in)ability to be parallelized easily. Some processors even have special hardware for matmul; I've never seen special hardware for Cholesky. (Sadly.)
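A hedged sketch of that trick (my naming and LinearOperator choices; the notebook's implementation differs in its details): parameterize the MVN by the Cholesky factor of its precision matrix, so that sampling and log_prob reduce to triangular solves and matmuls, with no Cholesky factorization of a covariance:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def mvn_from_chol_precision(loc, chol_precision):
    # If the precision is P = C @ C^T, the covariance is C^-T @ C^-1,
    # so the MVN "scale" is inv(C^T). Expressing it as a LinearOperator
    # keeps every solve and determinant a cheap triangular operation.
    chol_op = tf.linalg.LinearOperatorLowerTriangular(
        chol_precision, is_non_singular=True)
    scale = tf.linalg.LinearOperatorInversion(chol_op.adjoint())
    return tfd.MultivariateNormalLinearOperator(loc=loc, scale=scale)
```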
See also: tfb.CholeskyToInvCholesky