Sinkhorn Divergence code does not match referenced paper
Describe the bug
I'm opening this as a bug, but the code actually works. The issue is the definition used for the Sinkhorn divergence.
Basically, the Sinkhorn divergence code does not match the formula in the referenced paper, ["Learning Generative Models with Sinkhorn Divergences", 2017](https://arxiv.org/pdf/1706.00292.pdf).
Code sample and Expected behavior
Here is a piece of the code from `empirical_sinkhorn_divergence`:

```python
sinkhorn_div = sinkhorn_loss_ab - 0.5 * (sinkhorn_loss_a + sinkhorn_loss_b)
```
To match the Sinkhorn divergence formula from the paper, the code should probably be:

```python
sinkhorn_div = 2 * sinkhorn_loss_ab - (sinkhorn_loss_a + sinkhorn_loss_b)
```
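For reference, here is a minimal sketch of how the paper's definition could be computed with POT's `empirical_sinkhorn2` (illustrative only; the helper name is mine, not the library's):

```python
import ot


def sinkhorn_divergence_2017(X_a, X_b, reg):
    # 2 * W(a, b) - W(a, a) - W(b, b), as in the 2017 paper,
    # where W is the entropic loss between empirical samples
    w_ab = ot.bregman.empirical_sinkhorn2(X_a, X_b, reg)
    w_a = ot.bregman.empirical_sinkhorn2(X_a, X_a, reg)
    w_b = ot.bregman.empirical_sinkhorn2(X_b, X_b, reg)
    return 2 * w_ab - (w_a + w_b)
```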
This is a minor issue, but perhaps the documentation should address this difference.
Another issue is that `sinkhorn_loss` returns

W = \min_\gamma \langle \gamma, M \rangle_F + \mathrm{reg} \cdot \Omega(\gamma),

while in the paper, the Sinkhorn cost is only

W = \langle \gamma^*, M \rangle_F,

where \gamma^* is the optimal plan for the regularized problem. In other words, the regularization term is only used to find the optimal plan and is then discarded.
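To illustrate what I mean, the paper's cost can be recovered by contracting the optimal plan with the cost matrix and discarding the entropic term. A minimal sketch with toy data:

```python
import numpy as np
import ot

# Toy empirical distributions with uniform weights
rng = np.random.RandomState(0)
X_a, X_b = rng.randn(50, 2), rng.randn(50, 2) + 1.0
a = b = np.full(50, 1.0 / 50)
M = ot.dist(X_a, X_b)  # squared Euclidean cost by default

G = ot.sinkhorn(a, b, M, reg=0.1)  # optimal plan for the regularized problem
W_linear = np.sum(G * M)           # <gamma*, M>_F, regularization discarded
```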
This is a good point. Note that the Sinkhorn divergence in the paper below, which uses exactly the same notation but a different definition (and is studied theoretically), has the 1/2 scaling and contains the entropic regularization term:
http://proceedings.mlr.press/v89/genevay19a/genevay19a.pdf
Also note that we actually return only the linear part in the `sinkhorn` function, so we do not implement that paper above either ;).
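To make the distinction concrete, here is a minimal sketch of the divergence from the paper above, keeping the 1/2 scaling and the entropic term (the helpers are hypothetical and assume the convention \Omega(\gamma) = \sum \gamma \log \gamma, which varies between papers):

```python
import numpy as np
import ot


def regularized_loss(X, Y, reg):
    # Full entropic objective at the optimal plan:
    # <gamma*, M>_F + reg * Omega(gamma*), Omega = sum g*log(g).
    # The exact entropy convention differs between papers; illustrative only.
    a = np.full(X.shape[0], 1.0 / X.shape[0])
    b = np.full(Y.shape[0], 1.0 / Y.shape[0])
    M = ot.dist(X, Y)
    G = ot.sinkhorn(a, b, M, reg)
    return np.sum(G * M) + reg * np.sum(G * np.log(G + 1e-16))


def sinkhorn_divergence_2019(X_a, X_b, reg):
    # Genevay et al. (2019) style: 1/2 scaling, entropic term retained
    return regularized_loss(X_a, X_b, reg) - 0.5 * (
        regularized_loss(X_a, X_a, reg) + regularized_loss(X_b, X_b, reg))
```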
I agree that we definitely need better documentation here. Feel free to propose a PR.
Note that we are currently working on a new API that will allow recovering the linear or the complete loss and properly implementing the different variants of the Sinkhorn divergence.
Thanks for the answer!