About equation (9) in the paper.
I noticed that the code negates the score. Why is the final score calculated this way?
The equation (9) in the paper is:
The following is the code implementation:
composition = -torch.logsumexp(torch.stack((-e1, -e2), dim=0), dim=0)
Hello,
This might be a legacy typo. In [1], the composition score simulating the OR operator is $-\mathrm{logsumexp}(-e_1, -e_2)$, whereas we adopt a different definition of the energy function, $p(x) = e^{E(x)}$, to align with the definition of the OOD score.
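To spell out how the sign convention changes the formula, here is a short derivation sketch. It assumes the OR of two concepts is modeled as the unnormalized sum of the two densities, as in [1], and it ignores normalization constants; the symbols $E_1$, $E_2$, and $E_{\mathrm{or}}$ are introduced here for illustration only.

$$
\begin{aligned}
\text{Convention of [1], } p_i(x) \propto e^{-E_i(x)}:&\quad e^{-E_{\mathrm{or}}} = e^{-E_1} + e^{-E_2} \;\Rightarrow\; E_{\mathrm{or}} = -\mathrm{logsumexp}(-E_1, -E_2) \\
\text{Our convention, } p_i(x) \propto e^{E_i(x)}:&\quad e^{E_{\mathrm{or}}} = e^{E_1} + e^{E_2} \;\Rightarrow\; E_{\mathrm{or}} = \mathrm{logsumexp}(E_1, E_2)
\end{aligned}
$$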
In our experiments, I recall that the results using torch.logsumexp(torch.stack((e1, e2), dim=0), dim=0) and -torch.logsumexp(torch.stack((-e1, -e2), dim=0), dim=0) were similar (though I am not completely certain as it has been some time since then). If you are interested, feel free to give it a try.
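For anyone who wants to try this, here is a minimal sketch of how the two variants relate numerically. It uses plain Python scalars rather than tensors so that it runs without PyTorch. The helper names (`logsumexp`, `compose_soft_max`, `compose_soft_min`) and the sample values are made up for illustration and are not from the repository.

```python
import math

def logsumexp(values):
    # Numerically stable log(sum(exp(v))) over a list of floats.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def compose_soft_max(e1, e2):
    # Scalar analogue of torch.logsumexp(torch.stack((e1, e2), dim=0), dim=0):
    # a smooth upper bound on max(e1, e2).
    return logsumexp([e1, e2])

def compose_soft_min(e1, e2):
    # Scalar analogue of -torch.logsumexp(torch.stack((-e1, -e2), dim=0), dim=0):
    # a smooth lower bound on min(e1, e2).
    return -logsumexp([-e1, -e2])

# Hypothetical scores, chosen only for illustration.
e1, e2 = 2.0, -1.0
soft_max = compose_soft_max(e1, e2)
soft_min = compose_soft_min(e1, e2)
print("logsumexp(e1, e2)    =", soft_max)
print("-logsumexp(-e1, -e2) =", soft_min)
```

For two inputs, the variants satisfy soft_max + soft_min = e1 + e2, so they are not the same quantity. The first lies within log 2 above max(e1, e2) and the second within log 2 below min(e1, e2). Whether this difference changes the downstream results would have to be checked empirically.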
[1] Du et al. Compositional Visual Generation with Energy Based Models. NeurIPS 2020.