pyro RenyiELBO/IWAE fail to converge on AIR example

Issue Description

Hello Pyro folks. I was trying to use IWAE in the AIR example to see whether the tighter lower bound yields better performance than the standard ELBO. However, after swapping the ELBO implementation for RenyiELBO(alpha=0), the model fails to converge at all. As a diagnostic, I also tried setting num_particles=1 to see whether it at least falls back to the standard ELBO behavior, but the accuracy of the AIR model still does not improve. After reading https://github.com/pyro-ppl/pyro/issues/2220, I also tried reducing the batch size to 1 (batch_size=1), but that did not change the model's performance either.

I'm wondering if any of you have insights into what could cause the performance discrepancy between RenyiELBO and TraceGraph_ELBO? Thank you very much :)

[Figures: epoch vs accuracy; epoch vs -ELBO]

Environment


Platform: MacOS 14.0, Python 3.11.5
Pyro 1.8.6
PyTorch 2.1.0

Code Snippet

The issue can be reproduced by running the AIR example in Pyro's codebase and replacing the ELBO setting there with RenyiELBO().

Oct 31 '23 17:10 horizon-blue

we would need to get something like https://github.com/pyro-ppl/pyro/pull/3123 merged.

what happens when you use TraceGraph_ELBO with multiple particles?

Oct 31 '23 18:10 martinjankowiak

Thanks for the reply @martinjankowiak .

I pulled the changes from #3123, but sadly they don't seem to fix the issue (and the ELBO is even worse). I've also included the result of TraceGraph_ELBO with two particles: [Figures: epoch vs accuracy; epoch vs -ELBO]

Nov 01 '23 21:11 horizon-blue

oh sorry i wasn't thinking clearly when i first read this. AIR has discrete latent variables. to deal with that you can either sum them out (not really viable here) or use a stochastic gradient estimator. TraceGraph_ELBO uses a fancier and thus much lower variance gradient estimator that makes use of the fine-grained conditional independence structure of the model. RenyiELBO cannot do this and so results in a much higher variance gradient estimator, so much higher that it's evidently not usable. so this is expected

Nov 01 '23 22:11 martinjankowiak
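
The variance argument in the last reply can be illustrated with a hand-rolled toy in plain Python. This is only a sketch under stated assumptions, not Pyro's implementation: two independent Bernoulli latents, a cost that splits into one term per latent, and hypothetical names throughout (`f1`, `f2`, `naive`, `rao_blackwellized`). It compares a score-function gradient estimator that multiplies the score by the whole cost with one that keeps only the cost term that actually depends on the latent in question, computing the exact mean and variance by enumeration.

```python
from itertools import product

P1, P2 = 0.5, 0.5  # Bernoulli probabilities of two independent latents


def f1(z1):
    # Cost term that depends only on z1.
    return 1.0 * z1


def f2(z2):
    # Cost term that depends only on z2 (large, so it adds a lot of noise).
    return 10.0 * z2


def prob(z, p):
    # Probability of outcome z under Bernoulli(p).
    return p if z == 1 else 1.0 - p


def score(z, p):
    # d/d(logit) log Bernoulli(z; sigmoid(logit)) = z - p
    return z - p


def naive(z1, z2):
    # Score of z1 times the full cost: ignores independence structure.
    return (f1(z1) + f2(z2)) * score(z1, P1)


def rao_blackwellized(z1, z2):
    # Score of z1 times only the cost term downstream of z1.
    return f1(z1) * score(z1, P1)


def moments(estimator):
    # Exact mean and variance by enumerating all four (z1, z2) outcomes.
    outcomes = list(product((0, 1), repeat=2))
    weights = [prob(z1, P1) * prob(z2, P2) for z1, z2 in outcomes]
    values = [estimator(z1, z2) for z1, z2 in outcomes]
    mean = sum(w * v for w, v in zip(weights, values))
    second = sum(w * v * v for w, v in zip(weights, values))
    return mean, second - mean ** 2


naive_mean, naive_var = moments(naive)
rb_mean, rb_var = moments(rao_blackwellized)
print("naive:             mean", naive_mean, "variance", naive_var)
print("rao-blackwellized: mean", rb_mean, "variance", rb_var)
```

Both estimators have the same mean (0.25, the true gradient with respect to the logit of the first latent), but the variance drops from 13.8125 to 0.0625 once the unrelated cost term is removed. In a model like AIR, with many discrete sites and many cost terms, the gap can be expected to be far larger.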