MPS: models require an initial pass for reproducibility
I have no idea what's causing this. I suspect something related to RNGs, as we know they don't work properly on MPS yet.
This is a pending task from our initial mps support (#355) that we need to investigate. For now, we have adopted two workarounds:
- Perform an extra pass as appropriate so tests pass.
- Recommend that users who care about reproducibility / accuracy do the same.
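The warmup workaround above can be sketched roughly like this (a minimal stand-in, not actual diffusers code: `reproducible_forward` and its signature are assumptions for illustration):

```python
import torch

# Hedged sketch of the warmup workaround: run one throwaway forward
# pass first, so that subsequent passes are reproducible on MPS.
# `model` is any torch.nn.Module; MPS is used only when available.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

def reproducible_forward(model, example_input, seed=0):
    model = model.to(device).eval()
    x = example_input.to(device)
    with torch.no_grad():
        _ = model(x)              # warmup pass, result discarded
        torch.manual_seed(seed)   # fix RNG state for the "real" pass
        return model(x)
```

For a diffusers pipeline the idea is the same: do one cheap generation first (e.g. with very few inference steps), discard it, then run the pass whose output you keep.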
In the second part of this comment I documented a similar behaviour using einsum vs matmul. I used @patil-suraj's branch to run the tests without the warmup pass, but they still failed. So there must be other ops that have a similar behaviour.
I found that this mitigation works, but I assume it only works if you're consistent in the size of x and context that you submit to CrossAttention (e.g. same image dimensions, number of samples, number of conditions):
https://github.com/Birch-san/stable-diffusion/commit/9b1be383f18e6cbec7e50363400c8a359c6e150e
it's not a cheap mitigation, unless we get lucky and there's a fast path for zeroes or something.
cc @pcuenca :-)
Is this still relevant? (gently pinging @pcuenca)
> Is this still relevant?
Yes, unfortunately, still an issue with the RC of PyTorch 1.13 from the test wheel.
@Birch-san In my experience this is only required the first time you use the model. Is this negatively impacting your use case in a non-transient mode?
personally I run PyTorch stable 1.12.1 (because it's faster than the 1.13 RC or the nightlies https://github.com/pytorch/pytorch/issues/85297, ~https://github.com/pytorch/pytorch/issues/87010~), so I don't encounter the einsum reproducibility problem.
my use-case is almost always "launch txt2img just to generate 1 image" (i.e. transient), so if I were on an exposed version of pytorch: this issue would hit me every time.
it's more than just non-determinism; the first einsum result is nonsense. so quality will be strictly lower (I expect it means the highest sigma gets denoised to some mess, but subsequent denoising brings the image back on track, albeit undersampled and with the composition a bit randomized).
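the behaviour described can be checked with a small repro (a hedged sketch; the helper name and shapes are made up, and on a CPU-only machine there is nothing to compare):

```python
import torch

# Compare the *first* einsum result on MPS against a CPU reference.
# On affected PyTorch builds the first MPS call returned nonsense;
# on fixed builds every call matches the CPU result.
def first_einsum_matches_cpu(shape=(2, 8, 8)):
    a, b = torch.randn(shape), torch.randn(shape)
    ref = torch.einsum("bij,bjk->bik", a, b)  # CPU reference
    if not torch.backends.mps.is_available():
        return True  # nothing to compare against
    first = torch.einsum("bij,bjk->bik", a.to("mps"), b.to("mps"))
    return torch.allclose(first.cpu(), ref, atol=1e-5)
```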
I guess the real questions are:
- which versions of pytorch does HF support?
- will this bug be promoted to pytorch 1.13 stable?
- from @pcuenca's testing: sounds like it is
Thanks @Birch-san! Great work posting all those details in PyTorch's GitHub, it looks like they are really helping!
I'll do some more testing about the non-determinism to try to isolate what ops are still affected.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Why did GH close it? It should remain open until it's solved.
Gently pinging @pcuenca here (not sure if we have time to look more closely into this issue though). It would be extremely nice if the community could also look into it / investigate a bit.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Yeah bot. Address it.
I'll ask in the PyTorch forums again.
wasn't this problem due to this einsum bug?
https://github.com/pytorch/pytorch/issues/85224
it's been solved since at least 1.13.0.dev20220928 (so the fix should be in the latest stable, 1.13.1).
in any case: diffusers CrossAttention doesn't use einsum any more. it uses baddbmm and bmm.
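the swap can be sketched like this (a simplified stand-in for the attention-scores step, not the actual diffusers implementation): both paths compute the same batched scores, so avoiding einsum doesn't change results.

```python
import torch

# q: (batch, q_len, dim), k: (batch, k_len, dim) -> (batch, q_len, k_len)
def attention_scores(q, k, scale):
    # old path: torch.einsum("b i d, b j d -> b i j", q, k) * scale
    empty = torch.empty(q.shape[0], q.shape[1], k.shape[1],
                        dtype=q.dtype, device=q.device)
    # baddbmm computes beta*input + alpha*(q @ k^T); with beta=0 the
    # (uninitialized) `input` is ignored entirely
    return torch.baddbmm(empty, q, k.transpose(1, 2), beta=0, alpha=scale)
```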
We might instead close it, now that Apple has released SD with Core ML.
hard disagree that a Core ML model is a substitute for having a working PyTorch MPS model.
but I do think diffusers is deterministic on MPS anyway.
Right. ~~Although we could train SD with MPS (with a more powerful future version of M-series chips), and Core ML is just for inference.~~
I take back my words. I used new Nvidia GPUs, and M chips are extremely far behind them. They can't even compete with an RTX 3090, let alone an A100 and others!
This doesn't fully work for me yet.
The way I'm testing is by commenting out these lines: https://github.com/huggingface/diffusers/blob/97958bcdc808649d13b0fa3b0f7fad686caf3866/tests/test_modeling_common.py#L53-L55
and then running the test_from_save_pretrained tests on UNet2DModelTests. If the lines are commented out the tests fail; otherwise they succeed.
(Tested on PyTorch 1.13.1 stable)
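The check that fails can be illustrated with a simplified stand-in (not the actual diffusers test; the helper name and the use of `state_dict` instead of `save_pretrained` are assumptions): save a model, reload it into a fresh instance, and require identical outputs. On affected MPS builds this comparison failed unless a warmup pass ran first.

```python
import tempfile
import torch

# make_model: zero-arg factory returning a fresh torch.nn.Module
def from_save_pretrained_check(make_model, x):
    model = make_model().eval()
    with tempfile.TemporaryDirectory() as tmp:
        path = f"{tmp}/model.pt"
        torch.save(model.state_dict(), path)
        reloaded = make_model().eval()
        reloaded.load_state_dict(torch.load(path))
    with torch.no_grad():
        return torch.allclose(model(x), reloaded(x))
```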
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Update: the bot was right in this case. This was never resolved in PyTorch 1.13, but works in PyTorch 2.
Smart bot!