MPS: models require an initial pass for reproducibility
I have no idea what's causing this. I suspect something related to RNGs, as we know they don't work properly on MPS yet.
This is a pending task from our initial mps support (#355) that we need to investigate. For now, we have adopted two workarounds:
- Perform an extra pass as appropriate so tests pass.
- Recommend that users who care about reproducibility / accuracy do the same.
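The warmup workaround above can be sketched roughly like this (a minimal stand-in, not actual diffusers code: `reproducible_forward` and its signature are assumptions for illustration):

```python
import torch

# Hedged sketch of the warmup workaround: run one throwaway forward
# pass first, so that subsequent passes are reproducible on MPS.
# `model` is any torch.nn.Module; MPS is used only when available.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

def reproducible_forward(model, example_input, seed=0):
    model = model.to(device).eval()
    x = example_input.to(device)
    with torch.no_grad():
        _ = model(x)              # warmup pass, result discarded
        torch.manual_seed(seed)   # fix RNG state for the "real" pass
        return model(x)
```

For a diffusers pipeline the idea is the same: do one cheap generation first (e.g. with very few inference steps), discard it, then run the pass whose output you keep.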
In the second part of this comment I documented a similar behaviour using einsum vs matmul. I used @patil-suraj's branch to run the tests without the warmup pass, but they still failed. So there must be other ops that have a similar behaviour.
I found that this mitigation works, but I assume it only works if you're consistent in the size of x and context that you submit to CrossAttention (e.g. same image dimensions, number of samples, number of conditions):
https://github.com/Birch-san/stable-diffusion/commit/9b1be383f18e6cbec7e50363400c8a359c6e150e
it's not a cheap mitigation, unless we get lucky and there's a fast path for zeroes or something.
cc @pcuenca :-)
Is this still relevant? (gently pinging @pcuenca)
> Is this still relevant?
Yes, unfortunately, still an issue with the RC of PyTorch 1.13 from the test wheel.
@Birch-san In my experience this is only required the first time you use the model. Is this negatively impacting your use case in a non-transient mode?
personally I run PyTorch stable 1.12.1 (because it's faster than the 1.13 RC or the nightlies https://github.com/pytorch/pytorch/issues/85297, ~https://github.com/pytorch/pytorch/issues/87010~), so I don't encounter the einsum reproducibility problem.
my use-case is almost always "launch txt2img just to generate 1 image" (i.e. transient), so if I were on an exposed version of pytorch: this issue would hit me every time.
it's more than just non-determinism; the first einsum result is nonsense. so quality will be strictly lower (I expect it means the highest sigma gets denoised to some mess, but subsequent denoising brings the image back on track, albeit undersampled and with the composition a bit randomized).
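the behaviour described can be checked with a small repro (a hedged sketch; the helper name and shapes are made up, and on a CPU-only machine there is nothing to compare):

```python
import torch

# Compare the *first* einsum result on MPS against a CPU reference.
# On affected PyTorch builds the first MPS call returned nonsense;
# on fixed builds every call matches the CPU result.
def first_einsum_matches_cpu(shape=(2, 8, 8)):
    a, b = torch.randn(shape), torch.randn(shape)
    ref = torch.einsum("bij,bjk->bik", a, b)  # CPU reference
    if not torch.backends.mps.is_available():
        return True  # nothing to compare against
    first = torch.einsum("bij,bjk->bik", a.to("mps"), b.to("mps"))
    return torch.allclose(first.cpu(), ref, atol=1e-5)
```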
I guess the real questions are:
- which versions of pytorch does HF support?
- will this bug be promoted to pytorch 1.13 stable?
- from @pcuenca's testing: sounds like it is
Thanks @Birch-san! Great work posting all those details in PyTorch's GitHub, it looks like they are really helping!
I'll do some more testing about the non-determinism to try to isolate what ops are still affected.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Why did GH close it? It should remain open until it's solved.
Gently pinging @pcuenca here (not sure if we have time to look more closely into this issue though). It would be extremely nice if the community could also look into it / investigate a bit.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Yeah bot. Address it.
I'll ask in the PyTorch forums again.
wasn't this problem due to this einsum bug?
https://github.com/pytorch/pytorch/issues/85224
it's been solved since at least 1.13.0.dev20220928 (so the fix should be in the latest stable, 1.13.1).
in any case: diffusers CrossAttention doesn't use einsum any more. it uses baddbmm and bmm.
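the swap can be sketched like this (a simplified stand-in for the attention-scores step, not the actual diffusers implementation): both paths compute the same batched scores, so avoiding einsum doesn't change results.

```python
import torch

# q: (batch, q_len, dim), k: (batch, k_len, dim) -> (batch, q_len, k_len)
def attention_scores(q, k, scale):
    # old path: torch.einsum("b i d, b j d -> b i j", q, k) * scale
    empty = torch.empty(q.shape[0], q.shape[1], k.shape[1],
                        dtype=q.dtype, device=q.device)
    # baddbmm computes beta*input + alpha*(q @ k^T); with beta=0 the
    # (uninitialized) `input` is ignored entirely
    return torch.baddbmm(empty, q, k.transpose(1, 2), beta=0, alpha=scale)
```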
We might instead close it, now that Apple has released SD with Core ML.
hard disagree that a Core ML model is a substitute for having a working PyTorch MPS model.
but I do think diffusers is deterministic on MPS anyway.
Right. ~~Although we could train SD with MPS (with a more powerful future version of M-series chips), and Core ML is just for inference.~~
I take back my words. I used new Nvidia GPUs, and M chips are extremely far behind them. They can't even compete with an RTX 3090, let alone an A100 and others!
This doesn't fully work for me yet.
The way I'm testing is by commenting out these lines: https://github.com/huggingface/diffusers/blob/97958bcdc808649d13b0fa3b0f7fad686caf3866/tests/test_modeling_common.py#L53-L55
and then running the test_from_save_pretrained tests on UNet2DModelTests. If the lines are commented out the tests fail; otherwise they succeed.
(Tested on PyTorch 1.13.1 stable)
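The check that fails can be illustrated with a simplified stand-in (not the actual diffusers test; the helper name and the use of `state_dict` instead of `save_pretrained` are assumptions): save a model, reload it into a fresh instance, and require identical outputs. On affected MPS builds this comparison failed unless a warmup pass ran first.

```python
import tempfile
import torch

# make_model: zero-arg factory returning a fresh torch.nn.Module
def from_save_pretrained_check(make_model, x):
    model = make_model().eval()
    with tempfile.TemporaryDirectory() as tmp:
        path = f"{tmp}/model.pt"
        torch.save(model.state_dict(), path)
        reloaded = make_model().eval()
        reloaded.load_state_dict(torch.load(path))
    with torch.no_grad():
        return torch.allclose(model(x), reloaded(x))
```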
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Update: the bot was right in this case. This was never resolved in PyTorch 1.13, but works in PyTorch 2.
Smart bot!