Allow explicitly selecting xFormers at install time
Summary
A recent Torch upgrade introduced a performance regression on modern Nvidia GPUs when using xformers.
This PR adds an install option to select xFormers for older GPUs, where it's still needed, and to avoid it on newer cards that can use torch-sdp or other attention types (see the sketch below).
This PR also points torch to the CUDA v12.4 extra-index-url for CPU-only Docker builds (not related to the xformers issues).
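For illustration, the new option boils down to conditionally including the xformers package at install time. A minimal sketch, assuming a hypothetical helper; the function name and package list are illustrative, not the installer's actual code:

```python
import subprocess
import sys

# Hypothetical sketch: option 2 (older Nvidia GPU) adds xformers to the install
# set; option 1 (newer GPU) leaves it out so torch-sdp is used instead.
def install_packages(use_xformers: bool) -> None:
    packages = ["invokeai"]
    if use_xformers:
        packages.append("xformers")
    subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])
```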
QA Instructions
The updated installer, built from this PR, is attached here. Please test it on both Windows and macOS for completeness. On macOS it should simply proceed with the installation without displaying the GPU selection menu, and should result in a working install.
- Unzip the installer and run it using `install.sh`.
- Install using option 2 (older Nvidia GPU with xformers).
- Activate the virtual environment and validate that `xformers` is installed (`pip freeze | grep xformers`; see also the import check below).
- Repeat this with option 1, and validate that `xformers` is NOT installed.
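As an alternative to grepping `pip freeze`, the presence of the package can be checked from Python inside the activated environment (illustrative snippet):

```python
import importlib.util

# True if the xformers package is importable in this environment.
print("xformers installed:", importlib.util.find_spec("xformers") is not None)
```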
Merge Plan
Merge anytime
Checklist
- [x] The PR has a short but descriptive title, suitable for a changelog
I feel like whether we use xformers should be controlled via the config rather than whether xformers is installed. Is there a reason for doing it this way?
I think we can handle choosing the correct attention type at runtime:
```diff
diff --git a/invokeai/backend/stable_diffusion/diffusers_pipeline.py b/invokeai/backend/stable_diffusion/diffusers_pipeline.py
index 646e1a92d..06a66f64b 100644
--- a/invokeai/backend/stable_diffusion/diffusers_pipeline.py
+++ b/invokeai/backend/stable_diffusion/diffusers_pipeline.py
@@ -195,7 +195,14 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
         # the remainder if this code is called when attention_type=='auto'
         if self.unet.device.type == "cuda":
-            if is_xformers_available():
+            # On 30xx and 40xx series GPUs, `torch-sdp` is faster than `xformers`. This corresponds to a CUDA major
+            # version of 8 or higher. So, for major version 7 or below, we prefer `xformers`.
+            # See:
+            # - https://developer.nvidia.com/cuda-gpus
+            # - https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities
+            prefer_xformers = torch.cuda.get_device_properties("cuda").major <= 7
+
+            if is_xformers_available() and prefer_xformers:
                 self.enable_xformers_memory_efficient_attention()
                 return
         elif hasattr(torch.nn.functional, "scaled_dot_product_attention"):
```
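For readability, here is the same check as a standalone snippet. This is a sketch: the helper name is mine, and it assumes that a compute capability major version of 7 or lower means a pre-Ampere card:

```python
import torch

# Illustrative helper (not the PR's code): prefer xFormers only on pre-Ampere
# GPUs, i.e. CUDA compute capability major version 7 or lower (20xx and older).
def prefer_xformers() -> bool:
    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_properties("cuda").major <= 7

print("prefer xformers:", prefer_xformers())
```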
> I feel like whether we use xformers should be controlled via the config rather than whether xformers is installed. Is there a reason for doing it this way?
This was a decision made in a Slack discussion, but I'm happy to do it whichever way is preferable. I think not having xformers installed at all is the safest choice, because we can't control which 3rd-party code might use it if it's detected.
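To make that concern concrete: third-party code typically opts into xFormers based on availability alone, e.g. via the same diffusers helper the diff above calls (the guarded body here is illustrative):

```python
from diffusers.utils import is_xformers_available

if is_xformers_available():
    # Guards like this enable xFormers code paths purely because the package
    # is importable, independent of Invoke's own configuration.
    print("xformers detected; a library may silently enable it")
```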
I like the runtime decision approach suggested by @psychedelicious. I can easily add that to the PR, but I also think we shouldn't be installing xformers unless it's necessary.
I added the suggested runtime change in the latest commit.
I guess if for some reason a person with a new card wanted to use xformers, they'd have to download it manually 🤔 As much as I am a fan of not installing something that won't be used the vast majority of the time...
> I guess if for some reason a person with a new card wanted to use xformers, they'd have to download it manually
There's no good reason for someone with a modern GPU to use xformers though, is there?
As much as I'd like to accommodate that individual who swaps their newer (30xx+) GPU for an older (20xx-) GPU and wants to continue seamlessly using Invoke with the same performance profile... I'm not convinced that's an edge case we really need to account for ;)