
Allow explicitly selecting xFormers at install time

Open ebr opened this issue 1 year ago • 6 comments

Summary

A recent Torch upgrade introduced a performance regression on modern Nvidia GPUs when using xformers. This PR adds an install-time option to select xFormers for the older GPUs that need it, and to skip it on newer cards that can use torch-sdp or other attention types.

This PR also points torch to the CUDA v12.4 extra-index-url for CPU-only Docker builds (not related to the xformers issue).

QA Instructions

The updated installer, built from this PR, is attached here. Please test it on Windows and macOS for completeness. On macOS, it should simply proceed with the installation without displaying the GPU selection menu, and should result in a working install.

InvokeAI-installer-v5.1.0.zip

  1. Unzip the installer and run it using install.sh.
  2. Install using option 2 (older Nvidia GPU with xformers).
  3. Activate the virtual environment and validate that xformers is installed (pip freeze | grep xformers, or use the sketch after this list).
  4. Repeat with option 1 and validate that xformers is NOT installed.
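
As a cross-platform alternative to grepping pip freeze in step 3, a minimal Python check (run inside the activated virtual environment) could look like this sketch; it is illustrative, not part of the installer:

import importlib.util

# A non-None spec means the xformers package is installed in this environment.
spec = importlib.util.find_spec("xformers")
print("xformers is installed" if spec is not None else "xformers is NOT installed")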

Merge Plan

Merge anytime

Checklist

  • [x] The PR has a short but descriptive title, suitable for a changelog

ebr avatar Oct 08 '24 14:10 ebr

I feel like whether we use xformers should be controlled via the config rather than whether xformers is installed. Is there a reason for doing it this way?

RyanJDick avatar Oct 08 '24 20:10 RyanJDick

I think we can handle choosing the correct attention type at runtime:

diff --git a/invokeai/backend/stable_diffusion/diffusers_pipeline.py b/invokeai/backend/stable_diffusion/diffusers_pipeline.py
index 646e1a92d..06a66f64b 100644
--- a/invokeai/backend/stable_diffusion/diffusers_pipeline.py
+++ b/invokeai/backend/stable_diffusion/diffusers_pipeline.py
@@ -195,7 +195,14 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
 
         # the remainder if this code is called when attention_type=='auto'
         if self.unet.device.type == "cuda":
-            if is_xformers_available():
+            # On 30xx and 40xx series GPUs, `torch-sdp` is faster than `xformers`. This corresponds to a CUDA major
+            # version of 8 or higher. So, for major version 7 or below, we prefer `xformers`.
+            # See:
+            # - https://developer.nvidia.com/cuda-gpus
+            # - https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities
+            prefer_xformers = torch.cuda.get_device_properties("cuda").major <= 7
+
+            if is_xformers_available() and prefer_xformers:
                 self.enable_xformers_memory_efficient_attention()
                 return
             elif hasattr(torch.nn.functional, "scaled_dot_product_attention"):
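
Standalone, the same capability check might look like the sketch below (prefer_xformers is an illustrative helper name, not code from this PR):

import torch

def prefer_xformers() -> bool:
    """Prefer xformers only on GPUs with CUDA compute capability 7.x or below."""
    if not torch.cuda.is_available():
        return False
    # 30xx/40xx-series cards report a compute capability major version of 8+,
    # where torch's scaled_dot_product_attention outperforms xformers.
    return torch.cuda.get_device_properties("cuda").major <= 7

if __name__ == "__main__":
    print(f"prefer xformers: {prefer_xformers()}")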

psychedelicious avatar Oct 08 '24 22:10 psychedelicious

I feel like whether we use xformers should be controlled via the config rather than whether xformers is installed. Is there a reason for doing it this way?

This was a decision made in a Slack discussion, but I'm happy to do it whichever way is preferable. I think not having xformers installed at all is the safest choice, because we can't control which third-party code might use it if it's detected.
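
For context, third-party libraries typically detect xformers with an import probe along these lines (a sketch of the common pattern, not any specific library's code):

# If the import succeeds, downstream code may silently opt into xformers.
try:
    import xformers  # noqa: F401
    XFORMERS_AVAILABLE = True
except ImportError:
    XFORMERS_AVAILABLE = False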

I like the runtime decision approach suggested by @psychedelicious. I can easily add that to the PR, but I also think we shouldn't be installing xformers unless it's necessary.

ebr avatar Oct 09 '24 14:10 ebr

I added the suggested runtime change in the latest commit.

ebr avatar Oct 09 '24 14:10 ebr

As much as I am a fan of not installing something that won't be used the vast majority of the time, I guess if for some reason a person with a new card wanted to use xformers, they'd have to download it manually 🤔

brandonrising avatar Oct 09 '24 15:10 brandonrising

I guess if for some reason a person with a new card wanted to use xformers, they'd have to download it manually

There's no good reason for someone with a modern GPU to use xformers though, is there?

As much as I'd like to accommodate the individual who swaps their newer (30xx+) GPU for an older (20xx-) one and wants to keep using Invoke seamlessly with the same performance profile... I'm not convinced that's an edge case we really need to account for ;)

ebr avatar Oct 09 '24 16:10 ebr