diffusers
Disable Sage Attention sm90 backend due to confetti/noisy output
What does this PR do?
Fixes #12783
This PR temporarily disables the Sage Attention sm90 backend, which produces confetti-like, noisy output on SM 9.0+ (Hopper) GPUs.
The Problem
The _SAGE_QK_INT8_PV_FP8_CUDA_SM90 backend was being selected automatically on SM 9.0+ GPUs (Hopper architecture) because of its registration constraints:
constraints=[_check_device_cuda_atleast_smXY(9, 0), _check_shape]
However, this backend produces incorrect output (described as "confetti" or "noisy" output), which points to a bug in the sm90 implementation of the underlying sageattention library.
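For readers unfamiliar with this kind of constraint, the following is a minimal sketch of a compute-capability gate. It is not the diffusers implementation of `_check_device_cuda_atleast_smXY`; the names `check_capability_at_least` and `sm90_or_newer` are hypothetical, and the capability is passed in as a plain `(major, minor)` tuple so the example runs without a GPU.

```python
from typing import Callable, Tuple

# A CUDA compute capability as (major, minor), e.g. (9, 0) for Hopper.
# With PyTorch this tuple is what torch.cuda.get_device_capability() returns.
Capability = Tuple[int, int]


def check_capability_at_least(major: int, minor: int) -> Callable[[Capability], bool]:
    """Hypothetical helper: build a predicate that accepts devices >= (major, minor)."""

    def predicate(capability: Capability) -> bool:
        # Tuples compare lexicographically: major first, then minor.
        return capability >= (major, minor)

    return predicate


# Mirrors the (9, 0) threshold in the constraint quoted above.
sm90_or_newer = check_capability_at_least(9, 0)
```

Because the gate is a lower bound, it matches Hopper and anything newer, which is why every SM 9.0+ device was affected.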
The Solution
This PR comments out the registration of the sm90 backend, which temporarily disables it:
- Users on SM 9.0+ GPUs will now fall back to the standard Sage Attention backends
- Added a comment referencing issue #12783 for future reference
- This is a temporary workaround until the upstream sageattention library fixes the sm90 implementation
Impact
- ✅ Fixes the confetti/noisy output issue on Hopper GPUs
- ✅ Users can still use other Sage Attention backends
- ✅ No breaking changes for users not on SM 9.0+ devices
- ⚠️ SM 9.0+ users won't get sm90-specific optimizations until upstream fixes the bug
Future Work
This backend can be re-enabled once the sageattention library fixes the sm90 implementation bug.