Disable Sage Attention sm90 backend due to confetti/noisy output

Open arrdel opened this issue 2 months ago • 0 comments

What does this PR do?

Fixes #12783

This PR temporarily disables the Sage Attention sm90 backend, which produces confetti-like, noisy output on SM 9.0+ (Hopper) GPUs.

The Problem

The _SAGE_QK_INT8_PV_FP8_CUDA_SM90 backend was selected automatically on SM 9.0+ GPUs (Hopper architecture) because its registration carries the constraint:

constraints=[_check_device_cuda_atleast_smXY(9, 0), _check_shape]

However, this backend produces incorrect output (reported as "confetti" or "noisy" images), which points to a bug in the sm90 implementation of the underlying sageattention library.
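To illustrate why the constraint above matches every device at or above compute capability 9.0, here is a minimal sketch of a capability check. The function name `check_device_cuda_atleast` and the tuple-based signature are assumptions made for this example; the real `_check_device_cuda_atleast_smXY` helper presumably inspects the device of the input tensors rather than taking a tuple.

```python
from typing import Callable, Tuple

Capability = Tuple[int, int]  # (major, minor) CUDA compute capability


def check_device_cuda_atleast(major: int, minor: int) -> Callable[[Capability], bool]:
    """Build a constraint that passes for capability >= (major, minor).

    Hypothetical stand-in, not the library's implementation.
    """
    def constraint(capability: Capability) -> bool:
        # Tuples compare lexicographically, so (10, 0) >= (9, 0) is True.
        return capability >= (major, minor)
    return constraint


sm90_constraint = check_device_cuda_atleast(9, 0)

for name, cap in [("Ada", (8, 9)), ("Hopper", (9, 0)), ("Blackwell", (10, 0))]:
    print(f"{name} {cap}: eligible={sm90_constraint(cap)}")
```

Under this reading, an "at least 9.0" check is a lower bound only, so the backend becomes eligible on Hopper and on anything newer.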

The Solution

Temporarily disabled the sm90 backend by commenting out its registration.

- Users on SM 9.0+ GPUs will now fall back to the standard Sage Attention backends
- Added a comment referencing issue #12783 for future reference
- This is a temporary workaround until the upstream sageattention library fixes the sm90 implementation
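The effect of commenting out a registration can be sketched with a toy registry. Everything here is hypothetical except the backend name `_SAGE_QK_INT8_PV_FP8_CUDA_SM90` and the issue number: the `register` and `select_backend` functions, the `sage_generic` fallback name, its (8, 0) minimum capability, and the "most specific eligible backend wins" rule are assumptions for illustration, not the library's actual dispatch logic.

```python
from typing import Dict, Tuple

Capability = Tuple[int, int]

# Hypothetical registry: backend name -> minimum compute capability.
BACKENDS: Dict[str, Capability] = {}


def register(name: str, min_capability: Capability) -> None:
    BACKENDS[name] = min_capability


register("sage_generic", (8, 0))  # hypothetical fallback backend

# Temporarily disabled: produces confetti/noisy output, see issue #12783.
# register("_SAGE_QK_INT8_PV_FP8_CUDA_SM90", (9, 0))


def select_backend(capability: Capability) -> str:
    """Return the eligible backend with the highest minimum capability."""
    eligible = [
        (min_cap, name)
        for name, min_cap in BACKENDS.items()
        if capability >= min_cap
    ]
    if not eligible:
        raise RuntimeError(f"no backend supports capability {capability}")
    return max(eligible)[1]


print(select_backend((9, 0)))
```

Because the sm90 entry is never registered, it cannot be selected, and a Hopper device resolves to whatever eligible backend remains. Restoring the commented line would re-enable it without any other change.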

Impact

✅ Fixes the confetti/noisy output issue on Hopper GPUs
✅ Users can still use other Sage Attention backends
✅ No breaking changes for users not on SM 9.0+ devices
⚠️ SM 9.0+ users won't get sm90-specific optimizations until upstream fixes the bug
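For readers who want to know whether the last point applies to their machine, the following sketch reads the compute capability of the current CUDA device. `torch.cuda.is_available` and `torch.cuda.get_device_capability` are existing PyTorch calls; the helper names `current_capability` and `loses_sm90_kernels` are made up for this example and are not part of this PR.

```python
from typing import Optional, Tuple

Capability = Tuple[int, int]


def current_capability() -> Optional[Capability]:
    """Return (major, minor) of the current CUDA device, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


def loses_sm90_kernels(capability: Optional[Capability]) -> bool:
    """True if the device met the old (9, 0) constraint and now falls back."""
    return capability is not None and capability >= (9, 0)


cap = current_capability()
print(f"capability={cap}, affected by this change={loses_sm90_kernels(cap)}")
```

Devices below 9.0 never satisfied the constraint, so their backend selection is unchanged.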

Future Work

This backend can be re-enabled once the sageattention library fixes the sm90 implementation bug.

Dec 03 '25 17:12 arrdel