Why is sage_attn2 disabled on H100 with CPU offload?
Found this restriction in the code: https://github.com/ModelTC/LightX2V/blob/4f3534923b64cc4fb3b7bbb8343fa224737ee6b5/lightx2v/models/networks/wan/infer/transformer_infer.py#L39
Why is sage_attn2 incompatible with H100 (compute capability 9.0) combined with CPU offload? Could you please explain the reasoning behind this restriction? Thank you for your time!
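For reference, the guard in question presumably amounts to something like the following. This is a minimal sketch only: the function name, the `cpu_offload` flag, and the fallback backend name are assumptions for illustration, not the actual LightX2V code.

```python
def select_attn_backend(capability, cpu_offload):
    """Pick an attention backend, skipping sage_attn2 on H100-class GPUs
    (SM 9.0) when CPU offload is enabled.

    capability: (major, minor) tuple, e.g. torch.cuda.get_device_capability()
    cpu_offload: whether the offload path is active (assumed config flag)
    """
    if capability == (9, 0) and cpu_offload:
        # Fall back to another backend instead of sage_attn2
        # (fallback name is hypothetical).
        return "flash_attn2"
    return "sage_attn2"
```

On a live system the `capability` tuple would come from `torch.cuda.get_device_capability()`; it is passed in here so the sketch stays self-contained.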
We have identified that, in the offload scenario, running SageAttention 2.1.0/2.2.0 inside a torch.cuda.stream context (as referenced in https://github.com/ModelTC/LightX2V/blob/4df3bee459d500830cdb51105eba20131bac7423/lightx2v/models/networks/wan/infer/transformer_infer.py#L108) leads to accuracy issues.
@gushiqiao Thank you for your response. Do you have any plans to address this issue? Would this problem still occur on an RTX 4090?
When offload is enabled with lightx2v on an RTX 4090, sageattn does not exhibit accuracy issues.
@gushiqiao Hello, I'm still encountering this issue on an RTX 4090: the generated videos are all black frames when using sage_attn2.
You can install our modified SageAttention library, which addresses this issue: https://github.com/ModelTC/SageAttention