Why is sage_attn2 disabled on H100 with CPU offload?
Found this restriction in the code: https://github.com/ModelTC/LightX2V/blob/4f3534923b64cc4fb3b7bbb8343fa224737ee6b5/lightx2v/models/networks/wan/infer/transformer_infer.py#L39
Why is sage_attn2 incompatible with H100 (compute capability 9.0) combined with CPU offload? Could you please explain the reasoning behind this restriction? Thank you for your time!
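For reference, the guard in question presumably amounts to something like the following. This is a minimal sketch only: the function name, the `cpu_offload` flag, and the fallback backend name are assumptions for illustration, not the actual LightX2V code.

```python
def select_attn_backend(capability, cpu_offload):
    """Pick an attention backend, skipping sage_attn2 on H100-class GPUs
    (SM 9.0) when CPU offload is enabled.

    capability: (major, minor) tuple, e.g. torch.cuda.get_device_capability()
    cpu_offload: whether the offload path is active (assumed config flag)
    """
    if capability == (9, 0) and cpu_offload:
        # Fall back to another backend instead of sage_attn2
        # (fallback name is hypothetical).
        return "flash_attn2"
    return "sage_attn2"
```

On a live system the `capability` tuple would come from `torch.cuda.get_device_capability()`; it is passed in here so the sketch stays self-contained.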
We have identified that, in the offload scenario, running SageAttention 2.1.0/2.2.0 inside a torch.cuda.stream context (as referenced in https://github.com/ModelTC/LightX2V/blob/4df3bee459d500830cdb51105eba20131bac7423/lightx2v/models/networks/wan/infer/transformer_infer.py#L108) leads to accuracy issues.
@gushiqiao Thank you for your response. Do you have any plans to address this issue? Would this problem still occur on an RTX 4090?
When offload is enabled with lightx2v on an RTX 4090, sageattn does not exhibit accuracy issues.
@gushiqiao Hello, I'm still encountering this issue on an RTX 4090: the generated videos are all black frames when using sage_attn2.
You can install our modified SageAttention library, which addresses this issue: https://github.com/ModelTC/SageAttention