[Bug] Fix flash_attn_func Return Value Handling for flash-attn3 Compatibility in wan_video_dit Model
🐛 Problem Description

In the attention module of the wan_video_dit model, the current flash_attn_func call is incompatible with the interface of flash-attn >= 3.0.0b.

Original code:
```python
x = flash_attn_interface.flash_attn_func(q, k, v)
```
Error message:

```
ValueError: too many values to unpack (expected 1)
```
🔍 Root Cause

The flash-attn3 beta changed the interface: https://github.com/Dao-AILab/flash-attention/blob/main/hopper/flash_attn_interface.py#L518 shows that the function now returns Tuple[Tensor, ...], so the call site must accept at least two return values.
🛠 Suggested Fix

```diff
- x = flash_attn_interface.flash_attn_func(q, k, v)
+ x, _ = flash_attn_interface.flash_attn_func(q, k, v)  # explicitly unpack the return value
```
📌 Additional Note

Beta status: flash-attn3 is still in beta, and the official interface may continue to change.
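Because the beta interface may keep changing, one option is a small compatibility wrapper that normalizes the return value instead of hard-coding the unpacking at each call site. The sketch below is illustrative only: the name `flash_attn_func_compat` is hypothetical, and it assumes the `flash_attn_interface` module from the flash-attn 3 (hopper) build is importable.

```python
# Hypothetical compatibility wrapper (not part of the original report).
import torch
import flash_attn_interface  # assumes the flash-attn 3 (hopper) interface is installed


def flash_attn_func_compat(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Return only the attention output, regardless of flash-attn version."""
    out = flash_attn_interface.flash_attn_func(q, k, v)
    # flash-attn >= 3.0.0b returns a tuple whose first element is the attention
    # output; flash-attn 2 returns the output tensor directly.
    if isinstance(out, tuple):
        out = out[0]
    return out
```

This keeps the model code working if a later beta changes the number of returned values, as long as the attention output stays first in the tuple.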
I want to try Flash Attention 3, but its compilation fails on Windows :(
https://github.com/Dao-AILab/flash-attention/issues/1524
@motoight Thank you for your feedback. This is a time-sensitive issue, and we will keep an eye on it.
On the other hand, we understand that the PyTorch team is gradually integrating Flash Attention-related technologies. Currently, SDPA in PyTorch supports Flash Attention 2, which is more stable than the original implementation. Therefore, we do not want users to get into the habit of installing Flash Attention separately; instead, we aim to rely uniformly on PyTorch's implementation of Flash Attention. For now, we will keep this functionality until the PyTorch-based path is complete.
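For reference, here is a minimal sketch of the PyTorch-native path mentioned above, using SDPA restricted to its Flash Attention backend. It assumes PyTorch 2.3+ (where `torch.nn.attention.sdpa_kernel` is available), a CUDA device, and the usual (batch, heads, seq_len, head_dim) layout; the shapes are placeholders, not the model's real dimensions.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Placeholder tensors in (batch, heads, seq_len, head_dim) layout.
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Restrict SDPA to the Flash Attention backend; PyTorch raises an error if
# that backend cannot run on the current hardware/dtype combination.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    x = F.scaled_dot_product_attention(q, k, v)
```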
@Artiprocher Thank you. Could you please add GGUF support? It really helps models fit into lower VRAM at the cost of some quality.
GGUF is becoming very widely used.