
[Bug] Why is VSA slower than FlashAttention when I run the training script examples/training/finetune/Wan2.1-VSA/Wan-Syn-Data/T2V-14B-VSA.slurm?

Open pengyige123 opened this issue 3 months ago • 2 comments

Describe the bug

The observed behavior is as follows (see attached screenshots):

- With VSA, each step takes 11.91 seconds.
- With FlashAttention, each step takes 9.39 seconds.

Thank you very much; looking forward to your reply.

Reproduction

Run examples/training/finetune/Wan2.1-VSA/Wan-Syn-Data/T2V-14B-VSA.slurm. Only the environment variable FASTVIDEO_ATTENTION_BACKEND was changed between the two runs.

Environment

GPU: L40S
CUDA: 12.8
Driver version: 535.230.02

pengyige123 avatar Oct 30 '25 08:10 pengyige123

If you’re using sparsity decay, then at the beginning of training sparsity is zero, so the model computes full attention, which is typically slower than a FlashAttention implementation.
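To illustrate the point, here is a minimal sketch of a sparsity schedule that starts at zero and ramps up to its target over the first training steps. The function name, the target value, and the step count are all illustrative assumptions, not FastVideo's actual implementation:

```python
# Hypothetical sketch (not FastVideo's code): a linear sparsity ramp.
# At step 0 the sparsity is 0, so VSA attends to every block and does
# at least as much work as dense attention, plus its block-selection
# overhead -- which is why early steps can be slower than FlashAttention.

def sparsity_at_step(step: int, target_sparsity: float = 0.9,
                     ramp_steps: int = 1000) -> float:
    """Return the attention sparsity applied at a given training step."""
    if step >= ramp_steps:
        return target_sparsity
    return target_sparsity * (step / ramp_steps)

print(sparsity_at_step(0))      # 0.0  -> full (dense) attention
print(sparsity_at_step(500))    # 0.45 -> roughly half the blocks skipped
print(sparsity_at_step(2000))   # 0.9  -> target sparsity reached
```

Under a schedule like this, the VSA speedup only materializes once sparsity has ramped up; timing the very first steps compares VSA-with-zero-sparsity against FlashAttention's dense kernel.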

nappengman avatar Nov 02 '25 11:11 nappengman

> If you’re using sparsity decay, then at the beginning of training sparsity is zero, so the model computes full attention, which is typically slower than a FlashAttention implementation.

Thank you very much for your answer.

pengyige123 avatar Nov 19 '25 03:11 pengyige123