pengyige123

Results: 5 comments by pengyige123

> Could you share a bit more about what you are trying to train? Are you trying to train the VSA + Wan2.2 MoE model?

Yes, I used Wan2.2 + VSA...

> Hopper GPU

OK, I'll try the ThunderKittens kernel on the H800 first. However, there's a strange phenomenon here: the high-noise model trains very well. The high-noise and low-noise models are the...

> Do you have a branch with your scripts? I think there may be a bug in the backward pass of our VSA Triton kernel, but I'm not sure if this is the...

> Just for reference, the bug mentioned by [@SolitaryThinker](https://github.com/SolitaryThinker) was just fixed in PR [#879](https://github.com/hao-ai-lab/FastVideo/pull/879). The new version of the VSA Triton kernel might also be worth trying.

I'll try...

> If you’re using sparsity decay, then at the beginning of training sparsity is zero, so the model computes full attention, which is typically slower than a FlashAttention implementation.

Thank...
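
To make the sparsity-decay point in the quoted comment concrete, here is a minimal sketch of a staircase schedule that starts at zero sparsity and ramps toward a target. The function names, the default values, and the staircase shape are all assumptions for illustration; this is not FastVideo's actual schedule or API.

```python
def sparsity_at_step(step, target=0.9, rate=0.03, interval=30):
    """Hypothetical staircase schedule: sparsity starts at 0 and rises by
    `rate` every `interval` optimizer steps until it reaches `target`.
    All parameter names and defaults are illustrative assumptions."""
    if step < 0:
        raise ValueError("step must be non-negative")
    return min(target, (step // interval) * rate)


def uses_full_attention(step, **schedule_kwargs):
    """With zero sparsity no blocks are skipped, so the sparse kernel
    effectively computes dense (full) attention at this step."""
    return sparsity_at_step(step, **schedule_kwargs) == 0.0


if __name__ == "__main__":
    # Print the scheduled sparsity at a few training steps.
    for step in (0, 29, 30, 300, 3000):
        print(step, sparsity_at_step(step), uses_full_attention(step))
```

Under these assumed defaults, every step before the first interval boundary runs at zero sparsity, which is the window where a sparse kernel would be expected to lag a FlashAttention implementation.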