wangjiayue

Results: 2 issues by wangjiayue

Signed-off-by: 王佳越 10335419

Hello, I applied FA3 when fine-tuning the Qwen2 model on an H800 machine. In my tests it was slower than FA2 under the same conditions. I used FlashAttnFunc.forward from the hopper/flash_attn_interface.py file...