Results: 2 comments of Pankaj Mathur
Same boat here. I tried testing both versions of flash attention individually using the monkey-patching code from FastChat (https://github.com/lm-sys/FastChat/blob/main/fastchat/train/llama_flash_attn_monkey_patch.py), but got stuck on a similar error to the one @ehartford reported...
Yup, I tried that already; it throws the same error for Llama 2 70B. It's the same monkey-patch code from FastChat that I tried to integrate.
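For reference, a minimal sketch of how that FastChat monkey patch is typically wired in, assuming the module still exposes `replace_llama_attn_with_flash_attn()` and that `fastchat`, `flash-attn`, and `transformers` are installed; the model path below is just an illustration, and the key point is that the patch has to run before the model is instantiated:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Patch LlamaAttention.forward with the flash-attention implementation.
# This must happen *before* from_pretrained(), otherwise the unpatched
# attention forward is already bound to the loaded model.
from fastchat.train.llama_flash_attn_monkey_patch import (
    replace_llama_attn_with_flash_attn,
)

replace_llama_attn_with_flash_attn()

model_name = "meta-llama/Llama-2-70b-hf"  # illustrative model path

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # flash attention expects fp16/bf16 inputs
    device_map="auto",
)
```

Whether this works for the 70B model in particular will still depend on the versions of transformers, flash-attn, and the patch itself, so the sketch only shows the intended call order, not a fix for the reported error.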