TransformerEngine
[Feature Request] Any roadmap for supporting FP8 attention calculation?
Currently only FP16/BF16 are supported in the FusedAttention class.
I think this commit added fp8 fused attention: https://github.com/NVIDIA/TransformerEngine/commit/989a53a06478a4223ffb2fc2fc92b5febcf9d8c1#diff-236e240f7f5506de96cfd5f61c77c7142905dabada33f6f0c68094724dbfb9b4
FP8 attention is supported with the Delayed Scaling recipe. See the fp8_dpa and fp8_mha arguments in the recipe docs.
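For illustration, here is a minimal sketch of how those recipe arguments can be used, assuming the DelayedScaling signature and te.fp8_autocast/te.TransformerLayer APIs from recent TransformerEngine releases; exact argument names and FP8-attention hardware/head-dim constraints should be checked against your installed version and the recipe docs.

```python
# Sketch only: enable FP8 dot-product attention / MHA via the Delayed Scaling recipe.
# fp8_dpa and fp8_mha are the arguments referenced above; other values are
# illustrative defaults — verify against your installed TransformerEngine version.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

recipe = DelayedScaling(
    fp8_format=Format.HYBRID,  # E4M3 forward, E5M2 backward
    fp8_dpa=True,              # run dot-product attention in FP8
    fp8_mha=True,              # keep MHA projections in FP8 as well
)

layer = te.TransformerLayer(
    hidden_size=1024,
    ffn_hidden_size=4096,
    num_attention_heads=16,
    params_dtype=torch.bfloat16,
).cuda()

# Default input layout is (seq_len, batch, hidden).
x = torch.randn(128, 2, 1024, device="cuda", dtype=torch.bfloat16)

with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    y = layer(x)
```

Note that FP8 attention requires a GPU and cuDNN/backend combination that supports it (e.g. Hopper-class hardware); on unsupported setups the flags may fall back or error out.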