Restructure attention C API

Open nvMelissa opened this issue 3 months ago • 0 comments

Is your feature request related to a problem? Please describe.

At the moment, we enumerate the parameters in C APIs like this:

https://github.com/NVIDIA/TransformerEngine/blob/5e4e0b2c378d2b1ec2ee65dfa85124e1dd805389/transformer_engine/common/fused_attn/fused_attn.cpp#L835

As we add more features to attention, the list of non-tensor variables keeps growing, and each new variable changes the function signature, which is a breaking change every time.

Describe the solution you'd like

To avoid these breaking changes, we should consider other approaches, such as packing most or all of the non-tensor variables into a struct and modifying only the contents of the struct as we add features.
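To make the struct idea concrete, here is a rough sketch of one way it could look. Every name in it (`ExampleAttnParams`, `example_attn_params_init`, `example_attn_params_resolve`, and all field names) is hypothetical and not an existing TransformerEngine identifier. The `struct_size` field is an extra assumption on my part: appending fields to a plain struct still changes its layout, so the sketch has the caller record the size it was compiled against and lets the library fill in defaults for anything the caller did not know about.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter pack; field names are illustrative only. */
typedef struct {
  size_t struct_size;        /* sizeof(struct) as seen by the caller at compile time */
  float attn_scale;
  float dropout_p;
  int32_t is_training;
  /* Fields below stand in for features added in a later release.
     New fields are only ever appended, never reordered or removed. */
  int64_t window_size_left;
  int64_t window_size_right;
} ExampleAttnParams;

/* Fill every field with its default so callers only set what they need. */
static void example_attn_params_init(ExampleAttnParams *p) {
  memset(p, 0, sizeof(*p));
  p->struct_size = sizeof(*p);
  p->attn_scale = 1.0f;
  p->window_size_left = -1;   /* -1 means "no window" in this sketch */
  p->window_size_right = -1;
}

/* Library side: copy only the prefix the caller knows about and keep
   defaults for the rest. Returns 0 on success, -1 on invalid input. */
static int example_attn_params_resolve(const ExampleAttnParams *in,
                                       ExampleAttnParams *out) {
  if (in == NULL || out == NULL) return -1;
  if (in->struct_size < sizeof(size_t) ||
      in->struct_size > sizeof(ExampleAttnParams)) {
    return -1;  /* caller was built against a newer or corrupted layout */
  }
  example_attn_params_init(out);
  memcpy(out, in, in->struct_size);
  out->struct_size = sizeof(*out);
  return 0;
}
```

With something along these lines, the forward/backward entry points would take the tensors plus a single `const ExampleAttnParams *`, and their signatures would stay fixed as fields are appended. An opaque handle with setter functions would be another option if we do not want to expose the layout at all.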

Describe alternatives you've considered

N/A

Additional context

N/A

Nov 02 '25 21:11 nvMelissa