TransformerEngine
Restructure attention C API
Is your feature request related to a problem? Please describe.
At the moment, we enumerate the parameters in C APIs like this:
https://github.com/NVIDIA/TransformerEngine/blob/5e4e0b2c378d2b1ec2ee65dfa85124e1dd805389/transformer_engine/common/fused_attn/fused_attn.cpp#L835
As we add more features to attention, the list of non-tensor variables keeps growing, and every new parameter changes the function signature, which is a breaking change each time.
Describe the solution you'd like
To avoid these breaking changes, we should look at other approaches, such as packing most or all of the non-tensor variables into a struct, so that adding a feature only changes the contents of the struct and not the function signature.
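A minimal sketch of what the struct-based approach could look like. Every name here (`ExampleAttnConfig`, `example_attn_config_init`, `example_window_left`, `example_fused_attn_fwd`) and every field is hypothetical and chosen for illustration only; none of them is the existing TransformerEngine API. The sketch assumes a `version` field is used so the library can tell which fields a caller actually knows about, which is one common way to keep such a struct extensible.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not TransformerEngine's actual API. */
#define EXAMPLE_ATTN_CONFIG_VERSION 2u

typedef struct ExampleAttnConfig {
  uint32_t version;      /* layout version the caller was built against */
  float attn_scale;      /* v1 */
  float dropout_p;       /* v1 */
  int32_t is_causal;     /* v1 */
  /* v2 fields: new fields are appended only, never reordered or removed. */
  int32_t window_left;   /* -1 means "no sliding window" in this sketch */
  int32_t window_right;
} ExampleAttnConfig;

/* Fill in defaults so callers only set the fields they care about. */
static void example_attn_config_init(ExampleAttnConfig *cfg) {
  memset(cfg, 0, sizeof(*cfg));
  cfg->version = EXAMPLE_ATTN_CONFIG_VERSION;
  cfg->attn_scale = 1.0f;
  cfg->window_left = -1;
  cfg->window_right = -1;
}

/* Read a v2 field safely: callers built against v1 get the default. */
static int32_t example_window_left(const ExampleAttnConfig *cfg) {
  return cfg->version >= 2u ? cfg->window_left : -1;
}

/* The signature stays the same as features are added: tensors are passed
 * explicitly, everything else travels in the config struct.
 * Returns 0 on success, -1 if the config is missing or from an unknown version. */
static int example_fused_attn_fwd(const void *q, const void *k, const void *v,
                                  void *out, const ExampleAttnConfig *cfg) {
  (void)q; (void)k; (void)v; (void)out;
  if (cfg == NULL || cfg->version == 0u ||
      cfg->version > EXAMPLE_ATTN_CONFIG_VERSION) {
    return -1;
  }
  int32_t window_left = example_window_left(cfg);
  (void)window_left;  /* kernel selection and launch omitted in this sketch */
  return 0;
}
```

With this shape, adding a feature would mean appending a field, bumping the version constant, and giving the field a default in the init function; existing call sites would keep compiling unchanged.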
Describe alternatives you've considered
N/A
Additional context
N/A