acle icon indicating copy to clipboard operation
acle copied to clipboard

[proposal] Add syntax for expressing priority of FMV target expressions

Open AlfieRichardsArm opened this issue 7 months ago • 12 comments

Summary

This proposal adds support for explicitly expressing the priority of function versions, in the target strings.

Example use case

For the case below, the default ordering has the sve feature as higher priority then dotprod, meaning for a target with both sve and dotprod the sve version would be selected. However, it may be that the dotprod version should be priority in this case.

typedef struct {
  int x;
  int y;
  int z;
  int w;
} Vec4;

[[target_clone("default", "sve")]]
Vec4 dotproduct (Vec4 a, Vec4 b)
{
  // do a dot product
}

[[target_version("dotprod")]]
Vec4 dotproduct (Vec4 a, Vec4 b)
{
  // Use dotprod intrinsics
}

Motivation

This enhancement was a natural result of considering the version priority rules. In specifying the default ordering, cases such as above, arose where the default priority rules would be a poor choice.

Possible solutions

There are several ways to solve this.

1. Add dummy features "priorityA", "priorityB", ...,

These features would have no effect on the versioned function, other than to change how they are ordered. These versions would be higher priority than any other feature, so would override any default ordering.

Then the above would be:

typedef struct {
  int x;
  int y;
  int z;
  int w;
} Vec4;

[[target_clone("default", "sve")]]
Vec4 dotproduct (Vec4 a, Vec4 b)
{
  // do a dot product
}

[[target_version("dotprod+priorityA")]]
Vec4 dotproduct (Vec4 a, Vec4 b)
{
  // Use dotprod intrinsics
}

Could also use "priority1", "priority2", ...

This is similar to what was done for other targets (https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85/files).

2. Label all the versions of a function

Another option is too support explicitly stating the order of all versions.

Something like:

typedef struct {
  int x;
  int y;
  int z;
  int w;
} Vec4;

[[target_clone("P3:default", "P2:sve")]]
Vec4 dotproduct (Vec4 a, Vec4 b)
{
  // do a dot product
}

[[target_version("P1:dotprod")]]
Vec4 dotproduct (Vec4 a, Vec4 b)
{
  // Use dotprod intrinsics
}

This seems like it introduces many more edge cases and complexity over suggestion 1 with little gain other than explicitness.

AlfieRichardsArm avatar Jun 23 '25 15:06 AlfieRichardsArm

As far as I understand, according to the first proposal, the labels priority<N> are considered "fake" FMV features. Should they be part of the mangled name or not?

labrinea avatar Jun 27 '25 09:06 labrinea

As far as I understand, according to the first proposal, the labels priority<N> are considered "fake" FMV features. Should they be part of the mangled name or not?

I think no. And we should disallow something like

[[target_version("dotprod+priorityA")]]
Vec4 dotproduct (Vec4 a, Vec4 b)
{
  // Do something
}

[[target_version("dotprod")]]
Vec4 dotproduct (Vec4 a, Vec4 b)
{
  // Do something else
}

ie, the priority features just change the priority ordering, not the signature/identity of the function.

I think it is viable to make it part of the identity, but in the case above it would just make

[[target_version("dotprod")]]
Vec4 dotproduct (Vec4 a, Vec4 b)
{
  // Do something else
}

a nop, which seems confusing

AlfieRichardsArm avatar Jun 27 '25 09:06 AlfieRichardsArm

I think you just gave an example that creates ambiguity, two different versions with the same mangled name (because priorityA was dropped from the first one). I think that's fine as long as we diagnose it.

labrinea avatar Jun 27 '25 10:06 labrinea

I think you just gave an example that creates ambiguity, two different versions with the same mangled name (because priorityA was dropped from the first one). I think that's fine as long as we diagnose it.

Yeah sorry that was the intent. I think we should diagnose this as a function re-declaration error, as it would be re-declaring the same symbol.

If we do the opposite and have them produce different mangled names then it's more confusing in my opinion, but not a deal breaker.

AlfieRichardsArm avatar Jun 27 '25 12:06 AlfieRichardsArm

Here is the LLVM implementation for option one (dummy features) of the above proposal: https://github.com/llvm/llvm-project/pull/146092. Despite some drawbacks it has I believe it is realistic to say option two would take substantially more effort to implement in LLVM. It is unfortunate, since I believe it is a little more intuitive for the user, but the reality is that due to a lot of code in the clang frontend which is shared among other targets (x86,riscv) and is not well maintained it will be nighmarish to implement option two. The real downsides of option one in my implementation is that we are allocating 5 entries from the priority bitmask (which has 64 bits in total currently) for fake features, also the fact we are limiting the priority override to only 5 versions, lastly we make room for user mistakes allowing them to use priority(i)+priority(j) together. But I think that last one is minor. Another challenge with option two in LLVM would be to reconstruct the priority bitmasks in the backend since the static resolution of calls relies on it. I thought of introducing new metadata fmv-priorityin LLVM-IR but haven't gone far exploring this. cc @jroelofs

labrinea avatar Jun 27 '25 15:06 labrinea

Oh nice, that was quick!

That's interesting to know about the IR metadata required. I'm surprised that the priority information needs to be passed through LLVM-IR at all.

AlfieRichardsArm avatar Jul 01 '25 09:07 AlfieRichardsArm

The real downsides of option one in my implementation is that we are allocating 5 entries from the priority bitmask (which has 64 bits in total currently) for fake features,

We'll need to extend past 64 bits anyway fairly soon, so I don't think the extra 5 bits in one of the bitmaks makes much difference.

also the fact we are limiting the priority override to only 5 versions, lastly we make room for user mistakes allowing them to use priority(i)+priority(j) together.

You could view that as a feature, not a bug - the 5 different priority specifiers allow 6 different levels of priority without combining priority specifiers, and 32 different priority levels if you do combine them.

alicecarlotti avatar Jul 04 '25 00:07 alicecarlotti

This is a proof of concept for option2 in LLVM: https://github.com/llvm/llvm-project/pull/147073. It is based on option1, so it is not a standalone solution. Let me describe the challenges. In Clang the attribute parsing happens in several phases (pre AST, post AST, when Sema checking, during name mangling, during codegen for FMV resolvers, etc). The parsing logic is not in one place, but rather replicated. Also it is shared with riscv and x86. Changing one breaks another. To deal with this in the least invasive way I am rewritting the attribute string for AArch64 before the AST is constructed such that it looks how option1 would expect it. I would hesitate approving this patch myself.

labrinea avatar Jul 04 '25 15:07 labrinea

Great, it seems we all prefer option 1. As that is what https://github.com/ARM-software/acle/pull/371 implements shall we merge that and close this out?

AlfieRichardsArm avatar Jul 07 '25 08:07 AlfieRichardsArm

We need wider consensus. I suggest we wait.

labrinea avatar Jul 07 '25 18:07 labrinea

I am proposing a third option: syntactic sugar for option1, basically a more user friendly annotation "priority=[1-32];featA+featB" where priority is expanded to a combination of priority1 + ... + priority5 under the hood to represent the 32 possible combinations. Surely comes with the development costs of option2. Prototype here: https://github.com/llvm/llvm-project/pull/147607. cc @tmatheson-arm @DanielKristofKiss

labrinea avatar Jul 08 '25 22:07 labrinea

This new third proposal is my preferred, I think thats better syntactically.

AlfieRichardsArm avatar Jul 14 '25 12:07 AlfieRichardsArm