Suboptimal codegen for snippet with Armv7 target
The code generated for this particular function seems quite suboptimal:
pub const fn f(n: u8) -> [u8; 4] {
    match n % 4 {
        0 => [0x0, 0x1, 0x2, 0x3],
        1 => [0x4, 0x5, 0x6, 0x7],
        2 => [0x8, 0x9, 0xA, 0xB],
        3 => [0xC, 0xD, 0xE, 0xF],
        _ => unsafe { std::hint::unreachable_unchecked() },
    }
}
From my observations, on every target I tried, the function as written above compiles to a switch table plus a memory access.
For x86-64, if the inner arrays are moved into constants, the switch table is removed, and the code is replaced with arithmetic.
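For illustration, a minimal sketch of what "moved into constants" means here. The names `A` through `D` and `f_consts` are made up for this sketch and are not taken from the linked Godbolt, so the exact code there may differ:

```rust
// Hypothetical names: the four inner arrays hoisted into named constants.
const A: [u8; 4] = [0x0, 0x1, 0x2, 0x3];
const B: [u8; 4] = [0x4, 0x5, 0x6, 0x7];
const C: [u8; 4] = [0x8, 0x9, 0xA, 0xB];
const D: [u8; 4] = [0xC, 0xD, 0xE, 0xF];

pub const fn f_consts(n: u8) -> [u8; 4] {
    match n % 4 {
        0 => A,
        1 => B,
        2 => C,
        3 => D,
        // `n % 4` is always in 0..=3, so this arm can never be reached.
        _ => unsafe { std::hint::unreachable_unchecked() },
    }
}
```

The behaviour is identical to `f`; only the way the array values are spelled in the source changes.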
Side-by-side comparison of the x86-64 and armv7-linux-androideabi codegen: https://godbolt.org/z/ehxabaq38
Here, I was able to manually rewrite the expression into the equivalent of what LLVM emits above: https://godbolt.org/z/qhfaqEcsf
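As a rough sketch of what such an arithmetic rewrite can look like (an assumption for illustration, not necessarily the exact code behind the link; the name `f_arith` is hypothetical): each array is `[4k, 4k+1, 4k+2, 4k+3]` for `k = n % 4`, so the four bytes can be packed into one `u32` and computed with a multiply and an add.

```rust
pub const fn f_arith(n: u8) -> [u8; 4] {
    // k selects one of the four arrays; it is always in 0..=3.
    let k = (n % 4) as u32;
    // 0x03020100 is [0, 1, 2, 3] packed little-endian; adding k * 0x04040404
    // adds 4 * k to every byte. The largest byte is 0x0F, so no carry
    // crosses a byte boundary.
    (0x0302_0100u32 + k * 0x0404_0404).to_le_bytes()
}
```

`to_le_bytes` is used so the byte order of the result does not depend on the target's endianness.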
Nothing else I tried made the compiler emit this arithmetic codegen.
I don't know whether this applies to other output targets.
@rustbot label A-LLVM I-slow