[aaelf64] Clarify relocation optimization [issue #328]
Make it clearer that a GOT sequence can only be optimized if all GOT relocations to the symbol are part of a sequence.
Can I check what you mean by "all GOT relocations"? To give an example:
// Relocations in sequence, for the sake of the example; out of range
ADRP x0, :got: symbol // R_<CLS>_ADR_GOT_PAGE
LDR x0, [x0 :got_lo12: symbol] // R_<CLS>_LD64_GOT_LO12_NC
...
// Relocations in sequence, in range
ADRP x1, :got: symbol // R_<CLS>_ADR_GOT_PAGE
LDR x1, [x1 :got_lo12: symbol] // R_<CLS>_LD64_GOT_LO12_NC
...
// Instructions not in sequence, although relocations could be consecutive
ADRP x2, :got: symbol // R_<CLS>_ADR_GOT_PAGE
NOP
LDR x2, [x1 :got_lo12: symbol] // R_<CLS>_LD64_GOT_LO12_NC
I think you mean that, across these three pairs of GOT relocations to symbol, either all must be optimised or none. This would avoid the problem observed in https://github.com/llvm/llvm-project/issues/138254, which had the following example:
foo:
cmp x0, 0
bge .L8
adrp x2, :got:b // Optimised to adrp x2, b
.L9:
ldr x2, [x2, :got_lo12:b] // Optimised to adr x2, [x2 :lo12: b]
add x0, x2, x1
b bar
.p2align 2,,3
.L8:
stp x29, x30, [sp, -32]!
adrp x2, :got:b // Can't be optimised because the add below is in the way,
add x0, x0, 1 // although in theory it could also be out of range of b.
ldr x3, [x2, :got_lo12:b]
add x0, x3, x0
mov x29, sp
stp x2, x1, [sp, 16]
bl bar
ldp x2, x1, [sp, 16]
ldp x29, x30, [sp], 32
b .L9 // Jump to after optimised ADRP expecting the ldr, but getting adr.
No. As described, all of them must be part of a sequence for the symbol to be optimizable, since that is the proof that they are independent. However, not all of them need to be optimized (e.g. if out of range). So in your example only the third ADRP is not part of a valid sequence, and it therefore blocks the optimization for the other two.
An alternative would be to optimize all or nothing without considering pairs. But then a single ADRP that ends up out of range would block all optimizations, making it unsuitable for the medium/large model. Also keeping the ADRP/LDR as pairs results in better code quality.
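As a rough illustration of the rule described in this comment, here is a minimal Python sketch. The `GotPair` record and `plan_relaxation` function are hypothetical names invented for this example, not part of any linker; the `in_range` value for the third access is also an assumption, since the example earlier in the thread does not state it.

```python
from dataclasses import dataclass


@dataclass
class GotPair:
    """One ADRP/LDR GOT access to a symbol (hypothetical record)."""
    valid_sequence: bool  # ADRP immediately followed by its LDR, same register
    in_range: bool        # symbol reachable from this site without the GOT


def plan_relaxation(pairs):
    """Return, per access, whether it may be rewritten (sketch only)."""
    # A single access that is not a valid sequence blocks every access
    # to the symbol, because independence can no longer be proven.
    if not all(p.valid_sequence for p in pairs):
        return [False] * len(pairs)
    # Otherwise each pair is decided on its own, by range alone.
    return [p.in_range for p in pairs]


# The three accesses from the example earlier in the thread
# (in_range=True for the third one is an assumption).
three_pairs = [GotPair(True, False), GotPair(True, True), GotPair(False, True)]
blocked = plan_relaxation(three_pairs)      # third access is not a sequence
allowed = plan_relaxation(three_pairs[:2])  # only the two valid sequences
```

With only the two valid sequences present, the out-of-range pair keeps its GOT access while the in-range pair can be rewritten, which is the property that keeps this approach usable for the medium/large model.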
I think the problem observed in https://github.com/llvm/llvm-project/issues/138254 could still happen with valid sequences if at least one of them is out of range. I agree it would be vanishingly unlikely, and it would be more difficult to check for.
I'd like to have a think about the wording and will make some suggestions. Hopefully tomorrow.
No, it can't happen with only valid sequences: the value of the ADRP can never leak, because the LDR in the next instruction overwrites it. Thus it is impossible to branch into the middle of a sequence.
OK, I see that in the problem example the destination registers in the ADRP and LDR are different so this would be an invalid sequence even if the relocations were consecutive.
This does mean that it is insufficient to just look at the relocations. For example, if I hand-edit the LLVM example so that all the relocations to b are in sequence, we'd still need to look at the instructions to detect the different destination registers.
I guess it means we'll need to be clear about what we mean by valid sequence.
foo:
cmp x0, 0
bge .L8
adrp x2, :got:b // Optimised to adrp x2, b
.L9:
ldr x2, [x2, :got_lo12:b] // Optimised to adr x2, [x2 :lo12: b]
add x0, x2, x1
b bar
.p2align 2,,3
.L8:
stp x29, x30, [sp, -32]!
adrp x2, :got:b // Invalid sequence as destination registers don't match
ldr x3, [x2, :got_lo12:b] // but relocations are consecutive.
add x0, x0, 1 // reordered by hand
add x0, x3, x0
mov x29, sp
stp x2, x1, [sp, 16]
bl bar
ldp x2, x1, [sp, 16]
ldp x29, x30, [sp], 32
b .L9 // Jump to after optimised ADRP expecting the ldr, but getting adr.
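The register check that this hand-edited example calls for can be sketched by decoding the two instruction words. This is a minimal Python illustration, not taken from any linker: `registers_match` is a hypothetical helper, and the sample encodings use zero immediates (as they would appear before relocation) and are written out by hand.

```python
# A64 encoding patterns: ADRP Xd, label and LDR Xt, [Xn, #imm12] (64-bit).
ADRP_MASK, ADRP_BITS = 0x9F000000, 0x90000000
LDR64_MASK, LDR64_BITS = 0xFFC00000, 0xF9400000


def registers_match(adrp, ldr):
    """True if the words encode ADRP Xd followed by LDR Xd, [Xd, #imm]."""
    if (adrp & ADRP_MASK) != ADRP_BITS:
        return False  # first word is not an ADRP
    if (ldr & LDR64_MASK) != LDR64_BITS:
        return False  # second word is not a 64-bit unsigned-offset LDR
    adrp_rd = adrp & 0x1F        # ADRP destination register, bits 4:0
    ldr_rt = ldr & 0x1F          # LDR destination register, bits 4:0
    ldr_rn = (ldr >> 5) & 0x1F   # LDR base register, bits 9:5
    return adrp_rd == ldr_rt == ldr_rn


ADRP_X2 = 0x90000002         # adrp x2, <page>
LDR_X2_FROM_X2 = 0xF9400042  # ldr x2, [x2]
LDR_X3_FROM_X2 = 0xF9400043  # ldr x3, [x2]

first_pair_ok = registers_match(ADRP_X2, LDR_X2_FROM_X2)   # pair before .L9
second_pair_ok = registers_match(ADRP_X2, LDR_X3_FROM_X2)  # pair after .L8
```

The second pair is rejected even though its relocations are consecutive, because the LDR writes x3 and leaves the ADRP result live in x2.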
The basic conditions explained above obviously still apply:
- The relocations apply to consecutive instructions in the order specified.
- The relocations use the same symbol.
- The relocated instructions have the same source and destination register.
I thought the specification was clear enough... Perhaps we need to explain the optimization algorithm in pseudo code step-by-step?
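One possible shape for such a step-by-step description, as a Python sketch rather than specification text. The `Reloc` record, the helper names, and the section offsets are all hypothetical, and reading "same source and destination register" as "ADRP destination, LDR base, and LDR destination are all equal" is an assumption made here.

```python
from dataclasses import dataclass

R_ADR_GOT_PAGE = "R_<CLS>_ADR_GOT_PAGE"
R_LD64_GOT_LO12_NC = "R_<CLS>_LD64_GOT_LO12_NC"


@dataclass
class Reloc:
    """A GOT relocation plus the registers of the instruction it patches."""
    offset: int     # section offset of the relocated instruction
    rtype: str      # relocation type
    symbol: str     # target symbol
    dst: int        # destination register of the instruction
    base: int = -1  # base register of the LDR; unused for ADRP


def is_valid_sequence(first, second):
    """Apply the three conditions listed above to one candidate pair."""
    consecutive = (first.rtype == R_ADR_GOT_PAGE
                   and second.rtype == R_LD64_GOT_LO12_NC
                   and second.offset == first.offset + 4)
    same_symbol = first.symbol == second.symbol
    same_register = first.dst == second.dst == second.base
    return consecutive and same_symbol and same_register


def all_in_valid_sequences(relocs):
    """relocs: every GOT relocation against one symbol, sorted by offset."""
    if len(relocs) % 2:
        return False  # an unpaired relocation cannot be part of a sequence
    return all(is_valid_sequence(relocs[i], relocs[i + 1])
               for i in range(0, len(relocs), 2))


# Hand-edited example from this thread; the offsets are made up.
relocs_b = [
    Reloc(0x08, R_ADR_GOT_PAGE, "b", dst=2),
    Reloc(0x0C, R_LD64_GOT_LO12_NC, "b", dst=2, base=2),
    Reloc(0x24, R_ADR_GOT_PAGE, "b", dst=2),
    Reloc(0x28, R_LD64_GOT_LO12_NC, "b", dst=3, base=2),
]
whole_symbol_ok = all_in_valid_sequences(relocs_b)
first_pair_only_ok = all_in_valid_sequences(relocs_b[:2])
```

Only when `all_in_valid_sequences` holds for a symbol would a linker go on to consider each pair individually, for example by checking whether it is in range.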
Closing as it's superseded by #362.