abi-aa [aaelf64] Clarify relocation optimization [issue #328]

Make it clearer that a GOT sequence can only be optimized if all GOT relocations to the symbol are part of a sequence.

Sep 15 '25 17:09 Wilco1

Can I check what you mean by All GOT relocations? To give an example

// Relocations in sequence, for the sake of the example; out of range
ADRP  x0, :got: symbol            // R_<CLS>_ADR_GOT_PAGE
LDR   x0, [x0 :got_lo12: symbol]  // R_<CLS>_LD64_GOT_LO12_NC
...
// Relocations in sequence, in range
ADRP  x1, :got: symbol            // R_<CLS>_ADR_GOT_PAGE
LDR   x1, [x1 :got_lo12: symbol]  // R_<CLS>_LD64_GOT_LO12_NC
...
// Instructions not in sequence, although relocations could be consecutive
ADRP  x2, :got: symbol            // R_<CLS>_ADR_GOT_PAGE
NOP
LDR   x2, [x1 :got_lo12: symbol]  // R_<CLS>_LD64_GOT_LO12_NC

I think you mean that in each of these three pairs of GOT relocations to symbol then all must be optimised or none. This would avoid the problem observed in https://github.com/llvm/llvm-project/issues/138254 which had the following example:

foo:
        cmp     x0, 0
        bge     .L8
        adrp    x2, :got:b // Optimised to adrp x2, b
.L9:
        ldr     x2, [x2, :got_lo12:b] // Optimised to adr x2, [x2 :lo12: b]
        add     x0, x2, x1
        b       bar
        .p2align 2,,3
.L8:
        stp     x29, x30, [sp, -32]!
        adrp    x2, :got:b // Can't be optimised due to add that is in way.
        add     x0, x0, 1 // although in theory could be out of range of b.
        ldr     x3, [x2, :got_lo12:b]
        add     x0, x3, x0
        mov     x29, sp
        stp     x2, x1, [sp, 16]
        bl      bar
        ldp     x2, x1, [sp, 16]
        ldp     x29, x30, [sp], 32
        b .L9 // Jump to after optimised ADRP expecting the ldr, but getting adr.

Sep 16 '25 09:09 smithp35

No - as described, all must be part of a sequence in order to be optimizable since that is proof they are independent. However not all need to be optimized (eg. if out of range). So in your example only the 3rd ADRP is not a valid sequence and thus blocks the optimization for the other 2.

An alternative would be to optimize all or nothing without considering pairs. But then a single ADRP that ends up out of range would block all optimizations, making it unsuitable for the medium/large model. Also keeping the ADRP/LDR as pairs results in better code quality.

Sep 16 '25 11:09 Wilco1

I think the problem observed https://github.com/llvm/llvm-project/issues/138254 could still happen with valid sequences, with at least one out of range. I agree it would be vanishingly unlikely and would be more difficult to check for.

I'd like to have a think about the wording and will make some suggestions. Hopefully tomorrow.

Sep 16 '25 13:09 smithp35

No it can't happen with only valid sequences since you cannot ever leak the value of the ADRP since the LDR in the next instruction overwrites it. Thus it is impossible to branch into the middle of a sequence.

Sep 16 '25 13:09 Wilco1

OK, I see that in the problem example the destination registers in the ADRP and LDR are different so this would be an invalid sequence even if the relocations were consecutive.

This does mean that it is insufficient to just look at the relocations. For example if I hand edit the LLVM example then all the relocations to b are in sequence, but we'd need to look at the instructions to detect the different destination registers.

I guess it means we'll need to be clear about what we mean by valid sequence.

foo:
        cmp     x0, 0
        bge     .L8
        adrp    x2, :got:b // Optimised to adrp x2, b
.L9:
        ldr     x2, [x2, :got_lo12:b] // Optimised to adr x2, [x2 :lo12: b]
        add     x0, x2, x1
        b       bar
        .p2align 2,,3
.L8:
        stp     x29, x30, [sp, -32]!
        adrp    x2, :got:b // Invalid sequence as destination registers don't match
        ldr     x3, [x2, :got_lo12:b] // but relocations are consecutive.
        add     x0, x0, 1 // reordered by hand 
        add     x0, x3, x0
        mov     x29, sp
        stp     x2, x1, [sp, 16]
        bl      bar
        ldp     x2, x1, [sp, 16]
        ldp     x29, x30, [sp], 32
        b .L9 // Jump to after optimised ADRP expecting the ldr, but getting adr.

Sep 16 '25 14:09 smithp35

The basic conditions explained above obviously still apply:

The relocations apply to consecutive instructions in the order specified.
The relocations use the same symbol.
The relocated instructions have the same source and destination register.

I thought the specification was clear enough... Perhaps we need to explain the optimization algorithm in pseudo code step-by-step?

Sep 16 '25 14:09 Wilco1

Closing as it's superceded by #362.

Jan 21 '26 19:01 Wilco1