CompCert icon indicating copy to clipboard operation
CompCert copied to clipboard

aarch64 assembler error: conditional branch out of range

Open bagnalla opened this issue 3 months ago • 4 comments

I encountered an issue when compiling libccv with CompCert 3.16 on a Raspberry Pi 5 (Arm Cortex A76) running Raspberry Pi OS.

Some of the functions in libccv are apparently large enough to cause ccomp to generate tbnz instructions whose target label is outside the allowed range (+= 32KB offset from the branch instruction, according to Arm documentation). For example, the code generated for function ccv_sobel in file ccv_basic.c is rejected by the assembler with the following message:

ccv_basic.s: Assembler messages:
ccv_basic.s:203: Error: conditional branch out of range
ccv_basic.s:29086: Error: conditional branch out of range
ccomp: error: assembler command failed with exit code 1 (use -v to see invocation)

Line 203 contains the instruction

tbnz	w14, #31, .L104

and label .L104 is way down on line 28494. The full assembly output is here.

Instructions for reproducing the error with libccv on an rpi5:

  1. Download libccv stable release and unzip
  2. Enter lib/ directory
  3. In config.mk.in, set CC := ccomp and add -fstruct-passing -flongdouble to CFLAGS
  4. Run ./configure && make

I've seen the same problem with tbz in a local fork of CompCert that inserts extra code (so the functions are even larger), but so far with unmodified CompCert I've only seen it with tbnz.

My bandaid fix is to extend the expand_instruction function in aarch64/Asmexpand.ml to rewrite conditional branches to use an unconditional long jump in the branching case, as described here and here.

The code looks like this (covering all conditional branch instructions, maybe unnecessarily):

let is_cond_branch = function
  | Pbc _ | Ptbnz _ | Ptbz _ | Pcbnz _ | Pcbz _ -> true
  | _ -> false

let negate_cond = function
  | TCeq -> TCne
  | TCne -> TCeq
  | TChs -> TClo
  | TClo -> TChs
  | TCmi -> TCpl
  | TCpl -> TCmi
  | TChi -> TCls
  | TCls -> TChi
  | TCge -> TClt
  | TClt -> TCge
  | TCgt -> TCle
  | TCle -> TCgt

(** Negate a conditional branch instruction and update its target
    label to [new_tgt]. *)
let negate_cond_branch new_tgt = function
  | Pbc (cond, _) -> Pbc (negate_cond cond, new_tgt)
  | Ptbnz (sz, r, i, _) -> Ptbz (sz, r, i, new_tgt)
  | Ptbz (sz, r, i, _) -> Ptbnz (sz, r, i, new_tgt)
  | Pcbnz (sz, r, _) -> Pcbz (sz, r, new_tgt)
  | Pcbz (sz, r, _) -> Pcbnz (sz, r, new_tgt)
  | _ -> raise (Error "negate_cond_branch: expected conditional branch")

let tgt_of_branch = function
  | Pbc (_, tgt) -> tgt
  | Ptbnz (_, _, _, tgt) -> tgt
  | Ptbz (_, _, _, tgt) -> tgt
  | Pcbnz (_, _, tgt) -> tgt
  | Pcbz (_, _, tgt) -> tgt
  | _ -> raise (Error "tgt_of_branch: expected conditional branch")

(** Emit long-jump version of [instr]. *)
let expand_cond_branch instr : unit =
  let lbl = new_label () in
  emit (negate_cond_branch lbl instr);
  emit (Pb (tgt_of_branch instr));
  emit (Plabel lbl)

let expand_instruction instr =
  match instr with
  | (* ... Existing cases ... *)
  | _ when is_cond_branch instr ->
     expand_cond_branch instr
  | _ ->
     emit instr

This solution is obviously not ideal since it rewrites all conditional branches when it is very rare that they require it.

It's also not clear to me if this should handled prior to Asmexpand, within the verified part of the compiler. It looks like the semantics in aarch64/Asm.v doesn't take range limitations of conditional branches into account.

bagnalla avatar Oct 09 '25 23:10 bagnalla

Well spotted, thank you for the report.

I think the main offenders here are the tbz and tbnz instructions, since they are limited to +/- 32K. The other conditional branches have a range of +/- 1M, so it's very unlikely you would ever overflow this. (A truly huge source C function would be needed, and then CompCert would probably run out of memory while compiling it.) So, you could keep your bandaid fix, but remove all cases but Ptbz and Ptbnz. With the current code, you're paying quite a price in code size and execution time.

In the longer term, we have to decide between

  • not using the tbz and tbnz instructions, too fragile w.r.t. large source codes,

  • implement branch relaxation, which is a static analysis that detects (possibly) out-of-range relative branches and rewrites them differently.

Assemblers can implement branch relaxation themselves, and I believe gas for RISC-V does, but this cannot be relied on.

CompCert already implements branch relaxation for PowerPC, since its conditional branch instruction has a fairly low range. Maybe the PowerPC-specific implementation can be generalized to other target architectures. I'll look into that when I have time, but this may take a while.

xavierleroy avatar Oct 10 '25 16:10 xavierleroy

@xavierleroy I toyed with using cbz and cbnz on ARM Thumb2, and implemented them using an over-estimating oracle for instruction size (it's in Chamois). Not very pleasant.

monniaux avatar Oct 23 '25 10:10 monniaux

Here's an example that shows that it is at least possible to get ccomp to exceed the range (+/- 1M) of the other branch instructions (b.eq in this case, or cbnz if you change if (x == 123) to if (x)):

N = 66000

prog = """
int f(int x, int y) {
  return x + y;
}
int x;
int main() {
  if (x == 123) {
    return x;
  } else {
    int x0 = 0;
    int x1 = 1;
"""
for i in range(2, N+1):
    prog += f"    int x{i} = f(x{i-2}, x{i-1});\n"
prog += f"    return x{N};\n"
prog += "  }\n}"

with open("prog.c", "w") as f:
    f.write(prog)
$ python gen.py && time ccomp prog.c
/tmp/compcert7f7ad0.s: Assembler messages:
/tmp/compcert7f7ad0.s:39: Error: conditional branch out of range
ccomp: error: assembler command failed with exit code 1 (use -v to see invocation)

real	0m50.678s

FWIW, I've implemented branch relaxation for aarch64 in Asmexpand.ml here. It's designed to rewrite branches only when necessary (though conservatively in the presence of inline asm). I've tested it somewhat thoroughly but it may not be perfectly correct. It works for me so far.

bagnalla avatar Oct 23 '25 15:10 bagnalla

@monniaux : thanks for the Chamois plug, but branch relaxation for PowerPC was implemented 14 years ago in 14b44ae202de640f2a7a6e973e5fb912b736c002, so nothing is new under the sun.

@bagnalla: thanks for the prototype implementation. I'm still not super concerned about AArch64 function codes bigger than 1Mbytes. But I agree it's time to think about a generic implementation of branch relaxation that can be shared between several target processors.

xavierleroy avatar Oct 23 '25 17:10 xavierleroy