runtime: frame pointer stored past the stack pointer on arm64, resulting in inaccurate backtraces in LLDB
Go version
go version go1.22.6 darwin/arm64
Output of go env in your module/workspace:
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/larry/Library/Caches/go-build'
GOENV='/Users/larry/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/larry/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/larry/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.22.6/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.22.6/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.6'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/Users/larry/src/go/src/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/zs/c12p66m91mq_vm_88c68lwh00000gp/T/go-build3146866747=/tmp/go-build -gno-record-gcc-switches -fno-common'
What did you do?
Built a hello world program.
➜ ~ go build -o hello hello.go
run it in a debugger.
➜ ~ lldb hello
(lldb) target create "hello"
Current executable set to '/Users/larry/hello' (arm64).
Breakpoint on a system function which is called by the go runtime before main.
(lldb) b notify_is_valid_token
Breakpoint 1: where = libsystem_notify.dylib`notify_is_valid_token, address = 0x0000000183ce75e8
(lldb) r
Process 82185 launched: '/Users/larry/hello/hello' (arm64)
Process 82185 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x00000001889b75e8 libsystem_notify.dylib`notify_is_valid_token
Target 0: (hello) stopped.
Take a backtrace
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: 0x00000001889b75e8 libsystem_notify.dylib`notify_is_valid_token
frame #1: 0x0000000100064db4 hello`runtime.osinit_hack_trampoline.abi0 + 20
frame #2: 0x00000001000637f8 hello`runtime.asmcgocall.abi0 + 200
frame #3: 0x00000001000523fc hello`runtime.libcCall + 92
Take a look at osinit_hack_trampoline. It saves the frame pointer past the end of the stack.
(lldb) up
frame #1: 0x0000000100064db4 hello`runtime.osinit_hack_trampoline.abi0 + 20
(lldb) disas
hello`runtime.osinit_hack_trampoline.abi0:
0x100064da0 <+0>: str x30, [sp, #-0x10]!
0x100064da4 <+4>: stur x29, [sp, #-0x8]
0x100064da8 <+8>: sub x29, sp, #0x8
0x100064dac <+12>: mov x0, xzr
0x100064db0 <+16>: bl 0x10008ce2c
-> 0x100064db4 <+20>: bl 0x10008ce38
0x100064db8 <+24>: ldp x29, x30, [sp, #-0x8]
0x100064dbc <+28>: add sp, sp, #0x10
0x100064dc0 <+32>: ret
0x100064dc4 <+36>: udf #0x0
0x100064dc8 <+40>: udf #0x0
0x100064dcc <+44>: udf #0x0
Take a look at the C function it calls. It's function prelude clobbers the frame pointer that go's prelude saved.
(lldb) down
frame #0: 0x00000001889b75e8 libsystem_notify.dylib`notify_is_valid_token
libsystem_notify.dylib`notify_is_valid_token:
(lldb) disas
libsystem_notify.dylib`notify_is_valid_token:
-> 0x1889b75e8 <+0>: pacibsp
0x1889b75ec <+4>: sub sp, sp, #0x70
0x1889b75f0 <+8>: stp x22, x21, [sp, #0x40]
0x1889b75f4 <+12>: stp x20, x19, [sp, #0x50]
0x1889b75f8 <+16>: stp x29, x30, [sp, #0x60]
0x1889b75fc <+20>: add x29, sp, #0x60
0x1889b7600 <+24>: mov x19, x0
0x1889b7604 <+28>: adrp x8, 398022
0x1889b7608 <+32>: ldr x8, [x8, #0x320]
0x1889b760c <+36>: ldr x8, [x8]
....
Continue past the prelude.
(lldb) b -a 0x1889b7600
Breakpoint 2: where = libsystem_notify.dylib`notify_is_valid_token + 24, address = 0x00000001889b7600
(lldb) c
Process 82185 resuming
Process 82185 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
frame #0: 0x00000001889b7600 libsystem_notify.dylib`notify_is_valid_token + 24
libsystem_notify.dylib`notify_is_valid_token:
-> 0x1889b7600 <+24>: mov x19, x0
0x1889b7604 <+28>: adrp x8, 398022
0x1889b7608 <+32>: ldr x8, [x8, #0x320]
0x1889b760c <+36>: ldr x8, [x8]
Target 0: (hello) stopped.
Now the backtrace is no longer valid. It has lost frames because the FP got clobbered.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
* frame #0: 0x00000001889b7600 libsystem_notify.dylib`notify_is_valid_token + 24
frame #1: 0x0000000100064db4 hello`runtime.osinit_hack_trampoline.abi0 + 20
(lldb)
What did you see happen?
Backtraces were not accurate because go's function prelude stored the frame pointer past the edge of the stack.
What did you expect to see?
Accurate backtraces.
Related Issues and Documentation
- runtime: frame pointer check fails on arm64 for C-to-Go calls #59401
- cmd/link: Incorrect symbol linked in darwin/arm64 #58935 (closed)
- runtime/trace: frame pointer unwinding during C-to-Go calls can crash on amd64 #59830 (closed)
- gccgo: arm64: programs built with the -static option crashed during runtime initialization #38742
- runtime: disabling inlining breaks on arm64, ppc64 #11482 (closed)
- cmd/link: possible linker bug when using the -T flag to set the text address #28257
- cmd/compile: when cmd/compile is compiled with `-gcflags = "all=-N -l"`, the output binary panic on start #54412 (closed)
- cmd/objdump: TestDisasmExtld fails on darwin/arm64 #49700 (closed)
- cmd/objdump: autogenerated code is not dumped #37191 (closed)
- cmd/compile: stack load line numbers wrong in non-regabi backend #60673 (closed)
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)
The ARM64 calling convention is to store the frame pointer below the link register. As the LR is stored at 0(SP), we had to store FP at -8(SP). It is unfortunate, but we cannot change the calling convention at this point.
However, we probably can change the libc function trampolines to use a different convention. They already don't follow the Go calling convention and the Go stack unwinder probably never looks at them.
Yea, but couldn't the (FP, LR) pair be stored at the end (highest address) of the stack frame, like a normal C function would do? That wouldn't alter the calling convention would it?
couldn't the (FP, LR) pair be stored at the end (highest address) of the stack frame
User assembly code depend on LR saved at 0(SP). We can't change it now.
Note that for regular Go stacks, it is also unnecessary: Go functions know this and would not write to the very top word in the frame to clobber the caller's FP. The only time it matters is to call C functions.
Ugh, that's unfortunate.
Well if y'all ever make an ABI1 I guess it could be fixed there :/
Note that backtraces will be also be wrong when taken with sample, stackshot, ktrace, Instruments, or just about any debugging or perf tool on Mac OS.
We could conceivably change ABIinternal and leave things as is for ABI0. Not sure how tricky that would be.
It seems to me that the thing blocking fixing this is not the calling convention, but the expectations of assembly code.
couldn't we add a flag to the TEXT declaration similar to NOFRAME, that informs the assembler that it is allowed to place the FP,LR pair in the correct place? The resulting code would still be valid ABI0, and also valid according to platform backtracers. functions without the flag would get the old behavior, so nothing should break.
I don't think we need to change the convention for regular Go functions and Go assembly code. The system ABI requires FP be saved at a word below the LR, but it does not require it to be saved at a specific place in the stack frame, except the general assumption that the space below SP is scratch. Most tools generally don't care about the SP. Go generated code already doesn't clobber the saved frame pointer. It only matters when calling into C code, which can be handled specifically.
Yes, the system ABI does not require the FP,LR be saved at a specific place, but it does require that it be saved inside the functions actual stack frame, not in the red zone. The red zone can be used for scratch, yes, but it must be assumed to be clobbered every time a function is called.
Yes, it only matters when caling into C (or other non-go) code. Another way to mitigate would be to adjust SP down by 8 before calling any C function.