Regression: NULL l_addr in link_map structure for vdso on 5.5.0
I've discovered a regression from 5.4.0 to 5.5.0. Applications that use dlopen() to obtain a dynamic linker struct link_map pointer corresponding to the vdso and observe the l_addr field will find it to be NULL. Here's a minimal application that crashes on 5.5.0 (link with -ldl):
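#include <assert.h>
#include <dlfcn.h>   /* dlopen(), RTLD_LAZY */
#include <link.h>    /* struct link_map */
#include <string.h>

int main(void) {
    /* Relies on glibc's dlopen() handle being a struct link_map pointer. */
    const struct link_map *l = dlopen(NULL, RTLD_LAZY);

    /* Walk the list until we reach the vdso's entry. */
    while (l && strcmp(l->l_name, "linux-vdso.so.1"))
        l = l->l_next;

    assert(l);          /* the vdso entry must exist */
    assert(l->l_addr);  /* fails under rr 5.5.0 */
    return 0;
}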
It looks like the NULL originates in ld-linux.so's setup_vdso() function. When running without rr, all of the PT_LOAD entries in the program header have a p_vaddr of 0, so line 52 has no effect and line 64 is equivalent to l->l_addr = l->l_map_start. Under rr, the first PT_LOAD entry has a p_vaddr that matches l->l_map_start, so line 64 instead resets l->l_addr to NULL.
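The arithmetic described above can be modelled in isolation. What follows is a simplified sketch, not glibc's code: `model_l_addr` is a hypothetical name, and the two steps are only inferred from the description of lines 52 and 64, assuming that line 52 records a PT_LOAD p_vaddr while l_addr is still zero and that line 64 subtracts the recorded value from l_map_start.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the load-bias arithmetic described above.
   This is NOT glibc's code; it only mirrors the two steps as described. */
static uintptr_t model_l_addr(const uintptr_t *load_vaddrs, size_t n,
                              uintptr_t map_start)
{
    uintptr_t l_addr = 0;

    /* Step 1 ("line 52"): record a PT_LOAD p_vaddr while l_addr is
       still zero. If every p_vaddr is 0, this changes nothing. */
    for (size_t i = 0; i < n; ++i)
        if (l_addr == 0)
            l_addr = load_vaddrs[i];

    /* Step 2 ("line 64"): mapping start minus the recorded link-time
       address gives the load bias. */
    return map_start - l_addr;
}
```

With all p_vaddr values at 0 the result is simply map_start. When the first p_vaddr equals map_start, as reported under rr, the result is 0.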
It looks like for setup_vdso() to work correctly on librrpage.so, all of its PT_LOAD entries would have to have a p_vaddr of 0, which is infeasible. And since setup_vdso() is an inline function, it cannot be interposed. Perhaps rr can just manually set the l_addr field after the dynamic linker is finished?
I'm using glibc 2.33 from Debian:
$ apt policy libc6
libc6:
  Installed: 2.33-1
It looks like this was introduced in 4be0255d, which proxies the vdso. Here's my bisection script:
#!/bin/sh
mydir="`dirname "$0"`"
set -ev
[ -e vdsoaddr ] || c99 -o vdsoaddr "$mydir/vdsoaddr.c" -ldl
[ -e vdsoaddr-0 ] && rm -rf vdsoaddr-0
git cherry HEAD dbc94c6f | grep ^+ >/dev/null && git cherry-pick -n dbc94c6f
./configure
make -j8
git reset --hard
_RR_TRACE_DIR="." bin/rr ./vdsoaddr || false
cc @Keno
We can fix this by changing the addresses in the linker script, but it would be good to know why this is an issue. l_addr is the load bias, which we don't make use of here because the rr page is always mapped at a fixed address. That makes it easier to correlate addresses in the rr page by just loading up the binary in GDB.
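Since a load bias of 0 is legitimate for an object mapped at its link-time address, code that needs to locate the vdso image may not want to depend on l_addr being nonzero. One possible approach, sketched below, is to read the AT_SYSINFO_EHDR entry of the auxiliary vector and derive the bias from the program headers. The helper names are hypothetical, and whether rr's replacement page is reported through AT_SYSINFO_EHDR in the same way is an assumption here, not something confirmed in this thread.

```c
#include <elf.h>
#include <link.h>      /* ElfW() */
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>  /* getauxval(), AT_SYSINFO_EHDR */

/* Hypothetical helper: address of the vdso's ELF header, or 0 if the
   auxiliary vector has no AT_SYSINFO_EHDR entry. */
static uintptr_t vdso_ehdr_addr(void)
{
    return (uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: does `base` point at an ELF header? */
static int is_elf_image(uintptr_t base)
{
    return memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}

/* Hypothetical helper: load bias of an ELF image already mapped at `base`.
   The PT_LOAD segment that covers file offset 0 is the one mapped at
   `base`, so the bias is `base` minus that segment's link-time address. */
static uintptr_t elf_load_bias(uintptr_t base)
{
    const ElfW(Ehdr) *eh = (const ElfW(Ehdr) *)base;
    const ElfW(Phdr) *ph = (const ElfW(Phdr) *)(base + eh->e_phoff);

    for (unsigned i = 0; i < eh->e_phnum; ++i)
        if (ph[i].p_type == PT_LOAD && ph[i].p_offset == 0)
            return base - ph[i].p_vaddr;  /* 0 if mapped at its link address */

    return base;  /* no such segment: fall back to link-time address 0 */
}
```

A caller would check that `vdso_ehdr_addr()` is nonzero and that `is_elf_image()` succeeds before calling `elf_load_bias()`. A result of 0 then just means "mapped where it was linked", not "missing".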
I ran into this the other day in my doctoral thesis work. Admittedly, my use case is pretty obscure: I have a custom runtime that rewrites GOT entries to conditionally intercept function calls and global variable accesses. To accomplish this, it has to parse the ELF headers of the loaded objects. I am able to work around it, but as a result my system now has code to handle this specific rr quirk. To be fair, rr has proven time and again to be indispensable, so it's a small price to pay. :) Just worried it might someday affect someone else and distract them from finding whatever gnarly bug they were chasing in their scary systems research.