
Regression: NULL l_addr in link_map structure for vdso on 5.5.0

Open solb opened this issue 4 years ago • 3 comments

I've discovered a regression from rr 5.4.0 to 5.5.0. Applications that walk the dynamic linker's struct link_map list (starting from the handle returned by dlopen()) to find the vdso's entry and read its l_addr field will find it to be NULL. Here's a minimal application that crashes under rr 5.5.0 (link with -ldl):

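#include <assert.h>
#include <dlfcn.h>  /* dlopen(), RTLD_LAZY */
#include <link.h>   /* struct link_map */
#include <string.h>

int main(void) {
	/* In glibc, the dlopen(NULL, ...) handle is the main program's struct link_map. */
	const struct link_map *l = dlopen(NULL, RTLD_LAZY);
	while(l && strcmp(l->l_name, "linux-vdso.so.1"))
		l = l->l_next;
	assert(l);          /* the vdso's entry was found */
	assert(l->l_addr);  /* fails under rr 5.5.0 */

	return 0;
}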

It looks like the NULL originates in ld-linux.so's setup_vdso() function, which derives l_addr by subtracting the p_vaddr of the vdso's first PT_LOAD entry from the address the vdso is mapped at (l->l_map_start). When running without rr, all of the PT_LOAD entries in the program header table have a p_vaddr of 0, so line 52 has no effect and line 64 is equivalent to l->l_addr = l->l_map_start. Under rr, the first PT_LOAD entry has a p_vaddr that matches l->l_map_start, so line 64 instead resets l->l_addr to NULL.
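To make the arithmetic concrete, here is a minimal model of the computation described above. It is not glibc's source: the helper name model_vdso_l_addr and the example addresses are hypothetical, and it assumes 64-bit ELF. It only captures the behavior the report describes: the first PT_LOAD entry's p_vaddr is subtracted from the mapping address.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model (not glibc code) of how the vdso's l_addr is derived:
 * load bias = address where the image is mapped (l_map_start) minus the
 * link-time address (p_vaddr) of its first PT_LOAD segment. */
static uint64_t model_vdso_l_addr(uint64_t map_start,
                                  const Elf64_Phdr *phdr, size_t phnum)
{
	for (size_t i = 0; i < phnum; ++i)
		if (phdr[i].p_type == PT_LOAD)
			return map_start - phdr[i].p_vaddr;
	return map_start; /* no PT_LOAD: treat the link-time address as 0 */
}
```

With a vdso linked at address 0 the result equals map_start; with an image whose first PT_LOAD is linked at its own mapping address, as the report describes for librrpage.so, the result is 0, which is exactly the value the test program's assertion rejects.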

It looks like for setup_vdso() to work correctly on librrpage.so (the library rr maps in place of the vdso), all of its PT_LOAD entries would need a p_vaddr of 0, which is infeasible. And since setup_vdso() is an inline function, it cannot be interposed. Perhaps rr can just manually set the l_addr field after the dynamic linker is finished?

I'm using glibc 2.33 from Debian:

$ apt policy libc6
libc6:
  Installed: 2.33-1

It looks like this was introduced in 4be0255d, which proxies the vdso. Here's my bisection script (vdsoaddr.c is the test program above):

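#!/bin/sh

mydir="$(dirname "$0")"
set -ev
# Build the test program once; vdsoaddr.c sits next to this script.
[ -e vdsoaddr ] || c99 -o vdsoaddr "$mydir/vdsoaddr.c" -ldl
# Remove the trace directory left by a previous run.
[ -e vdsoaddr-0 ] && rm -rf vdsoaddr-0
# Apply dbc94c6f without committing, unless HEAD already contains it.
git cherry HEAD dbc94c6f | grep ^+ >/dev/null && git cherry-pick -n dbc94c6f
./configure
make -j8
# Undo the cherry-pick so the next bisection step can check out cleanly.
git reset --hard
# Record the test under the freshly built rr; a failed assertion fails this step.
_RR_TRACE_DIR="." bin/rr ./vdsoaddr || false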

solb, Jan 23 '22 18:01

cc @Keno

khuey, Jan 23 '22 19:01

We can fix this by changing the addresses in the linker script, but it would be good to know why this is an issue. l_addr is the load bias, which we don't use here because the rr page is always mapped at a fixed address. That makes it easier to correlate addresses in the rr page: you can just load the binary in GDB.
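Because l_addr is a load bias rather than the image's start address, code that needs a segment's runtime address can add the bias to the segment's p_vaddr instead of using l_addr on its own; that sum is correct whether the bias is nonzero (vdso linked at 0) or 0 (image linked at its fixed address). A minimal sketch using glibc's dl_iterate_phdr(), whose dlpi_addr field carries the same load bias, assuming the vdso is listed under the name linux-vdso.so.1 as in the test program above; the helper names find_vdso_segment and vdso_first_segment are hypothetical.

```c
#define _GNU_SOURCE /* for dl_iterate_phdr() */
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Hypothetical dl_iterate_phdr() callback: for the object named
 * "linux-vdso.so.1", store the runtime address of its first PT_LOAD
 * segment in *data and stop iterating. */
static int find_vdso_segment(struct dl_phdr_info *info, size_t size,
                             void *data)
{
	(void)size;
	if (info->dlpi_name == NULL ||
	    strcmp(info->dlpi_name, "linux-vdso.so.1") != 0)
		return 0; /* not the vdso: keep iterating */
	for (ElfW(Half) i = 0; i < info->dlpi_phnum; ++i) {
		if (info->dlpi_phdr[i].p_type == PT_LOAD) {
			/* Runtime address = load bias + link-time address.
			 * Works for a bias equal to the mapping address
			 * (p_vaddr == 0) and for a bias of 0 (p_vaddr already
			 * equal to the fixed mapping address). */
			*(uintptr_t *)data =
			    info->dlpi_addr + info->dlpi_phdr[i].p_vaddr;
			break;
		}
	}
	return 1; /* vdso handled: stop iterating */
}

/* Hypothetical helper: address of the vdso's first loadable segment
 * (normally where its ELF header lives), or 0 if no vdso was found. */
static uintptr_t vdso_first_segment(void)
{
	uintptr_t addr = 0;
	dl_iterate_phdr(find_vdso_segment, &addr);
	return addr;
}
```

On an ordinary Linux run without rr, the result should match getauxval(AT_SYSINFO_EHDR), the address at which the kernel mapped the vdso's ELF header.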

Keno, Jan 23 '22 19:01

I ran into this the other day in my doctoral thesis work. Admittedly, my use case is pretty obscure: I have a custom runtime that rewrites GOT entries to conditionally intercept function calls and global variable accesses. To accomplish this, it has to parse the ELF headers of the loaded objects. I am able to work around it, but as a result my system now has code to handle this specific rr quirk. To be fair, rr has proven time and again to be indispensable, so it's a small price to pay. :) Just worried it might someday affect someone else and distract them from finding whatever gnarly bug they were chasing in their scary systems research.

solb, Jan 26 '22 20:01