libfaketime icon indicating copy to clipboard operation
libfaketime copied to clipboard

unexplained crash in getrandom within gpgsm

Open dkg opened this issue 4 years ago • 10 comments

I'm on a debian amd64 system with gpgsm 2.2.27, with faketime 0.9.9+git20210226-1 (from debian/experimental).

bad-address-crash.zip has a directory bad-address-crash/ with three files in it:

  • ca.crt
  • bob.encrypt.crt
  • cmsencrypt

I can trigger a crash of gpgsm if I invoke the cmsencrypt script while LD_PRELOADing a libfaketime library that is built with FAKE_RANDOM and INTERCEPT_SYSCALL:

dkg@alice:/tmp/cdtemp.LvoJ3j/bad-address-crash$ echo test | ./cmsencrypt bob.encrypt.crt > /dev/null
gpgsm: issuer certificate (#/CN=Sample LAMPS Certificate Authority) not found
gpgsm: encrypted data created
dkg@alice:/tmp/cdtemp.LvoJ3j/bad-address-crash$ echo test | LD_PRELOAD=/usr/lib/x86_64-linux-gnu/faketime/libfaketime.so.1 ./cmsencrypt bob.encrypt.crt > /dev/null
gpgsm: issuer certificate (#/CN=Sample LAMPS Certificate Authority) not found
Fatal: unexpected error from getrandom: Bad address
./cmsencrypt: line 34: 1071642 Aborted                 "${gpgsm[@]}" "${grips[@]}" --encrypt
dkg@alice:/tmp/cdtemp.LvoJ3j/bad-address-crash$ 

I don't understand why this is happening.

Note that the crash only happens when FAKERANDOM_SEED is unset, and the library is LD_PRELOADed. If i also set FAKERANDOM_SEED then the program succeeds as expected.

dkg avatar Feb 26 '21 05:02 dkg

This seems to be indirectly related to syscall() intercepting and then going for a real_syscall(). At least gpgsm here triggers the syscall() interception, and since FAKERANDOM_SEED is not set, it falls through to the bypass part. real_syscall() returns -1 with errno being set to 14 (EFAULT: "The address referred to by buf is outside the accessible address space."). Some sort of memory area protection on gpgsm's side?

wolfcw avatar Feb 26 '21 20:02 wolfcw

If the memory area is protected by gpgsm, when libfaketime doesn't intercept it should fail too, though, right?

it occurred to me that maybe the syscall argument mangling/demangling was failing on addresses from the heap instead of the stack, but the tests i added in #311 seem to rule out that theory. so i'm still a bit stuck figuring this out.

dkg avatar Mar 02 '21 17:03 dkg

Well, just as an observation: On my machine, sizeof(int) is 4, while sizeof(void *) is 8.

gpgsm apparently requests 90 random bytes from getrandom via syscall(). In your syscall bypass, when I output the values of a[i] for i = 0, 1, 2 , gpgsm's 90 value is always in a[1]. So, the memory address is in a[0], which is a 4-byte int here, whereas the real memory address would be 8 bytes long. From your comment, I'd assume an 8-byte value is supposed to be chopped into two 4-byte (int) pieces, but this seems not be the case somehow.

When I change

https://github.com/wolfcw/libfaketime/blob/0e6b1b24603948dab13f165f03a75e357e44d518/src/libfaketime.c#L3796

from int to long, gpgsm does not crash for me anymore, but I'm not sure what other implications that may have. You have chosen the vararg_promotion_t quite specifically.

From what it looks like, va_arg() when called with an int as second parameter ("type"), seems to happily return an int by truncating the original parameter, and the next va_arg() call moves on to the next parameter instead of returning the next piece of the previously truncated parameter. Not sure about that yet, just an observation.

On a closer look, it might also be that case that the loop

https://github.com/wolfcw/libfaketime/blob/0e6b1b24603948dab13f165f03a75e357e44d518/src/libfaketime.c#L3800

simply overwrites parameters that have spilled over. So the memory address ends up in a[0] AND a[1] first, but then in the next round of the for-loop, a[1] is overwritten with the second parameter (the value 90), thus effectively truncating the 8-byte address into a 4-byte one (from real_syscall()'s point of view, it might look like an 8-byte address, and it then uses the flags as buflen parameter; again, not sure here, but at least might explain the EFAULT with invalid address).

wolfcw avatar Mar 02 '21 18:03 wolfcw

thanks for the extra debugging, @wolfcw -- i admit that i'm in over my head here. debian's platform guide suggests that every platform in debian treats long as the same size as void*.

This also means that it will vary between 32-bit and 64-bit platforms.

the draft for the C standard from 2017 (found via the iso-9899.info wiki) page 197 (§7.16) seems relevant here, as it defines how va_args is supposed to work, but i'm still having difficulty thinking through how the promotion rules must apply here.

Maybe it's the C promotion rules generally (not just varargs promotion rules) that need to be considered? for example, a float is promoted to a double when the prototype uses an ellipsis.

dkg avatar Mar 03 '21 00:03 dkg

Ah! i found the problem with my attempt at testing the syscall passthrough, which is that we were diverting syscall(__NR_clock_gettime) to real_clock_gettime instead of passing it through to real_syscall. with #312 applied, both snippets/syscall_clock_gettime and snippets/syscall_clock_gettime_heap fail when LD_PRELOADing, but FAKETIME unset.

dkg avatar Mar 03 '21 01:03 dkg

As one more observation in the context of running gpgsm:

The second call to va_arg() always yields the value 90. Which means pretty much that va_arg(ap, int) somehow magically does not simply take the next 4 (= sizeof(int)) bytes from stack and returns them as int, but that each call to va_arg() hops from one parameter to the next, respecting their true sizes.

Given this behavior, I don't think passing the syscall() parameters through by chopping them into int-sized pieces via va_arg() can work.

When I thought about syscall() interception two years ago (and gave up on it), I came to the conceptual conclusion (I never implemented as much as you did recently) that the only clean way to handle it would be to individually treat each syscall-number by adding their individual signatures, pretty much like we do for getrandom and clock_gettime syscalls presently. I never went on with that, given that it's pretty dull and bloating, and needs maintenance whenever more syscalls are added (or existing ones are changes, which probably rarely happens).

wolfcw avatar Mar 03 '21 07:03 wolfcw

The more i think about this, the more i think it has to do with reversing va_arg and less to do with syscall specifically. This is because syscall itself is just another variadic function in libc, like printf.

Sadly, it looks like the va_arg implementation, even if we manage to reverse it, is a part of the defined system ABI, and might be subtly different from platform to platform. I don't know of any formal attempts to characterize the variadic argument ABI on a given platform, let alone any systematic attempt to catalog different platform characterizations, though i did find a half-completed project which in turn pointed to a an x86_64 processor-specific ABI about variable arguments to a function (see §3.5.7). This ABI doc suggests that certain arguments (floats and doubles in particular) can be passed in registers for variadic functions, and that is indeed borne out by my manual examination of the stack when passing args of different types to a variadic function. The floats aren't on the stack, but the ints, longs, and pointers are!

The i386-equivalent ABI spec (§2.2.4) is significantly less sophisticated, and might just use the stack for everything.

Fortunately, i don't know of any syscall that expects a float argument (though i haven't been able to turn up an exhaustive list or a guarantee of that either).

Given this ugliness, the cleanest possible fix for the long-term would be if libc would expose the function. I've asked glibc maintainers to do that, we'll see if they respond.

In the meantime, i suggest that we go with long as vararg_promotion_t, and i'll work on adding a test that tries to fail if that's somehow incorrect for pointer or integer argument types, and have that test run automatically if -DINTERCEPT_SYSCALL is set, so that the test suite will fail if the platform doesn't support it appropriately.

dkg avatar Mar 03 '21 21:03 dkg

Thanks for your initiative on the glibc side. Having a vsyscall would be a simple and elegant solution for the issue here, and if the idea gets rejected, we at least know we need to solve it on the language level in the long run.

Given the observed behavior of va_arg(), I'm fine with using long as vararg_promotion_t. It shouldn't have adverse effects and cover most (if not all) arguments passed through to real_syscall(). Thanks!

wolfcw avatar Mar 04 '21 18:03 wolfcw

as noted on #313, all of the debian 32-bit architectures fail the confim_variadic_promotion test except for i386.

I tried to do some experimentation on abel,debian.org, an armhf (32-bit arm with hardware floating point) to see whether i could debug this further, and got this result (failures with variadic_promotion_t set to each of long, int, long long, and double):

(sid_armhf-dchroot)dkg@abel:~/src/faketime/faketime/test$ make clean && CFLAGS='-DINTERCEPT_SYSCALL' make confirm_variadic_promotion
gcc -c -o variadic/main.o -DINTERCEPT_SYSCALL -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/main.c
gcc -c -o variadic/outer.o -DINTERCEPT_SYSCALL -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/outer.c
gcc -c -o variadic/inner.o -DINTERCEPT_SYSCALL -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/inner.c
gcc -o variadic_promotion -DINTERCEPT_SYSCALL -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/main.o variadic/outer.o variadic/inner.o
./variadic_promotion
Round 0 success: c: 0x3 s: 0x4ff wc: 0x50000fe i: 0x60000fd wi: 0x70000fc

Round 1: return values differ (outer: 73 inner: 78)
Round 1 strings differ:
 - outer: l: 0x80000fb ll: 0xfa09000000 ptr: 0x9000000 pd: 0xa0000f9 sz: 0xb0000f9

 - inner: l: 0x80000fb ll: 0x9000000000000fa ptr: 0xa0000f9 pd: 0xb0000f9 sz: 0xc0000f8

make: *** [Makefile:42: confirm_variadic_promotion] Error 2
(sid_armhf-dchroot)dkg@abel:~/src/faketime/faketime/test$ make clean && CFLAGS='-DINTERCEPT_SYSCALL -Dvariadic_promotion_t=int' make confirm_variadic_promotion
gcc -c -o variadic/main.o -DINTERCEPT_SYSCALL -Dvariadic_promotion_t=int -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/main.c
gcc -c -o variadic/outer.o -DINTERCEPT_SYSCALL -Dvariadic_promotion_t=int -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/outer.c
gcc -c -o variadic/inner.o -DINTERCEPT_SYSCALL -Dvariadic_promotion_t=int -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/inner.c
gcc -o variadic_promotion -DINTERCEPT_SYSCALL -Dvariadic_promotion_t=int -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/main.o variadic/outer.o variadic/inner.o
./variadic_promotion
Round 0 success: c: 0x3 s: 0x4ff wc: 0x50000fe i: 0x60000fd wi: 0x70000fc

Round 1: return values differ (outer: 73 inner: 78)
Round 1 strings differ:
 - outer: l: 0x80000fb ll: 0xfa09000000 ptr: 0x9000000 pd: 0xa0000f9 sz: 0xb0000f9

 - inner: l: 0x80000fb ll: 0x9000000000000fa ptr: 0xa0000f9 pd: 0xb0000f9 sz: 0xc0000f8

make: *** [Makefile:42: confirm_variadic_promotion] Error 2
(sid_armhf-dchroot)dkg@abel:~/src/faketime/faketime/test$ make clean && CFLAGS='-DINTERCEPT_SYSCALL -Dvariadic_promotion_t="long long"' make confirm_variadic_promotion
gcc -c -o variadic/main.o -DINTERCEPT_SYSCALL -Dvariadic_promotion_t="long long" -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/main.c
gcc -c -o variadic/outer.o -DINTERCEPT_SYSCALL -Dvariadic_promotion_t="long long" -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/outer.c
gcc -c -o variadic/inner.o -DINTERCEPT_SYSCALL -Dvariadic_promotion_t="long long" -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/inner.c
gcc -o variadic_promotion -DINTERCEPT_SYSCALL -Dvariadic_promotion_t="long long" -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/main.o variadic/outer.o variadic/inner.o
./variadic_promotion
Round 0: return values differ (outer: 51 inner: 57)
Round 0 strings differ:
 - outer: c: 0x0 s: 0x3 wc: 0x4ff i: 0x50000fe wi: 0x60000fd

 - inner: c: 0x3 s: 0x4ff wc: 0x50000fe i: 0x60000fd wi: 0x70000fc

Round 1: return values differ (outer: 73 inner: 78)
Round 1 strings differ:
 - outer: l: 0xb0000f9 ll: 0x9000000080000fb ptr: 0xfa pd: 0x9000000 sz: 0xa0000f9

 - inner: l: 0x80000fb ll: 0x9000000000000fa ptr: 0xa0000f9 pd: 0xb0000f9 sz: 0xc0000f8

make: *** [Makefile:42: confirm_variadic_promotion] Error 4
(sid_armhf-dchroot)dkg@abel:~/src/faketime/faketime/test$ make clean && CFLAGS='-DINTERCEPT_SYSCALL -Dvariadic_promotion_t="double"' make confirm_variadic_promotion
gcc -c -o variadic/main.o -DINTERCEPT_SYSCALL -Dvariadic_promotion_t="double" -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/main.c
gcc -c -o variadic/outer.o -DINTERCEPT_SYSCALL -Dvariadic_promotion_t="double" -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/outer.c
gcc -c -o variadic/inner.o -DINTERCEPT_SYSCALL -Dvariadic_promotion_t="double" -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/inner.c
gcc -o variadic_promotion -DINTERCEPT_SYSCALL -Dvariadic_promotion_t="double" -std=gnu99 -Wall -DFAKE_STAT -Werror -Wextra  variadic/main.o variadic/outer.o variadic/inner.o
./variadic_promotion
Round 0: return values differ (outer: 51 inner: 57)
Round 0 strings differ:
 - outer: c: 0x0 s: 0x3 wc: 0x4ff i: 0x50000fe wi: 0x60000fd

 - inner: c: 0x3 s: 0x4ff wc: 0x50000fe i: 0x60000fd wi: 0x70000fc

Round 1: return values differ (outer: 73 inner: 78)
Round 1 strings differ:
 - outer: l: 0xb0000f9 ll: 0x9000000080000fb ptr: 0xfa pd: 0x9000000 sz: 0xa0000f9

 - inner: l: 0x80000fb ll: 0x9000000000000fa ptr: 0xa0000f9 pd: 0xb0000f9 sz: 0xc0000f8

make: *** [Makefile:42: confirm_variadic_promotion] Error 4
(sid_armhf-dchroot)dkg@abel:~/src/faketime/faketime/test$ 

dkg avatar Mar 07 '21 20:03 dkg

these are pretty disturbing results. in particular, it looks to me like 64-bit arguments that are passed on the stack might have each 32-bit word swapped or something. i don't know how to resolve it, other than by not setting INTERCEPT_SYSCALL on these different platforms.

It occurs to me that someone™ could try to read the syscalls(2) manpage for the list of all syscalls in a modern kernel, and then follow the individual manpages to the definitions of the arguments for each function, to get a list of different function signatures to enhance the test further -- maybe if there are no 64-bit arguments to syscall on that platform, it's not necessary to include a long long test, for example.

dkg avatar Mar 07 '21 20:03 dkg