LibAFL icon indicating copy to clipboard operation
LibAFL copied to clipboard

fastfail fix introduced a bug -- unreachable code

Open expend20 opened this issue 3 years ago • 26 comments

fastfail fix introduced next bug, it assumes UnhandledExceptionFilter should not return, but in the case it's called for the exception which is not treated as a crash (like c++'s throw) then we need to return from UnhandledExceptionFilter and continue execution. I don't know yet, if Frida is capable of unwinding in instrumented code, if not, then it's kind of bad, but let's talk about that later.

Here is the harness which reproduces the bug:

struct MyException : public std::exception {
    const char *what() const throw() { return "C++ Exception"; }
};

extern "C" __declspec(dllexport) void __cdecl crash()
{
    *(size_t *)0 = 0;
}

extern "C" __declspec(dllexport) size_t
    FuzzMeCPPEH(const char *data, unsigned int len)
{
    try {
        printf("Throwing CPPEH ...\n");
        DoNothing(1);
        throw MyException();
        DoNothing(2);
    }
    catch (MyException &e) {
        printf("In CPPEH handler\n");
        DoNothing(3);
    }

    printf("The end of the function\n");
    if (*(size_t *)data == 0x37333331) {
        crash();
    }
    return 1;
}

example run frida_gdiplus.exe -H C:\git\boxer_cpp\build\src\tests\RelWithDebInfo\acc_test.dll -F FuzzMeCPPEH -l acc_test.dll the log:

Child process file stdio is not supported on Windows yet. Dumping to stdout instead...
spawning on cores: Cores { cmdline: "0", ids: [CoreId { id: 0 }] }
I am broker!!.
Connected to port 1337
New connection: 127.0.0.1:53743/127.0.0.1:53743
Setting core affinity to CoreId { id: 0 }
Spawning next client (id 0)
First run. Let's set it all up
We're a client, let's fuzz :)
excluding range: 0-7ff7b00c0000
excluding range: 7ff7b265f000-7ffc29f80000
excluding range: 7ffc29f95000-ffffffffffffffff
Loading file "corpus/not_kitty.png" ...
Throwing CPPEH ...
The application panicked (crashed).
Message:  internal error: entered unreachable code: handle_exception should not return
Location: C:\git\libafl_f\libafl_frida\src\windows_hooks.rs:53

Backtrace omitted.

Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
The application panicked (crashed).
Message:  called `Option::unwrap()` on a `None` value
Location: C:\git\libafl_f\libafl\src\executors\inprocess.rs:1154

Backtrace omitted.

the solution would be to call the original function

expend20 avatar Oct 12 '22 19:10 expend20

btw, if you just remove the initialize() in windows_hook.rs (no hooks of IsProcessorFeaturePresent and UnhandledExceptionFilter) you'll end up in some weird crash like this:

AccTest: loaded(2cb0.674): Unknown exception - code 00000000 (first chance)
(2cb0.674): Unknown exception - code 00000000 (!!! second chance !!!)
ntdll!RtlRaiseStatus+0x3d:
00007ffc`392ef86d 65488b0c2560000000 mov   rcx,qword ptr gs:[60h] gs:00000000`00000060=????????????????
3:015> k
 # Child-SP          RetAddr               Call Site
00 000000e2`5aacc0b0 000002a3`d92b4cd2     ntdll!RtlRaiseStatus+0x3d
01 000000e2`5aacc650 000000e2`5aaccce0     0x000002a3`d92b4cd2
02 000000e2`5aacc658 000000e2`5aacc690     0x000000e2`5aaccce0
03 000000e2`5aacc660 00000000`00000000     0x000000e2`5aacc690

which may indicate that Frida doesn't like c++ exceptions in instrumented code

expend20 avatar Oct 12 '22 19:10 expend20

so, basically calling the original function in the hook led us to this ^ issue, and the next possible steps toward solution in my opinion would be:

  • Try to investigate why Frida issues this, is that completely not supported or we just have misused Frida in a weird way? would appreciate any examples of using Frida for code instrumentations like this but I need to instrument external module
  • Redirect execution to the start of the next iteration if we experience any exception (even expected) and mark this coverage as insignificant.
    • Not really sure if that's the desired behavior, because even OutputDebugString could produce the exception if I'm not mistaken.
    • Not sure Frida allows this, if anyone has an example of something similar would appreciate it.

expend20 avatar Oct 12 '22 20:10 expend20

I can only help with the last point I’m afraid. Here’s my implementation for AFL_FRIDA_PERSISTENT_RET.

https://github.com/AFLplusplus/AFLplusplus/blob/d1e1bbc713b22d620956143634ecdf97223aa59f/frida_mode/src/persistent/persistent_x64.c#L321

This file has the inline assembly generation code to support persistent mode. The epilogue is inserted when we want to terminate an iteration early.

WorksButNotTested avatar Oct 13 '22 01:10 WorksButNotTested

I don't really have the ability to test or debug on windows. I know that the unwinding of stack/exceptions through stalked code in frida is a bit iffy at best, so it is entirely possible that this is the issue. We need to bring @oleavr into the conversation, I'm afraid.

s1341 avatar Oct 13 '22 05:10 s1341

We'll need to improve Stalker so it augments the native unwinding like it does on some of the other ABIs. This boils down to (something like) hooking an internal function to translate the JITed address back to the original address.

Stalker recompiles the code being run on the fly, and executes the recompiled copy instead. While doing this it ensures it has identical side-effects as the original code -- e.g. making sure the recompiled code for a CALL instruction results in the same return address being pushed as if the original CALL was the instruction being executed.

Without having spent any time researching this, I suspect the solution might be to hook the user-space callback that the kernel calls, and modify the CONTEXT's EIP/RIP so it looks like the exception happened in the original code, and not the recompiled version that's actually being executed.

Frida already hooks the user-space callback actually, as this is how its Exceptor API is implemented on Windows. So in theory all Stalker needs to do is call gum_exceptor_add() to register a handler which does this work. It should return TRUE if the EIP/RIP was translated (signaling that it handled the exceptions, and no other handlers should be called), and FALSE otherwise.

Anyone interested in taking a stab at implementing this? I should be able to pitch in in the not-too-distant future, just a bit too much on my plate right now. Definitely happy to assist though if anybody's keen on diving in right away.

oleavr avatar Oct 13 '22 14:10 oleavr

Thanks @oleavr for the response!

Well, without promising anything I could try.

Maybe we could discuss in brief, what is the plan. In theory when exception happend in instrumented code, you need to walk all the current stack and replace all the retun addresses. Only the latest one is not enough, because there could be chained exceptoins and unwinds. So you change the pointers in the stack from the instrumented code, to the original code and continue execution. In that way windows will be able to unwind the stack properly, but you would need to expect the original code to continue execution pretty much anywhere. Which leads us to the next question.

How do you hook the original code? By removing the executable attribute from the memory and handling the exception, right?

Just before I dive into the code, maybe there are additional context/information which may be beneficial?

expend20 avatar Oct 14 '22 06:10 expend20

Take a quick look at the Linux exception handling support. I’d guess the approach would be pretty similar.

https://github.com/frida/frida-gum/blob/187492a38f0a2ecef3915765a2917e53b0a6e303/gum/backend-x86/gumstalker-x86.c#L876

Rather than modifying the stack itself, it’s likely to be a case of hooking some functions involved in the exception handling process and modifying the arguments or return values.

WorksButNotTested avatar Oct 14 '22 07:10 WorksButNotTested

In theory when exception happend in instrumented code, you need to walk all the current stack and replace all the retun addresses.

There's luckily nothing that needs replacing on the stack -- Stalker ensures that the side-effects of the recompiled code are identical to the original code. This means that anything pushed on the stack will be exactly as if Stalker wasn't there.

When something goes wrong though, the exception details coming from the kernel will have an instruction pointer that is in instrumented code. This needs to be translated back to the corresponding original address. Implementing that may require adding some extra book-keeping in Stalker. However, I suspect there's a simpler solution -- by hooking the exception handling internals similar to what we do on Linux, so I would recommend taking a look at the code that @WorksButNotTested pointed out, and compare the Windows implementation's internals to that.

oleavr avatar Oct 17 '22 21:10 oleavr

Exception handling in Linux is explained in excruciating detail here. https://monkeywritescode.blogspot.com/p/c-exceptions-under-hood.html

It’s a very comprehensive and thorough explanation.

WorksButNotTested avatar Oct 17 '22 22:10 WorksButNotTested

thanks guys, I'll try to allocate time on this

expend20 avatar Oct 18 '22 16:10 expend20

Hello, I tried to reproduce it on frida-gum's example, and it... just worked. So, Frida actually supports C++ exceptions on Windows, at least on this minimalistic example.

Attaching the reproducer. frida-ex-repro.zip

Now, the most interesting question, why doesn't it work on libafl? No clue yet.

expend20 avatar Oct 26 '22 19:10 expend20

I would test playing around with excluded ranges. Try including everything and see if it works. Then try excluding everything. I suspect the issue might relate to an exception occurring in an excluded range with a handler in an included region.

WorksButNotTested avatar Oct 26 '22 20:10 WorksButNotTested

You were right, in the reproducer if I call stalker.exclude() for every virtual address except the instrumented module, it stops working.

I wonder why do we have both: excluded ranges and explicit check in the instrumentation callback? Sound like if we have a check in the instrumentation callback, we don't need any exclude ranges, and vice versa, mutually exclusive in short.

expend20 avatar Oct 27 '22 17:10 expend20

Maybe the same reason as for AFL++?

https://github.com/AFLplusplus/AFLplusplus/blob/340647c5f1dda67957cf9f85b2af9e9ef7fb28af/frida_mode/src/instrument/instrument.c#L175

WorksButNotTested avatar Oct 27 '22 17:10 WorksButNotTested

Doesn't that just mean we should include the LibAFL error Handler in Stalker and that may be enough? Or maybe I'm not understanding something correctly

domenukk avatar Oct 27 '22 17:10 domenukk

I think if we excluded irrelevant ranges then the program doesn't even jump into these callbacks when they are executed https://github.com/AFLplusplus/LibAFL/blob/40269a578b9f8a7a6b6d7e014f6fbf1e186e7dcd/libafl_frida/src/helper.rs#L249 therefore we can achieve more speed (?)

tokatoka avatar Nov 03 '22 04:11 tokatoka

therefore we can achieve more speed (?)

Instrumentation phase is one time cost, so it potentially will lead to only faster startup, fuzzing speed is not going to be affected.

I think we need solid understanding why these excluded ranges needed for Windows in the first place, if it's just an artifact from porting code from Linux, then perhaps we can move it behind cfg(not(windows)) and this will resolve the exceptions issue.

expend20 avatar Nov 03 '22 08:11 expend20

@s1341 do you have any ideas? can we remove these excludes ?

tokatoka avatar Nov 03 '22 08:11 tokatoka

Meanwhile I can try if Frida's exeptor can be of help

expend20 avatar Nov 03 '22 08:11 expend20

The issue is also present on macOS.

It's a fundamental issue with how frida stalker 'jits' the code, and how exception unwinding works on each platform. The correct solution is to implement fixes upstream in frida.

s1341 avatar Nov 03 '22 12:11 s1341

cough @oleavr

domenukk avatar Nov 03 '22 12:11 domenukk

Instrumentation phase is one time cost, so it potentially will lead to only faster startup, fuzzing speed is not going to be affected.

as long as I checked with frida_libpng the speed dropped from 33k/sec to 22k/sec if I removed stalker.exclude(). (on linux) so i'd say it's still meaningful for performance reason

tokatoka avatar Nov 04 '22 01:11 tokatoka

as long as I checked with frida_libpng the speed dropped from 33k/sec to 22k/sec if I removed stalker.exclude(). (on linux) so i'd say it's still meaningful for performance reason

wow, that's weird as for me, what could be the reason?

expend20 avatar Nov 04 '22 08:11 expend20

If we tell FRIDA stalker to exclude a range, then when we call a function in it, it will push a return address onto the stack to jump back to FRIDA and then jump to the excluded function. FRIDA will only get control back when that function returns, no matter if that function in turn calls other functions not in excluded ranges.

In AFL++, since we don’t want to lose control of execution before we’ve even started our target, we don’t instruct FRiDA to exclude any ranges until we hit the entrypoint. This means that we will be instrumenting blocks which may actually be excluded until we hit the entrypoint and these will be cached for later use.

Backpatching (https://medium.com/@oleavr/anatomy-of-a-code-tracer-b081aadb0df8) means that at times transition between instrumented blocks can happen without transitioning back to FRIDA to find the next block (a significant performance improvement). Thus we won’t check for an excluded range when this happens. So we may actually run instrumented code for excluded ranges for short periods. Therefore by checking for exclusions when generating the instrumented code, we can skip emitting coverage code at the start of these blocks to avoid the additional overhead and polluting the coverage map when they run.

The issue with exception handling is conceptually quite simple, if execution transitions from an included range to an excluded range and encounters and triggers an exception, then if that exception is handled back in the included range we hit a problem.

When running instrumented code, we can replace any RET instruction with alternative code. Therefore, we can continue to put genuine return instructions on the stack and have the instrumented code perform any translation when we encounter the RET (by substituting different instructions) to work out where to branch to next.

However, for excluded ranges we vector to the original code itself, this is obviously much more performant. But as we are executing real code, when we hit the final RET instruction we will jump to whatever address is at the top of the stack. Therefore to get back into stalker and instrumented code, we cannot simply put the genuine return address there. We have to put an address which returns back into FRIDA or an instrumented block.

This causes problems with exception handlers as they walk the stack to find the exception handlers and expect to find the correct return addresses in there. For Linux, we hook the unwinding mechanism to correct these addresses. I expect a similar approach may be needed for other operating systems.

WorksButNotTested avatar Nov 04 '22 09:11 WorksButNotTested

So for now can we just remove the exclude() temporarily for windows..?

tokatoka avatar Nov 25 '22 09:11 tokatoka

I would make it the default to skip telling FRIDA to exclude ranges for all platforms and architectures except for Linux on x86/64. But I would make it configurable so that users with targets without exception handling can re-enable it. I would also have it print a warning about what it is doing and information on how to re-configure it.

So the default behaviour ensures everything works, but at the expense of performance for many targets.

WorksButNotTested avatar Nov 25 '22 16:11 WorksButNotTested