Seeing unhandled SIGABRT xamarin_unhandled_exception_handler crashes on MAUI iOS

Open MichaelLHerman opened this issue 1 year ago • 3 comments

Package

Sentry

.NET Flavor

.NET

.NET Version

8.0

OS

iOS

SDK Version

4.10.1

Self-Hosted Sentry Version

No response

Steps to Reproduce

Unsure of how to reproduce

https://cricket-health-ml.sentry.io/issues/5858865093/events/?cursor=0%3A50%3A0&environment=production&project=4507788345868288&referrer=issue-stream&statsPeriod=14d&stream_index=3

Seeing a bunch of SIGABRT xamarin_unhandled_exception_handler (Signal 6, Code 0) unhandled crashes.
From looking at Sentry code, it appears these are supposed to be filtered out client side but I don't see corresponding managed crashes being reported (when I search by app.device or user.username tags, I don't see other managed crashes) so I have no way to look into the underlying cause.

Image

Expected Result

Should see managed crashes, and not these native crashes

Actual Result

See plenty of instances of SIGABRT xamarin_unhandled_exception_handler (Signal 6, Code 0) with no corresponding managed crash

Image

MichaelLHerman avatar Oct 14 '24 18:10 MichaelLHerman

From looking at Sentry code, it appears these are supposed to be filtered out client side but I don't see corresponding managed crashes being reported

Hey @MichaelLHerman, reading Matt's comment in that Sentry snippet you referenced:

When we have an unhandled managed exception...

Since you're not seeing any managed exceptions, my assumption would be that this SIGABRT is coming from unmanaged (not managed) code. What does the full stack trace look like (rather than just the "Most relevant" - which seems to be showing just In App frames)?

jamescrosswell avatar Oct 14 '24 21:10 jamescrosswell

@jamescrosswell

The only frames that are omitted by "Most relevant" are pthread_kill and abort. Since the frames below it are mono/xamarin I assume this is a managed exception.

Thread 0 name: tid_103 Crashed:
0   libsystem_kernel.dylib          0x3b9ee42ec         __pthread_kill
1   libsystem_pthread.dylib         0x3e1acfc08         pthread_kill
2   libsystem_c.dylib               0x338479b9c         abort
3   CadabraMobile                   0x205241d20         xamarin_unhandled_exception_handler (runtime.m:1133)
4   CadabraMobile                   0x2054459dc         mono_invoke_unhandled_exception_hook (exception.c:1219)
5   CadabraMobile                   0x2054e0408         [inlined] mono_jit_exec_internal (driver.c:1374)
6   CadabraMobile                   0x2054e0408         mono_jit_exec (driver.c:1314)
7   CadabraMobile                   0x20524a2f0         xamarin_main (monotouch-main.m:495)
8   CadabraMobile                   0x205548ef4         main (main.arm64.mm:376)
9   <unknown>                       0x1c3b63154         <redacted>

MichaelLHerman avatar Oct 15 '24 00:10 MichaelLHerman

@MichaelLHerman I agree. I don't have any theories as to why this might be happening though. If we could reproduce the problem, it would be possible to look into it further.

jamescrosswell avatar Oct 16 '24 03:10 jamescrosswell

This has been fixed via: https://github.com/getsentry/sentry-dotnet/issues/3776

aritchie avatar Feb 21 '25 17:02 aritchie

@aritchie I'm still experiencing this using Sentry 5.1.1

Image

https://github.com/dotnet/runtime/blob/3c00af9d20e46f102f2beb14a709f488d4ac84a1/src/mono/mono/mini/driver.c#L1373
https://github.com/dotnet/runtime/blob/3c00af9d20e46f102f2beb14a709f488d4ac84a1/src/mono/mono/metadata/exception.c#L1219
https://github.com/dotnet/macios/blob/main/runtime/runtime.m#L1134

albyrock87 avatar Mar 04 '25 16:03 albyrock87

@albyrock87 We haven't updated the docs for the feature yet. We don't want to turn these errors off by default for users. This is currently only released for iOS and is coming to Android in the next week or two.

Example:

services.UseSentry(options => {
#if IOS
    options.SuppressSegfaults = true;
#endif
});

aritchie avatar Mar 04 '25 16:03 aritchie

@aritchie what I understand from your message is that Sentry is reporting duplicate crashes and we can suppress the duplicates using the option above.

If that were true, I should see the number of segfaults <= the number of managed crashes, but as you can see from this screenshot, 3 users are impacted by segfaults while only 2 are impacted by managed crashes.

Image

From the chart you can also see that no managed crash happens at the same time as a segfault.

This brings me to the conclusion that Sentry is not reporting managed crashes correctly.

Am I missing something?

albyrock87 avatar Mar 04 '25 17:03 albyrock87

@albyrock87 It's hard to say from a screenshot. The nature of the fix was to filter out duplicate reports between native and .NET. If you're not seeing the same number of managed crashes, perhaps one is truly a native issue?

This brings me to the conclusion that Sentry is not reporting managed crashes correctly

This is probably jumping to conclusions. We would have a lot more reports if this were an issue.

@jamescrosswell Your thoughts on this?

aritchie avatar Mar 04 '25 17:03 aritchie

@aritchie at the moment we only have these two crashes in this specific release on iOS, so that screenshot includes all the numbers related to iOS.

The time chart shows how there's no managed exception reported at the same time.

I see ~~three~~ two possible scenarios:

  • This is a native error, but the stack trace leads to the code that usually handles unhandled managed exceptions, so the stack trace is useless for understanding the native crash
  • This is a managed exception, and it is not being reported as one with the appropriate C# stack trace
  • ~~This is being reported as unhandled but in reality it is handled (we have plenty of handled managed exceptions)~~ this is not possible as there's a _pthread_kill which in theory represents the end of the application process

Either way, we don't have a clue of what is crashing the application.

albyrock87 avatar Mar 04 '25 17:03 albyrock87

@albyrock87 I think you might be seeing the opposite problem, which another user reported here:

  • https://github.com/getsentry/sentry-dotnet/issues/4012#issuecomment-2695739938

Historically Sentry captured native exceptions (and suppressed SIGABRT xamarin_unhandled_exception_handler crashes) using the UnhandledException hook.

However, that hook doesn't fire correctly for AOT compiled iOS applications (see discussion here). As such, PR https://github.com/getsentry/sentry-dotnet/pull/3909 uses the MarshalManagedException hook instead... which does fire in AOT compiled iOS applications.

It looks like there might be compiler/linker configurations for which this hook doesn't always fire though. The user who reported issue #4012 indicated a potential workaround might be to use the FirstChanceException hook.

I haven't had a chance to look into this and FirstChanceException is a bit tricky (it fires before the runtime checks for exception handlers)... So some head scratching would be required in order to work out how/whether we could use that as well.
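
To illustrate why FirstChanceException is tricky, here is a minimal sketch as a plain .NET console program. It is not Sentry code and not an iOS/MAUI app; it only uses the standard AppDomain events, and the Program class and the messages it prints are made up for illustration. It shows the first-chance handler running for an exception that is later caught, while the UnhandledException handler stays silent:

```csharp
using System;

internal static class Program
{
    private static void Main()
    {
        // Raised as soon as an exception is thrown, before the runtime
        // looks for a matching catch block.
        AppDomain.CurrentDomain.FirstChanceException += (_, e) =>
            Console.WriteLine($"FirstChance: {e.Exception.GetType().Name}");

        // Raised only when no handler catches the exception.
        AppDomain.CurrentDomain.UnhandledException += (_, e) =>
            Console.WriteLine($"Unhandled: {e.ExceptionObject.GetType().Name}");

        try
        {
            throw new InvalidOperationException("handled by the catch below");
        }
        catch (InvalidOperationException)
        {
            // The first-chance handler has already run by this point,
            // even though the exception never becomes unhandled.
            Console.WriteLine("Caught");
        }
    }
}
```

An SDK that reported every first-chance exception as a crash would therefore also report exceptions the app handles itself, so some extra filtering would be needed. Whether the hooks behave the same way in AOT compiled iOS builds is exactly the open question in this thread.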

jamescrosswell avatar Mar 04 '25 20:03 jamescrosswell

@jamescrosswell I see, so the following can't be done

https://github.com/getsentry/sentry-dotnet/blob/c20d16245c7e2d2f9056a49780593a391e59eda1/src/Sentry/Platforms/Cocoa/SentrySdk.cs#L13-L17

because it would create an assertion

https://github.com/dotnet/macios/blob/907081f787315704a01c940cf28b46b47db23df0/runtime/runtime.m#L2452-L2462

but what if we simply capture the exception directly?

ObjCRuntime.Runtime.MarshalManagedException += (_, args) =>
{
    if (args.ExceptionMode != ObjCRuntime.MarshalManagedExceptionMode.UnwindNativeCode)
    {
        AppDomainAdapter.Instance.RaiseUnhandledException(args.Exception);
    }
};

Idk if AppDomainAdapter.Instance is the right place, but you get the idea.

Have I got this right by supposing that UnwindNativeCode is the only mode which triggers the UnhandledException hook?

https://learn.microsoft.com/en-us/previous-versions/xamarin/ios/platform/exception-marshaling#events

Just asking because we don't want duplicate crash reports obviously.

Also this should only be done in .NET8 right? So my issue should be gone when upgrading to .NET9, am I correct?

https://github.com/dotnet/macios/issues/15252#issuecomment-2349301905

On top of this change we'd also suppress the native exceptions I'm seeing.

albyrock87 avatar Mar 05 '25 10:03 albyrock87

@albyrock87

but what if we simply capture the exception directly?

I'd have to check but I think that would result in capturing exceptions even when these were wrapped in a try/catch block... which is not what we want.

Have I got this right by supposing that UnwindNativeCode is the only mode which triggers the UnhandledException hook?

That's probably a question for the dotnet runtime team (or you could read through the source code).

Also this should only be done in .NET8 right? So my issue should be gone when upgrading to .NET9, am I correct? https://github.com/dotnet/macios/issues/15252#issuecomment-2349301905

From that comment: It's still a problem when using CoreCLR (on macOS or when using NativeAOT). So no, we still can't rely on that event hook in net9.0. We need another mechanism (either instead or as well).

jamescrosswell avatar Mar 05 '25 22:03 jamescrosswell