
SIGSEGV when user interaction instrumentation is enabled

Open OlivierGenez opened this issue 1 year ago • 27 comments


❗ EDIT by the maintainers:

  • The issue has been fixed by Google, see issue on the ART issue tracker: https://issuetracker.google.com/issues/361129298#comment7
  • Google's rollout plans are currently not communicated, but it is to be expected that the fix will be rolled out with the next system/security updates, similar to how the code was rolled out that caused the issue with the August/September updates
  • In the meantime, you can mitigate this issue by deactivating User Interaction Tracing and/or Profiling (see snippets below), so that the Sentry profiler no longer starts the crashing method tracer from the Android Tracer. (Note that other code in your app might still start it and cause the same crashes, independent of Sentry.)
options.isEnableUserInteractionTracing = false
options.profilesSampleRate = 0.0
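
A minimal sketch of where these two lines would go, assuming the SDK is initialized manually from an Application subclass. The class name `MyApplication` is hypothetical; `SentryAndroid.init` and both option names are the ones used throughout this thread.

```kotlin
import android.app.Application
import io.sentry.android.core.SentryAndroid

// Hypothetical Application subclass, for illustration only.
class MyApplication : Application() {
    override fun onCreate() {
        super.onCreate()
        SentryAndroid.init(this) { options ->
            // Stop creating transactions for clicks, swipes and scrolls.
            options.isEnableUserInteractionTracing = false
            // Never sample a profile, so Sentry does not start the method tracer.
            options.profilesSampleRate = 0.0
        }
    }
}
```

Apps that configure Sentry through AndroidManifest.xml meta-data would instead set `io.sentry.traces.user-interaction.enable` to `false` and `io.sentry.traces.profiling.sample-rate` to `0.0`.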

Integration

sentry-android

Build System

Gradle

AGP Version

8.3.2

Proguard

Disabled

Version

7.12.1

Steps to Reproduce

My team has observed an increase in this type of crash in Sentry/Android vitals with the latest update of our app:

Check failed: tlsPtr_.method_trace_buffer == nullptr (tlsPtr_.method_trace_buffer=0x<sanitized>, nullptr=(null)) 

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 28018 >>> <Application ID redacted> <<<

backtrace:
  #00  pc 0x0000000000058290  /apex/com.android.runtime/lib64/bionic/libc.so (__strlen_aarch64+16)
  #01  pc 0x00000000005b510c  /apex/com.android.art/lib64/libart.so (art::Thread::DumpState(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, art::Thread const*, int)+556)
  #02  pc 0x00000000005b487c  /apex/com.android.art/lib64/libart.so (art::Thread::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, unwindstack::AndroidLocalUnwinder&, bool, bool) const+52)
  #03  pc 0x00000000005b6814  /apex/com.android.art/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+216)
  #04  pc 0x000000000054eeb0  /apex/com.android.art/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*, bool)+684)
  #05  pc 0x00000000005b6148  /apex/com.android.art/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool)+292)
  #06  pc 0x0000000000933e24  /apex/com.android.art/lib64/libart.so (art::AbortState::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) const+204)
  #07  pc 0x000000000093023c  /apex/com.android.art/lib64/libart.so (art::Runtime::Abort(char const*)+712)
  #08  pc 0x00000000000160fc  /apex/com.android.art/lib64/libbase.so (android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::__invoke(char const*)+80)
  #09  pc 0x00000000000156d0  /apex/com.android.art/lib64/libbase.so (android::base::LogMessage::~LogMessage()+516)
  #10  pc 0x00000000005b74ec  /apex/com.android.art/lib64/libart.so (art::Thread::~Thread()+1512)
  #11  pc 0x000000000030b2b4  /apex/com.android.art/lib64/libart.so (art::ThreadList::Unregister(art::Thread*, bool)+708)
  #12  pc 0x000000000063eec8  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+2208)
  #13  pc 0x000000000063e618  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallbackWithUffdGc(void*)+8)
  #14  pc 0x000000000006efbc  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+204)
  #15  pc 0x0000000000060d60  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

This is not a new issue (we've seen reports as far back as a year ago) but there has been a significant increase in crash reports.

Our app's sentry config has user interaction instrumentation enabled:

SentryAndroid.init(context) { options ->
    // [...]
    options.tracesSampleRate = 1.0
    options.profilesSampleRate = 1.0
    // [...]
    options.isEnableUserInteractionTracing = true
    // [...]
}

After some investigation, we've been able to replicate the issue in the debug version of our app (i.e., R8 is disabled) on Pixel 6a and Pixel 7a devices with Android 14 by:

  1. opening the app
  2. tapping any of our bottom navigation bar items in very rapid succession until the app crashes

Based on Sentry/Android vitals crash reports this definitely occurs on a wide variety of devices with standard app usage, but this is one way we've been able to replicate the issue somewhat consistently.

Expected Result

The application proceeds as normal and doesn't crash.

Actual Result

After a while, the interactions slow down a bit, then the application crashes:

Check failed: tlsPtr_.method_trace_buffer == nullptr (tlsPtr_.method_trace_buffer=0x<sanitized>, nullptr=(null)) 

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 28018 >>> <Application ID redacted> <<<

backtrace:
  #00  pc 0x0000000000058290  /apex/com.android.runtime/lib64/bionic/libc.so (__strlen_aarch64+16)
  #01  pc 0x00000000005b510c  /apex/com.android.art/lib64/libart.so (art::Thread::DumpState(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, art::Thread const*, int)+556)
  #02  pc 0x00000000005b487c  /apex/com.android.art/lib64/libart.so (art::Thread::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, unwindstack::AndroidLocalUnwinder&, bool, bool) const+52)
  #03  pc 0x00000000005b6814  /apex/com.android.art/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+216)
  #04  pc 0x000000000054eeb0  /apex/com.android.art/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*, bool)+684)
  #05  pc 0x00000000005b6148  /apex/com.android.art/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool)+292)
  #06  pc 0x0000000000933e24  /apex/com.android.art/lib64/libart.so (art::AbortState::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) const+204)
  #07  pc 0x000000000093023c  /apex/com.android.art/lib64/libart.so (art::Runtime::Abort(char const*)+712)
  #08  pc 0x00000000000160fc  /apex/com.android.art/lib64/libbase.so (android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::__invoke(char const*)+80)
  #09  pc 0x00000000000156d0  /apex/com.android.art/lib64/libbase.so (android::base::LogMessage::~LogMessage()+516)
  #10  pc 0x00000000005b74ec  /apex/com.android.art/lib64/libart.so (art::Thread::~Thread()+1512)
  #11  pc 0x000000000030b2b4  /apex/com.android.art/lib64/libart.so (art::ThreadList::Unregister(art::Thread*, bool)+708)
  #12  pc 0x000000000063eec8  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+2208)
  #13  pc 0x000000000063e618  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallbackWithUffdGc(void*)+8)
  #14  pc 0x000000000006efbc  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+204)
  #15  pc 0x0000000000060d60  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

Attached is a full crash dump: tombstone.txt.

The issue cannot be replicated when user interaction instrumentation is disabled:

SentryAndroid.init(context) { options ->
    // [...]
    options.isEnableUserInteractionTracing = false
    // [...]
}

OlivierGenez avatar Aug 23 '24 13:08 OlivierGenez

I am facing the same issue, any update on this?

Sentry version:

io.sentry.android.gradle:4.5.1
io.sentry:sentry-android: 6.19.0
SentryAndroid.init(app) { options: SentryAndroidOptions ->
    options.dsn = token
    options.environment = buildType
    options.release = releaseName
}

Here's the stack trace:

Check failed: tlsPtr_.method_trace_buffer == nullptr (tlsPtr_.method_trace_buffer=0x<sanitized>, nullptr=(null)) 

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 17593 >>> ch.pickebike <<<

backtrace:
  #00  pc 0x0000000000097390  /apex/com.android.runtime/lib64/bionic/libc.so (__strlen_aarch64+16)
  #01  pc 0x00000000005b510c  /apex/com.android.art/lib64/libart.so (art::Thread::DumpState(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, art::Thread const*, int)+556)
  #02  pc 0x00000000005b487c  /apex/com.android.art/lib64/libart.so (art::Thread::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, unwindstack::AndroidLocalUnwinder&, bool, bool) const+52)
  #03  pc 0x00000000005b6814  /apex/com.android.art/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+216)
  #04  pc 0x000000000054eeb0  /apex/com.android.art/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*, bool)+684)
  #05  pc 0x00000000005b6148  /apex/com.android.art/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool)+292)
  #06  pc 0x0000000000933e24  /apex/com.android.art/lib64/libart.so (art::AbortState::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) const+204)
  #07  pc 0x000000000093023c  /apex/com.android.art/lib64/libart.so (art::Runtime::Abort(char const*)+712)
  #08  pc 0x00000000000160fc  /apex/com.android.art/lib64/libbase.so (android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::__invoke(char const*)+80)
  #09  pc 0x00000000000156d0  /apex/com.android.art/lib64/libbase.so (android::base::LogMessage::~LogMessage()+516)
  #10  pc 0x00000000005b74ec  /apex/com.android.art/lib64/libart.so (art::Thread::~Thread()+1512)
  #11  pc 0x000000000030b2b4  /apex/com.android.art/lib64/libart.so (art::ThreadList::Unregister(art::Thread*, bool)+708)
  #12  pc 0x000000000063eec8  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+2208)
  #13  pc 0x000000000010ba80  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208)
  #14  pc 0x000000000009f690  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

ash-wtag avatar Aug 26 '24 06:08 ash-wtag

Hey everyone, thanks for reaching out!

This looks like another issue with Android's built-in profiler, similar to https://github.com/getsentry/sentry-java/issues/2604 and https://github.com/getsentry/sentry-java/issues/3561.

Disabling user interaction instrumentation just hides the real culprit: user interaction instrumentation creates transactions, which in turn create profiles, which are recorded with the built-in Android profiler.
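
To make the last link in that chain concrete, here is a rough sketch of the kind of call a sampling profiler makes into Android's built-in method tracer, assuming it goes through `android.os.Debug`. The helper `traceBlock`, the buffer size and the sampling interval are illustrative assumptions, not Sentry's actual implementation.

```kotlin
import android.os.Debug
import java.io.File

// Hypothetical helper, not part of the Sentry SDK: runs `block` while the
// platform's sample-based method tracer is active.
fun traceBlock(traceFile: File, block: () -> Unit) {
    // Illustrative values: ~3 MB trace buffer, one sample every 10,000 µs.
    Debug.startMethodTracingSampling(traceFile.path, 3_000_000, 10_000)
    try {
        block()
    } finally {
        // Always stop tracing, even if `block` throws.
        Debug.stopMethodTracing()
    }
}
```

The backtraces in this thread abort in `art::Thread::~Thread()` while a thread is being unregistered with `method_trace_buffer` still set, which suggests the failure sits in the runtime's tracer rather than in the code that merely starts and stops it.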

Could you try to disable profiling instead?

SentryAndroid.init(context) { options ->
    options.profilesSampleRate = 0.0
}

On top of that: Is your app using any native (C/C++) code in combination with some custom threading?

markushi avatar Aug 27 '24 08:08 markushi

Could you try to disable profiling instead?

We actually had tried this when debugging the issue and found that it seemed to prevent crashes from happening as well. Would you advise disabling profiling instead of user interaction instrumentation?

On top of that: Is your app using any native (C/C++) code in combination with some custom threading?

Our app doesn't use native code "directly", but some libraries we depend on do. The code is not open source though and is not shared with us, so I can't tell exactly how it deals with threading.

OlivierGenez avatar Aug 27 '24 10:08 OlivierGenez

For reference:

  • Google Issue tracker link for this issue https://issuetracker.google.com/issues/361129298
  • Google Issue tracker link for GH-3561 https://issuetracker.google.com/issues/362293861

kahest avatar Aug 27 '24 16:08 kahest

Could you try to disable profiling instead?

[...]Would you advise disabling profiling instead of user interaction instrumentation?

@OlivierGenez Yes, we would advise disabling profiling in the meantime instead.

markushi avatar Aug 28 '24 13:08 markushi

Let's try to reproduce this issue in a minimal environment (Android 14, as seen in the attached tombstone).

markushi avatar Aug 28 '24 13:08 markushi

Update from Google on the issue tracker:

We have shared this with our product and engineering team and will update this issue with more information as it becomes available.

kahest avatar Sep 02 '24 14:09 kahest

@OlivierGenez

My team has observed an increase in this type of crashes in Sentry/Android vitals with the latest update of our app

Did you make any configuration changes in the "latest update" of your app? E.g. did you change the sampling rate, enable a specific feature, bump an SDK version, etc.?

markushi avatar Sep 04 '24 12:09 markushi

Hi @markushi , just a heads up that this will occur even with options.profilesSampleRate = 0.0. It's also happening on Android 12, 13 and 14.

empowerDan avatar Sep 12 '24 22:09 empowerDan

I can confirm with the latest update to disable profiling, we are still observing crashes. As @OlivierGenez mentioned, turning off profiling and disabling isEnableUserInteractionTracing reduced events of crashes resulting from aggressive monkey-taps, but lifecycle events seem to be the last listed event in some of the breadcrumbs in crashes. We have now disabled all tracing and are waiting to see if that helps at all.

options.isEnableActivityLifecycleTracingAutoFinish = false
options.isEnableAutoActivityLifecycleTracing = false
options.isEnableTimeToFullDisplayTracing = false
options.isEnableUserInteractionTracing = false

ashwin-coles avatar Sep 12 '24 22:09 ashwin-coles

@empowerDan @ashwin-coles could you share the backtrace of these crashes (after disabling profiling)? Is it the same as the other ones in this thread?

romtsn avatar Sep 13 '24 08:09 romtsn

Yep, same. I can also confirm that @ashwin-coles's snippet brings all art::Thread::DumpState errors down to 0. However, it silences quite a lot of other things too, so it's not a very viable long-term solution for a paying customer.

options.isEnableActivityLifecycleTracingAutoFinish = false
options.isEnableAutoActivityLifecycleTracing = false
options.isEnableTimeToFullDisplayTracing = false
options.isEnableUserInteractionTracing = false

Do we know if the issue occurs on previous versions of Sentry too?

empowerDan avatar Sep 13 '24 19:09 empowerDan

Hey guys, we've seen the same crash happening very often in our prod apps, which also use Sentry; we don't use instrumentation. The specific crash we have is this:

Check failed: tlsPtr_.method_trace_buffer == nullptr (tlsPtr_.method_trace_buffer=0x<sanitized>, nullptr=(null))

and the backtrace is this:

  #00  pc 0x0000000000085ed0  /apex/com.android.runtime/lib64/bionic/libc.so (__strlen_aarch64+16)
  #01  pc 0x00000000005b510c  /apex/com.android.art/lib64/libart.so (art::Thread::DumpState(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, art::Thread const*, int)+556)
  #02  pc 0x00000000005b487c  /apex/com.android.art/lib64/libart.so (art::Thread::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, unwindstack::AndroidLocalUnwinder&, bool, bool) const+52)
  #03  pc 0x00000000005b6814  /apex/com.android.art/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+216)
  #04  pc 0x000000000054eeb0  /apex/com.android.art/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*, bool)+684)
  #05  pc 0x00000000005b6148  /apex/com.android.art/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool)+292)
  #06  pc 0x0000000000933e24  /apex/com.android.art/lib64/libart.so (art::AbortState::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) const+204)
  #07  pc 0x000000000093023c  /apex/com.android.art/lib64/libart.so (art::Runtime::Abort(char const*)+712)
  #08  pc 0x00000000000160fc  /apex/com.android.art/lib64/libbase.so (android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::__invoke(char const*)+80)
  #09  pc 0x00000000000156d0  /apex/com.android.art/lib64/libbase.so (android::base::LogMessage::~LogMessage()+516)
  #10  pc 0x00000000005b74ec  /apex/com.android.art/lib64/libart.so (art::Thread::~Thread()+1512)
  #11  pc 0x000000000030b2b4  /apex/com.android.art/lib64/libart.so (art::ThreadList::Unregister(art::Thread*, bool)+708)
  #12  pc 0x000000000063eec8  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+2208)
  #13  pc 0x00000000000fc230  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208)
  #14  pc 0x000000000008e310  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

Hope we can hear of a solution soon 👍

markdrake-dev avatar Sep 16 '24 16:09 markdrake-dev

@markdrake-dev thank you for the report and the backtrace. Does "we don't use instrumentation" mean you have the options below all deactivated? Do you have an options.profilesSampleRate set?

options.isEnableActivityLifecycleTracingAutoFinish = false
options.isEnableAutoActivityLifecycleTracing = false
options.isEnableTimeToFullDisplayTracing = false
options.isEnableUserInteractionTracing = false

kahest avatar Sep 16 '24 16:09 kahest

@kahest Sorry for not checking those values first. I found out we were configuring it via manifest and here are the values we are using:

        <!-- enable automatic breadcrumbs for user interactions (clicks, swipes, scrolls) -->
        <meta-data android:name="io.sentry.traces.user-interaction.enable" android:value="true" />
        <!-- disable screenshot for crashes (could contain sensitive/PII data) -->
        <meta-data android:name="io.sentry.attach-screenshot" android:value="false" />
        <!-- enable view hierarchy for crashes -->
        <meta-data android:name="io.sentry.attach-view-hierarchy" android:value="true" />

        <!-- enable the performance API by setting a sample-rate, adjust in production env -->
        <meta-data android:name="io.sentry.traces.sample-rate" android:value="1.0" />
        <!-- enable profiling when starting transactions, adjust in production env -->
        <meta-data android:name="io.sentry.traces.profiling.sample-rate" android:value="1.0" />

markdrake-dev avatar Sep 16 '24 16:09 markdrake-dev

We also experience the same crash which went under our radar because, for some reason, Firebase Crashlytics didn't catch this crash until we got an email from Google that our crash rate exceeds the device bad behavior threshold of 8.0% on 8 device models affecting 8.46% of installs. Around 2K of our users are affected by this crash and many of them are paid users.

This is pretty bad guys, I hope this will be solved soon, meantime we will have to de-integrate Sentry from our Android project.

Check failed: tlsPtr_.method_trace_buffer == nullptr (tlsPtr_.method_trace_buffer=0x<sanitized>, nullptr=(null))

AdnanYupi avatar Sep 16 '24 23:09 AdnanYupi

@AdnanYupi do you have any more information on whether specific devices, OS versions, etc. are affected?

kahest avatar Sep 17 '24 08:09 kahest

@kahest Hey, thanks for reaching out. Yeah, I can share the list of affected devices. The majority, around 90%, are Pixel devices and one Samsung device. Pixels are from Pixel 6 to Pixel 8 Pro. For now that Samsung device is kinda irrelevant because we don't have many users using that specific device. Here is the screenshot from the Play Console: Image

Judging by the rate percentage, Android 13 was affected most of the time, but it might just be that these devices are mostly on Android 13. Not sure.

AdnanYupi avatar Sep 17 '24 08:09 AdnanYupi

We also experience the same crash which went under our radar because, for some reason, Firebase Crashlytics didn't catch this crash until we got an email from Google that our crash rate exceeds the device bad behavior threshold of 8.0% on 8 device models affecting 8.46% of installs. Around 2K of our users are affected by this crash and many of them are paid users.

This is pretty bad guys, I hope this will be solved soon, meantime we will have to de-integrate Sentry from our Android project.

Check failed: tlsPtr_.method_trace_buffer == nullptr (tlsPtr_.method_trace_buffer=0x<sanitized>, nullptr=(null))

Same issue with us. 2 primary crashes -

  1. [libc.so] abort
invalid pthread_t 0x<sanitized> passed to pthread_getcpuclockid
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 11317 >>> com.myapp <<<

backtrace:
  #00  pc 0x000000000008d394  /apex/com.android.runtime/lib64/bionic/libc.so (abort+168)
  #01  pc 0x00000000000f5870  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_internal_find(long, char const*)+200)
  #02  pc 0x00000000000f5788  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_internal_gettid(long, char const*)+12)
  #03  pc 0x00000000000f5548  /apex/com.android.runtime/lib64/bionic/libc.so (pthread_getcpuclockid+28)
  #04  pc 0x000000000079e518  /apex/com.android.art/lib64/libart.so (art::Trace::CompareAndUpdateStackTrace(art::Thread*, std::__1::vector<art::ArtMethod*, std::__1::allocator<art::ArtMethod*> >*)+120)
  #05  pc 0x000000000079ec64  /apex/com.android.art/lib64/libart.so (art::Trace::RunSamplingThread(void*)+756)
  #06  pc 0x00000000000f5298  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208)
  #07  pc 0x000000000008ebdc  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68)
  2. [libc.so] __strlen_aarch64
Thread
Check failed: tlsPtr_.method_trace_buffer == nullptr (tlsPtr_.method_trace_buffer=0x<sanitized>, nullptr=(null)) 
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 11990 >>> com.myapp <<<

backtrace:
  #00  pc 0x0000000000096850  /apex/com.android.runtime/lib64/bionic/libc.so (__strlen_aarch64+16)
  #01  pc 0x00000000005b510c  /apex/com.android.art/lib64/libart.so (art::Thread::DumpState(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, art::Thread const*, int)+556)
  #02  pc 0x00000000005b487c  /apex/com.android.art/lib64/libart.so (art::Thread::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, unwindstack::AndroidLocalUnwinder&, bool, bool) const+52)
  #03  pc 0x00000000005b6814  /apex/com.android.art/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+216)
  #04  pc 0x000000000054eeb0  /apex/com.android.art/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*, bool)+684)
  #05  pc 0x00000000005b6148  /apex/com.android.art/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool)+292)
  #06  pc 0x0000000000933e24  /apex/com.android.art/lib64/libart.so (art::AbortState::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) const+204)
  #07  pc 0x000000000093023c  /apex/com.android.art/lib64/libart.so (art::Runtime::Abort(char const*)+712)
  #08  pc 0x00000000000160fc  /apex/com.android.art/lib64/libbase.so (android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::__invoke(char const*)+80)
  #09  pc 0x00000000000156d0  /apex/com.android.art/lib64/libbase.so (android::base::LogMessage::~LogMessage()+516)
  #10  pc 0x00000000005b74ec  /apex/com.android.art/lib64/libart.so (art::Thread::~Thread()+1512)
  #11  pc 0x000000000030b2b4  /apex/com.android.art/lib64/libart.so (art::ThreadList::Unregister(art::Thread*, bool)+708)
  #12  pc 0x000000000063eec8  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+2208)
  #13  pc 0x0000000000104fc4  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208)
  #14  pc 0x000000000009e764  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68)

We had to roll back Sentry Integration. Our app is a mixture of Android + React Native.

Initialization was done like this -

// Android's Application Class
SentryAndroid.init(this)

// React Native's App.tsx
Sentry.wrap(App);
Sentry.init({
    dsn: SENTRY_DSN,
    release: version,
    tracesSampleRate: 1.0,
    environment: environment,
});

I'm willing to discuss this further over a call. Pinged @kahest over twitter DM.

aakashchoubey avatar Sep 17 '24 09:09 aakashchoubey

Our hunch is that there's no user impact, as there were no escalations on two of our apps. The crash likely happens when the user returns to the app from the background; the app then simply launches from the starting screen.

Android SDKs affected: Android 14 (SDK 34), Android 13 (SDK 33), Android 12 (SDK 31), Android 12L (SDK 32).

Device distribution: we're seeing more OnePlus devices (25.7% for the strlen crash and 18.4% for the abort crash), and the Samsung R8Q contributes 11.8% of the abort crash. But since the numbers for specific devices are low, this could just reflect which devices are more popular in the regions we serve (India, UAE, SGP).

The issue was not caught by Firebase, but Sentry was able to catch it. Most likely we need the Firebase NDK for Firebase to be able to catch it. Disabling Sentry's init behind a flag helped bring a flatline, so we are sure it's the Sentry SDK.

aakashchoubey avatar Sep 17 '24 09:09 aakashchoubey

@aakashchoubey thanks for the report. Please note that the first crash in your previous post is already tracked in a separate issue: https://github.com/getsentry/sentry-java/issues/2604

Also based on the init snippets you shared, both User Interaction Tracing and Profiling are disabled in your app - can you double-check this please? About "Disabling Sentry's init behind a flag helped bring a flatline" - where do you see this flatline?

kahest avatar Sep 17 '24 12:09 kahest

Hey @kahest, thanks for the reply. I'm aware and have been tracking #2604. Would you recommend that I update this there as well, or start a new issue altogether?

both User Interaction Tracing and Profiling are disabled in your app

I think I missed the manifest part, adding it here:

<meta-data android:name="io.sentry.auto-init" android:value="false" />
<!-- Required: set your sentry.io project identifier (DSN) -->
<meta-data
    android:name="io.sentry.dsn"
    android:value="myDSN"/>

<!-- enable automatic breadcrumbs for user interactions (clicks, swipes, scrolls) -->
<meta-data
    android:name="io.sentry.traces.user-interaction.enable"
    android:value="true" />

<!-- enable view hierarchy for crashes -->
<meta-data
    android:name="io.sentry.attach-view-hierarchy"
    android:value="true" />

<!-- enable the performance API by setting a sample-rate, adjust in production env -->
<meta-data
    android:name="io.sentry.traces.sample-rate"
    android:value="1.0" />
<!-- enable profiling when starting transactions, adjust in production env -->
<meta-data
    android:name="io.sentry.traces.profiling.sample-rate"
    android:value="1.0" />
<!-- enable app start profiling -->
<meta-data
    android:name="io.sentry.traces.profiling.enable-app-start"
    android:value="true" />

So as we can see, the flag is on natively.

Disabling Sentry's init behind a flag helped bring a flatline

Sure, adding a screenshot. [Image] This is the screenshot for the [libc.so] abort issue. We had disabled a flag from the backend, which stops the init method call. This brought a flatline in the events, since the SDK had stopped initializing. Timeline: Sentry was rolled out on 29th July, the flag was disabled on 9th August and enabled again on 31st August.

A few additional observations: the [libc.so] __strlen_aarch64 crash did not happen for us in the initial rollout. When we enabled the flag again, after the release on 31st August, we had updated the init method:

// before 31st August
Sentry.init({
    dsn: SENTRY_DSN,
    release: version,
    tracesSampleRate: 1.0,
});

// after 31st August
Sentry.init({
    dsn: SENTRY_DSN,
    release: version,
    tracesSampleRate: 1.0,
    environment: environment,  // where environment = 'production'
});

So, is it possible that adding the environment flag triggered this? Or maybe it could just be that the numbers were low.

There are two different stacktraces for this crash.

// first stacktrace
  #11  pc 0x000000000030b2b4  /apex/com.android.art/lib64/libart.so (art::ThreadList::Unregister(art::Thread*, bool)+708)
  #12  pc 0x000000000063eec8  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+2208)
  #13  pc 0x0000000000104fc4  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208)

// second stacktrace
  #11  pc 0x000000000030b2b4  /apex/com.android.art/lib64/libart.so (art::ThreadList::Unregister(art::Thread*, bool)+708)
  #12  pc 0x000000000063eec8  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+2208)
>>  #13  pc 0x000000000063e618  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallbackWithUffdGc(void*)+8)
  #14  pc 0x0000000000104fe4  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208)

Also there's another similar crash [libc.so] strlen_a15 - here's the stacktrace

Thread
Check failed: tlsPtr_.method_trace_buffer == nullptr (tlsPtr_.method_trace_buffer=0xb1f7f6c0, nullptr=(null)) 
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 32007 >>> com.myapp <<<

backtrace:
  #00  pc 0x000000000005f62c  /apex/com.android.runtime/lib/bionic/libc.so (strlen_a15+72)
  #01  pc 0x0000000000520b47  /apex/com.android.art/lib/libart.so (art::Thread::DumpState(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, art::Thread const*, int)+2310)
  #02  pc 0x0000000000535a6f  /apex/com.android.art/lib/libart.so (art::DumpCheckpoint::Run(art::Thread*)+646)
  #03  pc 0x0000000000530f29  /apex/com.android.art/lib/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*, bool)+560)
  #04  pc 0x000000000053030b  /apex/com.android.art/lib/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool)+1022)
  #05  pc 0x00000000004f742d  /apex/com.android.art/lib/libart.so (art::AbortState::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) const+188)
  #06  pc 0x00000000004e59dd  /apex/com.android.art/lib/libart.so (art::Runtime::Abort(char const*)+1316)
  #07  pc 0x000000000000e0f1  /apex/com.android.art/lib/libbase.so (android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::__invoke(char const*)+48)
  #08  pc 0x000000000000d965  /apex/com.android.art/lib/libbase.so (android::base::LogMessage::~LogMessage()+332)
  #09  pc 0x0000000000524521  /apex/com.android.art/lib/libart.so (art::Thread::~Thread()+1368)
  #10  pc 0x0000000000534eb1  /apex/com.android.art/lib/libart.so (art::ThreadList::Unregister(art::Thread*, bool)+596)
  #11  pc 0x0000000000518001  /apex/com.android.art/lib/libart.so (art::Thread::CreateCallback(void*)+1784)
  #12  pc 0x00000000000ad143  /apex/com.android.runtime/lib/bionic/libc.so (__pthread_start(void*)+40)
  #13  pc 0x00000000000642dd  /apex/com.android.runtime/lib/bionic/libc.so (__start_thread+30)

Hope this helps. Let me know if you need any further info on these.

aakashchoubey avatar Sep 17 '24 12:09 aakashchoubey

@aakashchoubey thanks for the details - let me answer one-by-one.

I'm aware and have been tracking https://github.com/getsentry/sentry-java/issues/2604. Would you recommend that I update this there as well, or start a new issue altogether?

#2604 looks similar in many respects, but most likely has a different root cause, so we're trying not to conflate the two. If you have new info for the crash with pthread_getcpuclockid in the backtrace, please add it to #2604. No need to create a new issue.

So, is it possible that adding the environment flag triggered this? Or maybe it could just be that the numbers were low.

Environment should not affect this in any way, we can rule this out.

There are two different stacktraces for this crash.

These are almost identical, a difference in the 13th/14th frame is most likely not relevant, but thanks for pointing it out 👍

kahest avatar Sep 17 '24 15:09 kahest

@kahest can you please share the plan for releasing a fix for this bug?

sagarbhojaviya avatar Sep 20 '24 07:09 sagarbhojaviya

@sagarbhojaviya please see the updates at the top. There is currently no way for us to fix this on the SDK side, it will most likely require a fix inside of the Android Tracer. We will keep this issue updated.

kahest avatar Sep 20 '24 08:09 kahest

@kahest can you please help me with this?

I have the meta-data below in my Android manifest file. Which tag do I need to add, or which existing meta-data value do I need to change, to stop this issue temporarily?

    <meta-data
        android:name="io.sentry.traces.user-interaction.enable"
        android:value="true" />

    <!-- enable screenshot for crashes -->
    <meta-data
        android:name="io.sentry.attach-screenshot"
        android:value="true" />

    <!-- enable view hierarchy for crashes -->
    <meta-data
        android:name="io.sentry.attach-view-hierarchy"
        android:value="true" />

    <!-- enable the performance API by setting a sample-rate, adjust in production env -->
    <meta-data
        android:name="io.sentry.traces.sample-rate"
        android:value="1.0" />

    <!-- enable profiling when starting transactions, adjust in production env -->
    <meta-data
        android:name="io.sentry.traces.profiling.sample-rate"
        android:value="1.0" />

    <meta-data
        android:name="io.sentry.anr.timeout-interval-mills"
        android:value="5000" />

    <!-- Required: set your sentry.io project identifier (DSN) -->
    <meta-data
        android:name="io.sentry.dsn"
        android:value="${dsnValueSentry}" />

sagarbhojaviya avatar Sep 23 '24 12:09 sagarbhojaviya

@sagarbhojaviya it's this:

  <meta-data
        android:name="io.sentry.traces.profiling.sample-rate"
        android:value="0.0" />

romtsn avatar Sep 23 '24 12:09 romtsn

Closing this as it has been fixed by Google. Follow https://issuetracker.google.com/issues/361129298 for updates on rollout status. Latest information is that the fix has been part of the Sept 2024 mainline updates.

kahest avatar Dec 04 '24 14:12 kahest