sentry-java

feat: minimal tombstone integration

Open supervacuus opened this issue 3 months ago • 2 comments

:scroll: Description

As discussed last week with @markushi, the process will be to make minimal atomic changes in each PR and merge directly to main rather than accumulating on an uber feature branch. This allows for easier review/feedback/corrections, and we can already test a subset of the entire feature "in the field".

The first minimal change is a basic tombstone integration:

  • exposes internal option to enable/disable (disabled by default)
  • the integration will only ever be enabled if the runtime system is at least Android 12
  • the current setup suggests an operation where users will have to disable NDK support (or only depend on sentry-android-core) or get two reports for the same crash
  • only retrieves the most recent tombstone (the current implementation is not entirely correct: we should either remove any remaining ApplicationExitInfo entries with REASON_CRASH_NATIVE or report them too, with an option to control the latter; I left this out to keep interface exposure minimal in the first step, but either variant is easy to add to this PR or later)
  • adds a basic tombstone parser/decoder and accompanying snapshot test
  • introduces the following dependencies:
    • runtime: protobuf-javalite (the entire feature adds ca. 75 KiB to the Android sample release APK)
    • build: the protobuf plugin and the protoc compiler to automate protocol updates
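
The gating and "most recent tombstone" selection described in the list above can be sketched roughly as follows. This is a minimal illustration in plain Java, not the code in this PR: `ExitInfo` is a hypothetical stand-in for `android.app.ApplicationExitInfo`, and `shouldEnable` and `latestNativeCrash` are made-up names. The two constants mirror `Build.VERSION_CODES.S` (31) and `ApplicationExitInfo.REASON_CRASH_NATIVE` (5).

```java
import java.util.List;
import java.util.Optional;

final class TombstoneSelectionSketch {
  // Mirrors Build.VERSION_CODES.S (Android 12).
  static final int ANDROID_12_API_LEVEL = 31;
  // Mirrors ApplicationExitInfo.REASON_CRASH_NATIVE.
  static final int REASON_CRASH_NATIVE = 5;

  /** Hypothetical stand-in for android.app.ApplicationExitInfo. */
  static final class ExitInfo {
    final int reason;
    final long timestampMillis;

    ExitInfo(int reason, long timestampMillis) {
      this.reason = reason;
      this.timestampMillis = timestampMillis;
    }
  }

  /** The integration only runs if the option is on AND the device is Android 12+. */
  static boolean shouldEnable(boolean optionEnabled, int sdkInt) {
    return optionEnabled && sdkInt >= ANDROID_12_API_LEVEL;
  }

  /** Picks the newest native-crash entry and ignores every other exit reason. */
  static Optional<ExitInfo> latestNativeCrash(List<ExitInfo> exitInfos) {
    ExitInfo latest = null;
    for (ExitInfo info : exitInfos) {
      if (info.reason != REASON_CRASH_NATIVE) {
        continue;
      }
      if (latest == null || info.timestampMillis > latest.timestampMillis) {
        latest = info;
      }
    }
    return Optional.ofNullable(latest);
  }
}
```

Older `REASON_CRASH_NATIVE` entries are simply skipped here, which is exactly the gap noted in the list: they would need to be either discarded explicitly or reported as well.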

Open Issues:

  • ~While the protobuf runtime is relatively small, there is still the possibility of conflicting with client-side protobuf versions (major versions often introduce quite severe breakage, but I haven't tested this yet, only reviewed change logs).~
  • ~finding clarity in how to proceed with older tombstones (ignore or handling similarly to ANRv2)~
  • ~decision if this minimal setup already makes sense to release as an internal API~
  • add ManifestMetadataReader to configure conveniently? (or not since the corresponding options are only internal?)
  • No changes yet to the UI (quite a few aspects would make these stack traces more readable; from my POV this is currently out of scope, but I am writing things down)

:bulb: Motivation and Context

First sensible release step for https://github.com/getsentry/sentry-java/issues/3295

Part of https://linear.app/getsentry/project/tombstone-support-android-0024cb6e3ffd/

:green_heart: How did you test it?

  • Added a basic parser snapshot test for a serialized tombstone protobuf.
  • Manual testing.

:pencil: Checklist

  • [x] I added GH Issue ID & Linear ID
  • [x] I added tests to verify the changes.
  • [ ] No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • [ ] I updated the docs if needed.
  • [ ] I updated the wizard if needed.
  • [ ] Review from the native team if needed.
  • [ ] No breaking change or entry added to the changelog.
  • [ ] No breaking change for hybrid SDKs or communicated to hybrid SDKs.

:crystal_ball: Next steps

  • Decide if we want to go forward with client-side tombstone processing. This decision should happen with this PR.
    • If no, consider sending a tombstone as an attachment, similar to how we attach minidumps and let ingestion/processing deal with decoding (I can prototype this in parallel, since I think we can do most of it in relay, as a first step)
    • If yes, decide on adding the protobuf runtime library as a dependency to sentry-android-core (or shade/relocate, or implement our own decoder, given that this is a stable format which only requires a subset of protobuf).
      • Depending on feedback, adaptation to this minimal change (primarily handling older tombstones).
      • Otherwise, integrate an EventProcessor that merges crash events from sentry-native with tombstones.

#skip-changelog (for now)

supervacuus, Nov 25 '25 09:11

@markushi, I just realized that I cannot omit a separate "tombstone" marker: even if I report all events, I need the marker to avoid reporting them repeatedly. This was clear to me in principle, but the marker is already a must for this PR, and, unlike the ANR marker, I must also align it with crashedLastRun.
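
To illustrate why a marker is needed, here is a minimal sketch of timestamp-based de-duplication. It assumes the marker is just the timestamp of the newest tombstone already reported; `TombstoneMarkerSketch` and `takeUnreported` are hypothetical names, and the marker is held in memory here, whereas a real one would have to be persisted and written at the right life-cycle stage of the event.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TombstoneMarkerSketch {
  // Timestamp of the newest tombstone that was already reported (hypothetical marker).
  private long lastReportedTimestampMillis;

  TombstoneMarkerSketch(long lastReportedTimestampMillis) {
    this.lastReportedTimestampMillis = lastReportedTimestampMillis;
  }

  /**
   * Returns the crash timestamps newer than the marker, oldest first,
   * and advances the marker to the newest one returned.
   */
  List<Long> takeUnreported(List<Long> crashTimestampsMillis) {
    List<Long> fresh = new ArrayList<>();
    for (long timestamp : crashTimestampsMillis) {
      if (timestamp > lastReportedTimestampMillis) {
        fresh.add(timestamp);
      }
    }
    Collections.sort(fresh);
    if (!fresh.isEmpty()) {
      lastReportedTimestampMillis = fresh.get(fresh.size() - 1);
    }
    return fresh;
  }
}
```

Without the marker, every app start would see the same historical exit entries again and report them again. The sketch advances the marker immediately for brevity; doing so only once the event has been stored is the alignment question raised here.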

I wonder if it would make the most sense to do it similarly to ANR: use a TombstoneHint to ensure the marker timestamp is written at the correct life-cycle stage of the event, but introduce a new interface analogous to AbnormalExit, called CrashExit, so that the hint can take a different route (since it isn't truly an abnormal exit and should affect paths similar to those of the Native SDK crash marker).
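
A rough sketch of that idea, purely to make the routing concrete. Everything here is hypothetical: `AbnormalExitLike` only stands in for the SDK's `AbnormalExit` hint and does not reproduce its real signature, and `CrashExit`, `TombstoneHint`, `AnrHintLike`, and `MarkerStore` are illustrative names, not existing API.

```java
final class CrashExitHintSketch {
  /** Hypothetical stand-in for the existing abnormal-exit hint interface. */
  interface AbnormalExitLike {
    long timestampMillis();
  }

  /** Proposed separate interface, so crash hints can take a different route. */
  interface CrashExit {
    long timestampMillis();
  }

  static final class TombstoneHint implements CrashExit {
    private final long timestampMillis;

    TombstoneHint(long timestampMillis) {
      this.timestampMillis = timestampMillis;
    }

    @Override
    public long timestampMillis() {
      return timestampMillis;
    }
  }

  static final class AnrHintLike implements AbnormalExitLike {
    private final long timestampMillis;

    AnrHintLike(long timestampMillis) {
      this.timestampMillis = timestampMillis;
    }

    @Override
    public long timestampMillis() {
      return timestampMillis;
    }
  }

  /** Stand-in for the cache: the hint type decides which marker gets written. */
  static final class MarkerStore {
    Long anrMarkerMillis;
    Long tombstoneMarkerMillis;

    void onEventStored(Object hint) {
      if (hint instanceof CrashExit) {
        tombstoneMarkerMillis = ((CrashExit) hint).timestampMillis();
      } else if (hint instanceof AbnormalExitLike) {
        anrMarkerMillis = ((AbnormalExitLike) hint).timestampMillis();
      }
    }
  }
}
```

Keeping the two interfaces separate means a tombstone hint never touches the ANR marker, and the crash branch is the natural place to also update whatever backs crashedLastRun.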

The biggest issue with that approach is that the crashedLastRun is handled entirely in EnvelopeCache rather than AndroidEnvelopeCache, whereas ApplicationExitInfo hint/marker handling happens in the latter.

The PR still makes sense for a first review from you (since, if the general direction makes sense and you have todos not in my list, I can also add a test for the integration itself). Still, I would appreciate a short sync on how to align these execution paths (maybe I don't need to align tombstones with crashedLastRun and can abuse AbnormalExit for the same outcome, though it feels like a significant shortcut, even for minimal internal exposure, which is not at all what I aimed for).

I can convert the PR back to a draft if you prefer.

supervacuus, Nov 26 '25 21:11

@markushi and @romtsn, I think I also found an okay solution to reduce duplication between the tests for the two AEI integrations. That means the first three "milestones" required for a release (without Native SDK event enrichment/merging) are finished. Please let me know if something is missing from your POV or not yet up to par, and how you would like to proceed.

Btw, multiple TODO entries in the PR don't necessarily highlight a vital change still open, but do signal a decision we probably should have made in this PR. Please understand them as review guidance for open questions/decisions I still have.

supervacuus, Dec 09 '25 15:12