sentry-java

AfterSendCallback hook to capture SDK Issues

Open nodshams opened this issue 1 year ago • 4 comments

Problem Statement

Hey team,

We experienced a problem with our Spring Boot application and egress proxy configuration. Due to a bug, the proxy settings were omitted in the shell entrypoint, causing our HTTP clients to fail. As a result, Sentry could not send events, and network issues went unnoticed until they had a production impact (the issue was identified when a Datadog monitor was triggered by a drop in business metrics).

Would it be possible to add a hook at application startup that checks whether Sentry can reach the server, or a hook after the event send process that captures problems the Sentry SDK itself runs into (in this case, network issues)? This would allow us to report these problems to Datadog and set up appropriate monitors.

Thanks!

Solution Brainstorm

No response

nodshams avatar Aug 23 '24 12:08 nodshams

Hey @nodshams thanks for opening this issue. We'll talk about this internally and come back.

adinauer avatar Aug 26 '24 10:08 adinauer

Can you please clarify how this should work, @nodshams? If there are network issues, I presume notifying other services wouldn't work either.

adinauer avatar Aug 26 '24 11:08 adinauer

Hey @adinauer,

Thank you for getting back to me. What you mentioned makes complete sense for request serving. However, the issue we encountered was specifically related to the egress configuration for https.proxyHost and https.proxyPort; they were not set properly. This meant that we were blind to problems in scheduled tasks or background processes that threw exceptions. We only noticed the issue when Datadog scraped the metrics from the endpoint.

After fixing the egress configuration, we started receiving some errors. However, the problem now is that when Sentry is quiet for a long time, we can't distinguish whether it's due to network issues or if everything is running smoothly.

Thanks again!

nodshams avatar Aug 27 '24 13:08 nodshams

Thanks for the info, @nodshams. Our suggestion would be a hook in Sentry that lets you detect failed Sentry requests: a callback that is invoked whenever our transport fails to send an event. We're not quite sure yet how it would be configured. We'll update here once we've taken a more detailed look. We can't give an ETA at the moment.

To perform a check at startup, you'd then have to capture a Sentry event (e.g. a message) and wait for the hook to fire (in case there's a problem).
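A rough sketch of what such a startup check could look like once a failure hook exists. Everything here is an assumption for illustration: `ProbeClient`, its `captureMessage(String, Runnable)` method, and `sendFailedWithin` are hypothetical names, not part of the Sentry SDK. The check sends one probe message and waits a bounded time for the failure callback.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class StartupProbeDemo {

    /**
     * Hypothetical client, NOT a Sentry API: captures a message and runs
     * onSendFailure only if the message could not be sent.
     */
    interface ProbeClient {
        void captureMessage(String message, Runnable onSendFailure);
    }

    /**
     * Sends one probe message and waits up to the given timeout for the
     * failure callback. Returns true if a send failure was reported in time.
     */
    static boolean sendFailedWithin(ProbeClient client, long timeout, TimeUnit unit)
            throws InterruptedException {
        CountDownLatch failed = new CountDownLatch(1);
        client.captureMessage("startup connectivity probe", failed::countDown);
        // await returns true only if countDown() ran before the timeout elapsed.
        return failed.await(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated client whose sends always fail, e.g. missing proxy settings.
        ProbeClient broken = (message, onSendFailure) -> onSendFailure.run();
        boolean failed = sendFailedWithin(broken, 1, TimeUnit.SECONDS);
        System.out.println("probe failed: " + failed);
    }
}
```

Because the callback would only fire on failure, a quiet timeout means "no failure reported yet" rather than a confirmed delivery, so the timeout should be chosen to be longer than the transport's own retry and connection timeouts.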

Does this sound like it would solve your problem?

adinauer avatar Aug 27 '24 14:08 adinauer

👋 @adinauer and team — we're running into a similar issue regarding callbacks with the newer SDK.

We're currently upgrading our Java Sentry SDK from version 1.6 to 8.x. In our existing setup, we use a class that implements io.sentry.connection.EventSendCallback to track event send failures and increment a Datadog metric.

So far, we haven’t found an equivalent mechanism in the newer SDK to hook into send failures. Are there any recommended alternatives or workarounds for this use case?

Thanks in advance!

nehamohan avatar Jun 26 '25 04:06 nehamohan

Hello @nehamohan, I'm afraid there's no direct replacement at the moment.

One workaround would be to implement your own ITransport (via ITransportFactory).
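The general shape of that workaround is a decorator around the real transport. The sketch below shows only the pattern and uses stand-in types: `Transport`, `FailureListener`, and `ObservedTransport` are hypothetical and do not match the actual `ITransport` or `ITransportFactory` signatures, which should be checked against the SDK version in use.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

public class SendFailureDemo {

    /** Hypothetical stand-in for a transport; NOT the real io.sentry.ITransport. */
    interface Transport {
        void send(String envelope) throws IOException;
    }

    /** Hypothetical failure hook, e.g. a thin wrapper that increments a metric. */
    interface FailureListener {
        void onSendFailure(String envelope, IOException cause);
    }

    /** Decorator: forwards every send to the delegate and reports failures. */
    static final class ObservedTransport implements Transport {
        private final Transport delegate;
        private final FailureListener listener;

        ObservedTransport(Transport delegate, FailureListener listener) {
            this.delegate = delegate;
            this.listener = listener;
        }

        @Override
        public void send(String envelope) throws IOException {
            try {
                delegate.send(envelope);
            } catch (IOException e) {
                // Notify the listener, then rethrow so callers see the same behavior.
                listener.onSendFailure(envelope, e);
                throw e;
            }
        }
    }

    public static void main(String[] args) {
        AtomicLong failures = new AtomicLong();
        // Simulated transport that always fails, e.g. because the proxy is unreachable.
        Transport broken = envelope -> { throw new IOException("proxy unreachable"); };
        Transport observed = new ObservedTransport(broken, (env, cause) -> failures.incrementAndGet());
        try {
            observed.send("test-event");
        } catch (IOException expected) {
            // The failure has already been counted by the listener.
        }
        System.out.println("send failures: " + failures.get()); // prints "send failures: 1"
    }
}
```

Whether a thrown exception is the right failure signal depends on the transport being wrapped. A transport that sends asynchronously would not surface errors from `send` itself, so in that case the wrapper would need to perform or observe the actual HTTP call.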

We're already tracking stats in client reports and could also add a sort of listener for it that would allow you to hook in there.

Here's a list of reasons we track in the SDK.

adinauer avatar Jun 27 '25 04:06 adinauer

Thanks for the quick response, @adinauer!

That sounds promising. To share a bit more context: we rely on Sentry to trigger pages, so if we’re unable to connect or send events to Sentry, we’d like to be alerted. In the older SDK, we used EventSendCallback to hook into send failures and increment a Datadog metric which we monitor.

I noticed an io.sentry.Sentry.isHealthy() method, but I'm not sure whether it covers all the failure cases. I'm open to suggestions if there's a best-practice approach you'd recommend for handling this in the newer SDK.

nehamohan avatar Jun 27 '25 21:06 nehamohan

isHealthy is mostly for internal use at the moment. It tells us whether we should temporarily lower the sample rate for transactions, either because a rate limit is active or because our queue for sending out events is rejecting new items. This is what our backpressure monitor does.

adinauer avatar Jun 30 '25 03:06 adinauer