
Always log dropped events, regardless of debug setting

Open gpind opened this issue 2 years ago • 14 comments

Problem Statement

We rely on Sentry to notify us when things go wrong, so when the SDK silently drops events (for example, due to size limits) it's very problematic. Tracking down dropped events is much harder than it should be, particularly in a prod environment where debug mode would be prohibitively verbose.

Solution Brainstorm

The SDK should always log when an event is dropped, even outside of debug mode. The message should be terse but easily identifiable, and ideally include both the reason why the event was dropped (e.g. too large) and the event ID.

gpind avatar Jun 14 '23 22:06 gpind

Hello @gpind. Thanks for the request. We'll discuss internally and update here.

adinauer avatar Jun 15 '23 09:06 adinauer

@gpind could you please elaborate on your use case?

  • Would you expect the JSON of whatever was dropped to be logged or just a message that something was dropped?
  • Are you only interested in events dropped due to size limit and network errors or also due to rate limiting?
  • What about events dropped due to sampling?

Ideas so far:

  • a callback we invoke for everything we drop, including a reason (sketched after this list)
  • write envelope to disk and log path (would probably only recommend in debug mode, not on production)
  • log dropped events on higher level even if debug is false (also wouldn't recommend on production)
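
To make the callback idea concrete, here is a rough sketch; the interface and setter names are purely hypothetical, since no such option exists in the SDK today:

```java
import io.sentry.protocol.SentryId;

// Hypothetical API sketch only -- nothing here exists in sentry-java yet.
// The SDK would invoke the callback once for every dropped item.
public interface DroppedEventCallback {
    void onDropped(SentryId eventId, String reason);
}

// Hypothetical registration, mirroring how existing SentryOptions callbacks are set:
// options.setDroppedEventCallback((eventId, reason) ->
//     System.err.println("Sentry dropped event " + eventId + ": " + reason));
```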

adinauer avatar Jun 15 '23 13:06 adinauer

Would you expect the JSON of whatever was dropped to be logged or just a message that something was dropped?

No, full JSON would be too much (especially where events are dropped due to size limit). It should be a short message with just enough info to identify the event. Event ID seems perfect, since we can log that after calling captureEvent along with whatever tags are relevant to us.
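
For context, the pattern described here might look roughly like this (the tag in the log line is an arbitrary example):

```java
import io.sentry.Sentry;
import io.sentry.SentryEvent;
import io.sentry.protocol.SentryId;

public class CaptureExample {
    public static void main(String[] args) {
        SentryEvent event = new SentryEvent(new RuntimeException("boom"));
        // captureEvent returns the event's ID, which can be logged together
        // with application-specific tags to correlate dropped events later
        SentryId eventId = Sentry.captureEvent(event);
        System.out.println("captured event " + eventId + " service=checkout");
    }
}
```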

Are you only interested in events dropped due to size limit and network errors or also due to rate limiting? What about events dropped due to sampling?

Ideally all dropped events.

a callback we invoke for everything we drop including a reason

Could work well, as long as the callback gets the full SentryEvent.

write envelope to disk and log path (would probably only recommend in debug mode, not on production)

This wouldn't work well for us, because we want something we can leave enabled 24/7 on prod.

Edit: on second thought, it might be fine for us since we don't drop many events under normal circumstances.

log dropped events on higher level even if debug is false (also wouldn't recommend on production)

I can't speak for others, but we definitely don't send so many events that logging one line per dropped event would be problematic for us.

gpind avatar Jun 15 '23 17:06 gpind

@gpind sounds like it would be sufficient to allow you to easily set the diagnostic level to ERROR or WARNING (which you can already do via options.setDiagnosticLevel) and keep debug turned on. We'd just have to improve logging to include event IDs.
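
For reference, a minimal sketch of that setup; the DSN is a placeholder:

```java
import io.sentry.Sentry;
import io.sentry.SentryLevel;

public class InitExample {
    public static void main(String[] args) {
        Sentry.init(options -> {
            options.setDsn("https://examplePublicKey@o0.ingest.sentry.io/0");
            // diagnostic logging only runs when debug is enabled...
            options.setDebug(true);
            // ...but with this level only ERROR messages are printed
            options.setDiagnosticLevel(SentryLevel.ERROR);
        });
    }
}
```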

Would you agree that'd solve it?

adinauer avatar Jun 16 '23 07:06 adinauer

@adinauer How verbose is ERROR level? I can't easily test it out because the wrapper we use (sentry-clj) doesn't currently expose this setting. So I'll need to submit a PR to add that support, but ideally only once I'm reasonably confident that it'll work for us. That will depend on whether ERROR is too verbose to leave enabled 24/7.

gpind avatar Jun 20 '23 18:06 gpind

@gpind unfortunately we can't guarantee low verbosity as it depends largely on your application and usage of the SDK. There may for example be many incoming requests with malformed headers or failed outgoing requests to Sentry (e.g. too large, rate limited, ...) and many other things causing errors.

So maybe adding the callback would be the better option here.

Or are you more interested in events we know beforehand are too large to send, which we could then dump somewhere you specify?

adinauer avatar Jun 21 '23 13:06 adinauer

Understood, but I'd still like to try out error-level logging to see whether it would work for us. I'll try submitting a PR to sentry-clj. In the meantime, could you clarify something?

@gpind sounds like it would be sufficient to allow you to easily set the diagnostic level to ERROR or WARNING (which you can already do via options.setDiagnosticLevel) and keep debug turned on. We'd just have to improve logging to include event IDs.

It sounds like you're saying that the "dropped event" message will still get printed at the ERROR or WARNING level. But it seems that message is actually logged at DEBUG level; is that correct?

gpind avatar Jun 21 '23 19:06 gpind

@gpind beforeSend is something you control. So if an event is dropped there, it should be your code returning null from the callback. See https://docs.sentry.io/platforms/java/configuration/filtering/ for more details.
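
A minimal sketch of the filtering pattern from that page; the exception type checked here is just an example:

```java
import io.sentry.Sentry;

public class FilterExample {
    public static void main(String[] args) {
        Sentry.init(options -> {
            options.setDsn("https://examplePublicKey@o0.ingest.sentry.io/0");
            options.setBeforeSend((event, hint) -> {
                // returning null drops the event -- this is the kind of drop
                // that is under the application's own control
                if (event.getThrowable() instanceof IllegalArgumentException) {
                    return null;
                }
                return event;
            });
        });
    }
}
```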

adinauer avatar Jun 22 '23 06:06 adinauer

We don't use beforeSend; that's just the message Sentry support told us to look for. What's the actual message for an event that's dropped due to size limits?

gpind avatar Jun 22 '23 17:06 gpind

I've used a modified version of our console sample to test this. With the diagnostic level set to ERROR via options.setDiagnosticLevel(SentryLevel.ERROR), it shows the following log for a single event that was too large:

ERROR: Request failed, API returned 400
ERROR: {"detail":"envelope exceeded size limits (https://develop.sentry.dev/sdk/envelopes/#size-limits)"}
ERROR: The transport failed to send the envelope with response code 400
ERROR: Envelope submission failed
java.lang.IllegalStateException: The transport failed to send the envelope with response code 400
	at io.sentry.transport.AsyncHttpTransport$EnvelopeSender.flush(AsyncHttpTransport.java:269)
	at io.sentry.transport.AsyncHttpTransport$EnvelopeSender.run(AsyncHttpTransport.java:201)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

adinauer avatar Jun 23 '23 07:06 adinauer

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

getsantry[bot] avatar Jul 21 '23 07:07 getsantry[bot]

@adinauer thanks for the info. Enabling error-level logging 24/7 seems workable for us so far, so I think event IDs are all we'd need after all.

gpind avatar Jul 22 '23 00:07 gpind

We use Sentry to identify slow APIs for performance optimization. The slower APIs tend to issue a larger number of DB queries, which leads to the associated Sentry event being dropped for exceeding the envelope size limit. Because of this we can no longer rely on Sentry to identify the slow APIs, as they are never recorded in Sentry.

When the size limit is hit, could Sentry create an event truncated to the limit, or at minimum record the top-level span, so that we can at least identify the slow APIs?
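
One existing knob that may be relevant here, assuming the oversized envelopes come from span count: SentryOptions exposes setMaxSpans, which caps how many spans a transaction collects. A sketch (500 is an arbitrary example value; the default is 1000):

```java
import io.sentry.Sentry;

public class SpanCapExample {
    public static void main(String[] args) {
        Sentry.init(options -> {
            options.setDsn("https://examplePublicKey@o0.ingest.sentry.io/0");
            // cap spans per transaction so large transactions are truncated
            // instead of growing past the envelope size limit
            options.setMaxSpans(500);
        });
    }
}
```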

ksmalik1 avatar Oct 11 '23 03:10 ksmalik1

Thanks for the feedback @ksmalik1 I've opened https://github.com/getsentry/sentry-java/issues/2980 so we can track this separately. We'll discuss internally and update there.

adinauer avatar Oct 11 '23 04:10 adinauer