Always log dropped events, regardless of debug setting
Problem Statement
We rely on Sentry to notify us when things go wrong, so when the SDK silently drops events (for example due to size limits) it's very problematic. Tracking down dropped events is much harder than it should be, particularly on a prod environment where debug mode would be prohibitively verbose.
Solution Brainstorm
The SDK should always log when an event is dropped, even outside of debug mode. The message should be terse but easily identifiable, and ideally include both the reason why the event was dropped (e.g. too large) and the event ID.
Hello @gpind. Thanks for the request. We'll discuss internally and update here.
@gpind could you please elaborate on your use case.
- Would you expect the JSON of whatever was dropped to be logged or just a message that something was dropped?
- Are you only interested in events dropped due to size limit and network errors or also due to rate limiting?
- What about events dropped due to sampling?
Ideas so far:
- a callback we invoke for everything we drop including a reason
- write envelope to disk and log path (would probably only recommend in debug mode, not on production)
- log dropped events on higher level even if debug is
false(also wouldn't recommend on production)
Would you expect the JSON of whatever was dropped to be logged or just a message that something was dropped?
No, full JSON would be too much (especially where events are dropped due to size limit). It should be a short message with just enough info to identify the event. Event ID seems perfect, since we can log that after calling captureEvent along with whatever tags are relevant to us.
Are you only interested in events dropped due to size limit and network errors or also due to rate limiting? What about events dropped due to sampling?
Ideally all dropped events.
a callback we invoke for everything we drop including a reason
Could work well, as long as the callback gets the full SentryEvent.
write envelope to disk and log path (would probably only recommend in debug mode, not on production)
This wouldn't work well for us, because we want something we can leave enabled 24/7 on prod.
Edit: on second thought, it might be fine for us since we don't drop many events under normal circumstances.
log dropped events on higher level even if debug is false (also wouldn't recommend on production)
I can't speak for others, but we definitely don't send so many events that logging one line per dropped event would be problematic for us.
@gpind sounds like it would be sufficient to allow you to easily set the diagnostic level to ERROR or WARNING (which you can already do via options.setDiagnosticLevel) and keep debug turned on. We'd just have to improve logging to include event IDs.
Would you agree that'd solve it?
@adinauer How verbose is ERROR level? I can't easily test it out because the wrapper we use (sentry-clj) doesn't currently expose this setting. So I'll need to submit a PR to add that support, but ideally only once I'm reasonably confident that it'll work for us. That will depend on whether ERROR is too verbose to leave enabled 24/7.
@gpind unfortunately we can't guarantee low verbosity as it depends largely on your application and usage of the SDK. There may for example be many incoming requests with malformed headers or failed outgoing requests to Sentry (e.g. too large, rate limited, ...) and many other things causing errors.
So maybe adding the callback would be the better option here.
Or are you more interested in events that we should know beforehand are too large to send and then dump somewhere you can specify.
Understood, but I'd still like to try out error-level logging to see whether it would work for us. I'll try submitting a PR to sentry-clj. In the meanwhile, could you clarify something?
@gpind sounds like it would be sufficient to allow you to easily set the diagnostic level to ERROR or WARNING (which you can already do via options.setDiagnosticLevel) and keep debug turned on. We'd just have to improve logging to include event IDs.
It sounds like you're saying that the "dropped event" message will still get printed at ERROR or WARNING levels. But it seems like that message is actually a DEBUG— is that correct?
@gpind beforeSend is something you control. So if an event is dropped there, it should be your code returning null from the callback. See https://docs.sentry.io/platforms/java/configuration/filtering/ for more details.
We don't use beforeSend— that's just the message Sentry support told us to look for. What's the actual message for an event that's dropped due to size limits?
I've used a modified version of our console sample to test and it shows the following log for a single event that was too large when diagnostic level is set to error using options.setDiagnosticLevel(SentryLevel.ERROR);:
ERROR: Request failed, API returned 400
ERROR: {"detail":"envelope exceeded size limits (https://develop.sentry.dev/sdk/envelopes/#size-limits)"}
ERROR: The transport failed to send the envelope with response code 400
ERROR: Envelope submission failed
java.lang.IllegalStateException: The transport failed to send the envelope with response code 400
java.lang.IllegalStateException: The transport failed to send the envelope with response code 400
at io.sentry.transport.AsyncHttpTransport$EnvelopeSender.flush(AsyncHttpTransport.java:269)
at io.sentry.transport.AsyncHttpTransport$EnvelopeSender.run(AsyncHttpTransport.java:201)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
This issue has gone three weeks without activity. In another week, I will close it.
But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!
"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀
@adinauer thanks for the info. Enabling error-level logging 24/7 seems workable for us so far, so I think event IDs are all we'd need after all.
We use Sentry for identifying slow APIs for performance optimization. The slower APIs tend to have a larger number of DB queries which leads to the associated Sentry event being dropped for exceeding the envelope limit. Because of this we can no longer rely on Sentry to identify the slow APIs as they are never recorded on Sentry.
When the size is limit can Sentry create an event truncated to the limit or at the minimum record the top level span so that we can at least identify the slow APIs.
Thanks for the feedback @ksmalik1 I've opened https://github.com/getsentry/sentry-java/issues/2980 so we can track this separately. We'll discuss internally and update there.