trace in elasticsearch bulk indexing app dropped because payload too big
We're using APM to trace an application which does writes to our elasticsearch cluster. Y'all added support for the elasticsearch7 lib (thank you!!!!) and when I upgraded to grab that support, I noticed that a ton of our traces started getting dropped because the payload was too big.
Lots of errors that look like this:
trace (8230299b) larger than payload limit (8000000b), dropping, 1 additional messages skipped
For some additional context:
- Each trace is covering our efforts to index a batch of documents.
- This batch gets broken up further into sub-batches, and we make a separate write to elasticsearch's bulk API for each sub-batch (batches within batches!)
- As a reference point, I found a trace that had 8 writes to elasticsearch
- The span for the write to elasticsearch's bulk API includes a truncated snippet of the data payload (i.e. the documents being written).
- It's truncated, but it's still pretty long
My hunch is that some of the traces that are too big have a bunch of bulk writes in them, and the total combined size of all those truncated data payloads is what pushes the trace over the edge. For scale: against the 8,000,000b limit in the error above, eight bulk spans each carrying roughly 1 MB of truncated payload text would already be enough to blow past it.
In any case, I'm looking for guidance. Is there anything we can do with configuration to shrink the payload down? Like is it possible to disable capturing the payload on calls to elasticsearch's bulk API? Or is there a bug/feature request in here? 😅
Thanks so much in advance for the support. Y'all are the best!
Which version of dd-trace-py are you using?
0.55.0
Which version of the libraries are you using?
elasticsearch7==7.13.4
How can we reproduce your problem?
- Trace a block of code that makes multiple large writes to elasticsearch's bulk API
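A rough sketch of that kind of repro is below. It is not from the original report: the cluster URL, index name, document sizes, and sub-batch size are all placeholders, and the exact sizes needed to cross the 8,000,000b limit will vary.

import ddtrace

# Patch the elasticsearch integration before the client is used.
ddtrace.patch(elasticsearch=True)

from elasticsearch7 import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # placeholder cluster URL

# A batch of large-ish documents (placeholder index name and contents).
docs = [
    {"_index": "test-index", "_source": {"body": "x" * 10_000}}
    for _ in range(4_000)
]

# One parent trace covering several bulk sub-batches, mirroring the setup above.
with ddtrace.tracer.trace("index.batch"):
    for start in range(0, len(docs), 500):
        helpers.bulk(es, docs[start:start + 500])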
What is the result that you get?
We get an error like
trace (8230299b) larger than payload limit (8000000b), dropping, 1 additional messages skipped
What is the result that you expected?
Ideally we don't drop these traces.
wanted to bump this to see if there are any options here 🤔
I came across this error when using Datadog in my own application.
Here is my first attempt at a workaround:
import os
# Work around of the following warning/error:
#
# trace (14_533_373b) larger than payload limit (8000000b), dropping
#
# by setting tracer write buffer to 128M
#
# See ddtrace/internal/writer.py(59)get_writer_buffer_size()
#
# Note that ddtrace evaluates this at import time,
# so make sure you put this code in main.py before importing Datadog.
#
dd_trace_buffer_val = str(128_000_000)
os.environ["DD_TRACE_WRITER_BUFFER_SIZE_BYTES"] = dd_trace_buffer_val
os.environ["DD_TRACE_WRITER_MAX_PAYLOAD_SIZE_BYTES"] = dd_trace_buffer_val
@miohtama's configuration variables are the same ones I'd recommend.
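If bumping the writer limits isn't enough, another option is a trace filter that trims the big elasticsearch tag before the trace is flushed. This is only a sketch, not an official knob: it assumes the elasticsearch integration stores the (already truncated) request body under the elasticsearch.body tag, and the tag name, character cap, and filter wiring should be double-checked against your ddtrace version.

from ddtrace import tracer
from ddtrace.filters import TraceFilter

BODY_TAG = "elasticsearch.body"  # assumed tag name used by the elasticsearch integration
MAX_BODY_CHARS = 2_000           # arbitrary cap; pick what your payload budget allows

class TruncateEsBody(TraceFilter):
    def process_trace(self, trace):
        # Shrink the bulk payload snippet on every span so that a trace
        # containing many bulk writes stays under the payload limit.
        for span in trace:
            body = span.get_tag(BODY_TAG)
            if body and len(body) > MAX_BODY_CHARS:
                span.set_tag(BODY_TAG, body[:MAX_BODY_CHARS])
        return trace

tracer.configure(settings={"FILTERS": [TruncateEsBody()]})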
It's also possible that this was fixed in https://github.com/DataDog/dd-trace-py/pull/5375.
I'm going to close this out since the original request was for configuration that makes it possible to trace requests with large payloads. Please let me know if we need to reopen. Thanks all!