
Improve handling of Bulk requests in ElasticSearch Output

Open nibbleshift opened this issue 3 years ago • 0 comments

In the current implementation, it's possible to build a Bulk request that exceeds the http.max_content_length configured on the Elasticsearch instances. If the Bulk payload exceeds this limit, the request is rejected with an HTTP 413 error.

I propose we update the Elasticsearch output to stay under http.max_content_length by sending one or more Bulk requests as needed.

There are at least 2 scenarios we need to cover:

  • When Sniff is enabled on the output, http.max_content_length is already exposed to the client, because the client sniffed all of the available nodes. The value for each node is available via NodesInfoNodeHTTP.MaxContentLengthInBytes. Since Elasticsearch instances can be configured with different values, we should iterate over all nodes and use the lowest value in the set (see the first sketch after this list).
  • When Sniff is disabled on the output, we can expose a bulk_limit_bytes configuration option in the output plugin so users can tune this value. 100mb is a sensible default, since that is Elasticsearch's default for http.max_content_length.
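A minimal sketch of the sniff-enabled case, assuming the output uses the olivere/elastic Go client (which the NodesInfoNodeHTTP reference suggests); the minMaxContentLength helper and the esoutput package name are hypothetical:

```go
package esoutput

import (
	"context"
	"errors"

	elastic "github.com/olivere/elastic/v7"
)

// minMaxContentLength asks every sniffed node for its HTTP settings and
// returns the smallest http.max_content_length across the cluster, so a
// bulk request sized against it is accepted by any node.
func minMaxContentLength(ctx context.Context, client *elastic.Client) (int64, error) {
	info, err := client.NodesInfo().Do(ctx)
	if err != nil {
		return 0, err
	}
	var min int64
	for _, node := range info.Nodes {
		if node.HTTP == nil || node.HTTP.MaxContentLengthInBytes <= 0 {
			continue
		}
		if min == 0 || node.HTTP.MaxContentLengthInBytes < min {
			min = node.HTTP.MaxContentLengthInBytes
		}
	}
	if min == 0 {
		return 0, errors.New("no node reported http.max_content_length")
	}
	return min, nil
}
```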

If Sniff is enabled, the sniffed value overrides bulk_limit_bytes.
If Sniff is disabled, bulk_limit_bytes is used: either the user-specified value or the 100mb default. A sketch of that precedence follows.
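Here effectiveBulkLimit and both parameter names are illustrative, not an existing API:

```go
// Elasticsearch's default http.max_content_length is 100mb; used when the
// user sets neither sniffing nor bulk_limit_bytes.
const defaultBulkLimitBytes int64 = 100 * 1024 * 1024

// effectiveBulkLimit picks the limit to enforce: a sniffed cluster minimum
// (0 when sniffing is disabled) overrides the configured bulk_limit_bytes,
// which in turn falls back to the 100mb default.
func effectiveBulkLimit(sniffedLimit, configuredLimit int64) int64 {
	if sniffedLimit > 0 {
		return sniffedLimit
	}
	if configuredLimit > 0 {
		return configuredLimit
	}
	return defaultBulkLimitBytes
}
```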

The current behavior is to send a single Bulk request per Write or WriteBatch call. Under the proposed behavior, Write/WriteBatch would send one or more Bulk requests, each limited by bulk_limit_bytes OR NodesInfoNodeHTTP.MaxContentLengthInBytes.

As the bulk request is being assembled, Bulk.EstimatedSizeInBytes() can be called to track the size of the batch. If the size reaches the limit, the pending bulk request is sent and, if necessary, a new bulk request is started.
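A sketch of that flush-and-restart loop, again assuming olivere/elastic; writeBatch and its parameters are stand-ins for the output's real plumbing. Since the size check runs after each Add, a real implementation should keep some headroom below the hard limit:

```go
package esoutput

import (
	"context"

	elastic "github.com/olivere/elastic/v7"
)

// writeBatch sends the given documents as one or more bulk requests, each
// kept under limit bytes (bulk_limit_bytes or the sniffed cluster minimum).
func writeBatch(ctx context.Context, client *elastic.Client, reqs []elastic.BulkableRequest, limit int64) error {
	bulk := client.Bulk()
	for _, r := range reqs {
		bulk.Add(r)
		// EstimatedSizeInBytes approximates the pending bulk body size;
		// once it reaches the limit, flush and start a fresh request.
		if bulk.EstimatedSizeInBytes() >= limit {
			if _, err := bulk.Do(ctx); err != nil {
				return err
			}
			bulk = client.Bulk()
		}
	}
	// Flush whatever remains below the limit.
	if bulk.NumberOfActions() > 0 {
		if _, err := bulk.Do(ctx); err != nil {
			return err
		}
	}
	return nil
}
```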

nibbleshift · Sep 28 '22, 13:09