opentelemetry-lambda

Proposal: Include Batch processor

Open pmm-sumo opened this issue 4 years ago • 4 comments

Is your feature request related to a problem? Please describe.

Currently, no processors are included in the collector for Lambda. This makes it impossible to use the Batch processor. Its absence from the pipeline causes some issues, such as the exporter sending inefficient batches and the inability to limit the maximum batch size.

Describe the solution you'd like

Include Batch processor

pmm-sumo avatar May 10 '21 07:05 pmm-sumo

cc @alolita @mrumian-sumo

pmm-sumo avatar May 10 '21 07:05 pmm-sumo

AFAIK the batch processor does not provide any mechanism for a force flush, which means it cannot be used in lambda. The batch processor just waits until it has accumulated the required amount of data or the required period of time has passed, but if it does not reach that condition before the lambda is frozen, the data could be sent minutes or hours later and possibly lost altogether.

To do batching, you use a BatchSpanProcessor and then call flush when done. This is what the auto-instrumentation for lambda should be doing.

BatchSpanProcessor should allow you to configure batch sizes using environment variables such as OTEL_BSP_MAX_EXPORT_BATCH_SIZE.
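For illustration, a minimal sketch of that pattern (Python, assuming the opentelemetry-sdk package; the ConsoleSpanExporter stands in for whatever exporter the layer actually uses, and the handler name is hypothetical): the SDK batches spans, the batch size can be tuned via OTEL_BSP_MAX_EXPORT_BATCH_SIZE, and the handler force-flushes before returning so nothing is left queued when the environment freezes.

```python
# Sketch only, not the actual auto-instrumentation code: batch spans in the
# SDK and force-flush them before the Lambda environment can be frozen.
import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# OTEL_BSP_MAX_EXPORT_BATCH_SIZE is the standard SDK knob for the batch size;
# the SDK reads it automatically, it is set here only for illustration.
os.environ.setdefault("OTEL_BSP_MAX_EXPORT_BATCH_SIZE", "512")

provider = TracerProvider()
# ConsoleSpanExporter is a placeholder for a real exporter in this sketch.
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)


def handler(event, context):
    with tracer.start_as_current_span("my-lambda-handler"):
        result = {"statusCode": 200}
    # Flush whatever the BatchSpanProcessor has buffered before returning;
    # otherwise the export may not happen until a later (or no) invocation.
    provider.force_flush()
    return result
```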

However, if your lambda function only produces one span per execution, you still have to send that one span.

I guess if you know the lambda will be run multiple times in quick succession you could take the risk and carry data across from one execution to the next, but that seems like a bad idea to me.

gregoryfranklin avatar May 10 '21 18:05 gregoryfranklin

AFAIK the batch processor does not provide any mechanism for a force flush, which means it cannot be used in lambda.

The design proposal mentions the following:

In the long run, we hope Lambda provides an IDLE event callback, which should be around 60 seconds

I think that with such a capability (which, as I understand it, is not yet available?) and perhaps an update to the internal Processor API (or to the Batch Processor), we could still leverage the batch processor and flush the data eventually?

Without it, isn't the role of the collector in the layer currently limited to just data translation between formats?

pmm-sumo avatar May 10 '21 18:05 pmm-sumo

We removed processors from the Lambda collector extension because we want to make sure force_flush() blocks until the backend returns a response to the Lambda function. The sequence is: force_flush(Lambda runtime, user's Lambda function) -> collector receiver -> collector exporter -> backend service

Adding the batch processor breaks this into two flows, which may lead to the problem described in the design proposal:

  1. force_flush(Lambda runtime, user's Lambda function) -> collector receiver -> batchProcessor (Lambda Freeze)
  2. batchProcessor -> collector exporter -> backend service

The BatchProcessor most likely improves efficiency in collector mode but not in agent mode. In Lambda there is only one application, so alternatively we can batch/aggregate telemetry data on the SDK side instead of the Collector side.
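A minimal sketch of that SDK-side alternative (Python, assuming the opentelemetry-sdk and opentelemetry-exporter-otlp-proto-http packages, and assuming the layer's collector exposes an OTLP/HTTP receiver on the default local port; adjust the endpoint to your configuration): the SDK does the batching and force-flushes before the handler returns, so the collector pipeline can stay processor-free (receiver -> exporter) and the flush remains a single blocking flow.

```python
# Sketch of SDK-side batching: the SDK batches and flushes, the collector
# extension keeps a processor-free pipeline and just forwards to the backend.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Assumed endpoint of the local collector extension's OTLP/HTTP receiver.
exporter = OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)


def handler(event, context):
    with tracer.start_as_current_span("do-work"):
        pass  # application logic goes here
    # The batch is flushed here, so the collector receives and exports before
    # the function returns and the execution environment is frozen.
    provider.force_flush()
    return {"ok": True}
```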

wangzlei avatar Jun 04 '21 01:06 wangzlei