Tail-based sampling

Open nehaduggal opened this issue 5 years ago • 2 comments

Summary of the problem (If there are multiple problems or use cases, prioritize them)

Head-based sampling does not take end-to-end workflows into account. It does not let a user specify tracing mechanisms for end-to-end workflows, or even for specific transactions within a service. Tail-based sampling would let a user configure the more important end-to-end traces differently from the rest.
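To illustrate the difference, here is a minimal sketch in Python. It is not how any APM agent or server implements sampling; the span layout (a dict with `trace_id` and `labels`) and the `critical` label are hypothetical names chosen for this example. The point is only that a head-based decision sees nothing but the trace ID, while a tail-based decision can inspect the whole finished trace.

```python
import hashlib


def head_sample(trace_id: str, rate: float) -> bool:
    """Head-based: decide when the trace starts, from the trace ID alone."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Map the first 48 bits of the hash to a value in [0, 1).
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return bucket < rate


def tail_sample(spans: list[dict], rate: float) -> bool:
    """Tail-based: decide after the trace has ended, using all of its spans."""
    # Hypothetical rule: keep every trace in which any span is labeled critical.
    if any(span.get("labels", {}).get("critical") for span in spans):
        return True
    # Otherwise fall back to the same rate-based decision as head sampling.
    return head_sample(spans[0]["trace_id"], rate)
```

With a rate of 0.0, `head_sample` drops every trace, whereas `tail_sample` still keeps a trace if a downstream span carries the hypothetical `critical` label.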

Personas / User Stories:

  • As a customer using APM, I have instrumented all my services and workflows with APM agents. So many distributed traces are generated for my services that the experience is overwhelming, and I cannot easily spot the critical workflows in the UI. I would like to tag some of my workflows as critical traces so I can filter the UI to show only those traces. That would let me easily visualize them and identify any major issues with these critical workflows. Additionally, I may want to sample these traces at a higher rate than the rest, so I can capture more traces for these critical workflows.

  • As an Ops person maintaining our Elastic infrastructure, I want to be able to set a default maximum sampling rate covering all non-important traces. This will enable me to calculate and control my Elastic storage costs.
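The two stories above could be combined into per-workflow sample rates plus a default rate for everything else. The sketch below is only an illustration of that idea and of the cost arithmetic the Ops persona asks for. The `Policy` type, the workflow names, the rates, and the daily trace counts are all made-up assumptions, not an existing Elastic API or real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    workflow: str       # hypothetical workflow name
    sample_rate: float  # fraction of traces to keep, between 0.0 and 1.0


# Assumed default maximum rate for all non-critical traces.
DEFAULT_RATE = 0.05

# Assumed critical workflows, sampled at a higher rate than the default.
POLICIES = [Policy("checkout", 1.0), Policy("login", 0.5)]


def rate_for(workflow: str) -> float:
    """Return the rate of the first matching policy, else the default."""
    for policy in POLICIES:
        if policy.workflow == workflow:
            return policy.sample_rate
    return DEFAULT_RATE


def expected_stored(traces_per_day: dict[str, int]) -> float:
    """Estimate how many traces per day would be stored under the policies."""
    return sum(count * rate_for(wf) for wf, count in traces_per_day.items())
```

For example, with 1,000 `checkout`, 2,000 `login`, and 10,000 `search` traces per day, these assumed rates would store roughly 1,000 + 1,000 + 500 = 2,500 traces per day, a figure that could then be multiplied by an average trace size to estimate storage.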

Success criteria:

  • Ability to configure critical workflows as important traces via the UI or via an API call.
  • Fine-tune the sampling rate for these traces.
  • Deploy the changes to my application without needing to restart my application.
  • Filter the UI to view these critical transactions. For example: as a user, I want to quickly identify if one of my sites is down.

List known (technical) restrictions and requirements. For example: has to be scalable from 0-15k containers.

nehaduggal avatar Dec 11 '20 00:12 nehaduggal

Pinging @elastic/observability-design (design)

elasticmachine avatar Dec 11 '20 00:12 elasticmachine

Thanks for the issue @nehaduggal. I have a few questions:

I would like to tag some of my workflows as critical traces

You mention tagging there. I wonder: has it been decided that we are going to use tags to solve this problem?

Fine tune the sampling rate for these traces

I think we could split this into a separate issue. cc @formgeist

Also @nehaduggal do you mind if we move this issue to the design repository?

katrin-freihofner avatar Dec 14 '20 13:12 katrin-freihofner