
Serializable tracing context - spans

Open Gromcio opened this issue 5 years ago • 13 comments

What are you trying to achieve?

I'm trying to adapt OpenTelemetry to an Apache NiFi cluster through its UI. By giving users access to processors (start, stop, addattribute, addevent), they will be able to create custom data tracing around their defined workflows. The problem is that they define everything through the UI, which abstracts away the actual code, and that code (all of it or parts of it) can be executed on any node. Users can make mistakes that, in default implementations, would cause memory leaks or simply throw an error because there's no span context in that node's memory or on that thread. Because of that, I wanted to serialize the span (or the whole stack of spans) and move it together with the flow files (the actual data structures passed between user-defined processors), which NiFi garbage-collects automatically by default. That solves both the memory leak problem and the problem of getting the span context stack.

What did you expect to see?

A way to manage spans with a context broader than a single thread.

Additional context.

Currently we're creating our own custom spans, which we serialize together with the flowfile. When we want to close them, we convert them to OpenTelemetry spans and close them all together. We can't allow users to close only one span in the stack because we wouldn't be able to set the span ID and trace ID correctly. We could make it work if spans were serializable or if the span builder allowed setting those values.
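As an illustration only, a custom span that travels with a flowfile could look roughly like the sketch below. `PendingSpan` and all of its fields are hypothetical names, not OpenTelemetry types, and Java's built-in serialization is assumed here just to keep the example self-contained; the actual format used in this setup is not described.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a still-open span carried with a flowfile. Not an OpenTelemetry type. */
final class PendingSpan implements Serializable {
    private static final long serialVersionUID = 1L;

    final String traceId;        // 32 lowercase hex characters
    final String spanId;         // 16 lowercase hex characters
    final String parentSpanId;   // null for the root span of the stack
    final String name;
    final long startEpochNanos;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<String> events = new ArrayList<>();

    PendingSpan(String traceId, String spanId, String parentSpanId, String name, long startEpochNanos) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.name = name;
        this.startEpochNanos = startEpochNanos;
    }

    /** Serializes this span so it can be stored next to the flowfile content. */
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores a span on whichever node or thread picks the flowfile up next. */
    static PendingSpan fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (PendingSpan) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PendingSpan span = new PendingSpan(
                "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", null, "ingest", System.nanoTime());
        span.attributes.put("flowfile.step", "start");
        span.events.add("validated");

        // Round trip: the restored copy keeps IDs, attributes, and events.
        PendingSpan restored = PendingSpan.fromBytes(span.toBytes());
        System.out.println(restored.name + " " + restored.attributes + " " + restored.events);
    }
}
```

The IDs are kept as plain fields precisely because, as described above, the conversion to a real OpenTelemetry span at close time is where setting them becomes the problem.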

Gromcio avatar Apr 08 '21 08:04 Gromcio

Discussed during the issue triage meeting: this is out of scope for the OpenTelemetry project, so we don't intend to define extra APIs/behaviors for how spans should be serialized beyond what OTLP already has.

A custom processor seems to be a good solution; another possible option is to create a custom propagator.

reyang avatar Apr 09 '21 16:04 reyang

Please give us feedback. If we have agreement or there is no follow-up, we will close the issue next week.

reyang avatar Apr 09 '21 16:04 reyang

Where are the specs about span serialization? Is this about their propagation between applications (B3, W3C, etc. propagators)? Those don't allow me to continue the context of the span. I can only use them as parent contexts, since propagators only create SpanContexts, out of which it's not possible to create an active span. Additionally, those propagators save only basic information: no events or attributes, which we need to be serialized as well.

Since the only way to create an active span is using the SpanBuilder provided by the Tracer, which disallows providing custom IDs, it's not possible to use OpenTelemetry for such cases. The previous team solved it by working solely with Zipkin libraries that allow setting IDs for built spans, so they converted custom company spans to Zipkin spans when we wanted to close the span context.

It would be nicer if I could serialize the whole span as a string (JSON?) and recreate it somehow using the OpenTelemetry API to continue the span context on another node or another thread so that it can be closed. This way I wouldn't need to recreate half of the span context and its dependencies just to convert them to their unserializable twin, which is handled by the SDK and can be sent to the destination service (in our case SignalFx, which is currently pointing at this OpenTelemetry library). Even just allowing SpanBuilder to set span context IDs would be helpful, since we would then be able to give our customers control over how many spans they want to close. Currently we must close the whole stack (parent -> child x n), since otherwise we wouldn't be able to control the ID relationships between the converted spans.

Gromcio avatar Apr 09 '21 16:04 Gromcio

@reyang any updates on this topic :) ?

Gromcio avatar May 20 '21 08:05 Gromcio

This is of interest to us in our project as well, as we need to do something similar. We need to recreate actual Span and context instances on the other side of the wire, as those other services will use them to continue interacting with OpenTelemetry in a seamless fashion. The fact that there is no standard way to ship a span with all its attributes, events, etc. across the wire is a surprising omission, considering the new ways software is being architected, and may force us to use something other than OpenTelemetry.

andrewdep avatar Feb 18 '22 17:02 andrewdep

Hi @Gromcio @andrewdep, have you found any effective solution/workaround for this? We too have the exact same requirement of serializing the original span and sending it across so that the receiver can recreate/reactivate it. We explored and tried all possible combinations to achieve this using OpenTelemetry, but as you mentioned, this is kind of a blocker. Looks like we don't have any APIs/behaviors to achieve this.

antim098 avatar Jun 23 '22 19:06 antim098

Sadly I didn't. We've had to completely change our solution for the customers since it was not possible to achieve it the way we wanted. We're now creating and closing spans with each processor, so if we have a pipeline of multiple processors, each step creates a new span and propagates data.

Gromcio avatar Jun 23 '22 19:06 Gromcio

> Hi @Gromcio @andrewdep have you found any effective solution/workaround for this.

We did find a hacky workaround by using a custom JSON replacer/reviver that also utilized Object.create() and Object.assign(). While this worked, we did end up abandoning it because we didn't like the "code smell" of the hack.

We ended up using inject() and extract() and just got used to seeing every node in the process as its own span, which while not ideal, did serve our purposes.

andrewdep avatar Jun 23 '22 19:06 andrewdep

> Hi @Gromcio @andrewdep have you found any effective solution/workaround for this.

> We did find a hacky workaround by using a custom JSON replacer/reviver that also utilized Object.create() and Object.assign(). While this worked, we did end up abandoning it because we didn't like the "code smell" of the hack.

We are already using inject() and extract() to pass the contexts around in String format, but for a specific use case we require the original span as well. I explored one particular method from the Span interface, fromContext(), which can create a span from the Context, but it didn't work out for me. This was the closest thing I found for recreating the span. Not sure if you had explored this as well.

https://javadoc.io/static/io.opentelemetry/opentelemetry-api-trace/0.13.1/io/opentelemetry/api/trace/Span.html#fromContext(io.opentelemetry.context.Context)

antim098 avatar Jun 24 '22 03:06 antim098

Have you considered working with the Apache NiFi community on adding OpenTelemetry integration into their codebase? I submitted a ticket https://issues.apache.org/jira/browse/NIFI-10110 and am looking into possibilities to see if there's some feasible way to do this.

bputt-e avatar Jun 29 '22 20:06 bputt-e

I remember trying something around their provenance, but I don't remember what the problem with it was. The general idea with provenance was to analyze the provenance events, connect those with flowfile metadata, which might contain tracing data, and export it to our collector using a custom reporting task. But I don't know what exactly stopped us from doing it. I guess it might have been that it provided too little control over what the "spans" would be, etc., but it's been quite a while since then.

Gromcio avatar Jun 29 '22 21:06 Gromcio

> Have you considered working with the apache nifi community on adding opentelemetry integration into their codebase? I submitted a ticket https://issues.apache.org/jira/browse/NIFI-10110 and am looking into possibilities to see if there's some feasible way to do this

It's not about how NiFi integrates with it, because if we want to work with 3 nodes as a single "execution" context, we would need to be able to serialize and deserialize a span and put it into the current span context. But with the current implementation of OpenTelemetry it's not possible.

Gromcio avatar Jun 29 '22 21:06 Gromcio

I imagine we'd put the tracecontext into a flowfile attribute that would be exposed to each processor, and I thought OTel would allow us to set the context and continue the trace while in the onTrigger method, for example. Is that not the case?
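To make the idea concrete, here is a rough sketch of carrying a W3C Trace Context `traceparent` value in a flowfile attribute map and deriving a new span ID in each processor. `TraceParent`, `newChild()`, and the `flowFileAttributes` map are hypothetical names for illustration; in the real OpenTelemetry API this is the job of a propagator's inject() and extract(), and the sketch only models the wire format. It carries the trace ID, span ID, and flags, nothing else, so attributes and events of the upstream span do not travel with it.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper modelling a W3C `traceparent` value. Not an OpenTelemetry class. */
final class TraceParent {
    // version "00", 32-hex trace ID, 16-hex parent (span) ID, 2-hex trace flags
    private static final Pattern FORMAT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");
    private static final SecureRandom RANDOM = new SecureRandom();

    final String traceId;
    final String spanId;
    final String flags;

    TraceParent(String traceId, String spanId, String flags) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.flags = flags;
    }

    /** Parses a traceparent value, rejecting malformed input and all-zero IDs. */
    static TraceParent parse(String value) {
        Matcher m = FORMAT.matcher(value);
        if (!m.matches()
                || m.group(1).equals("0".repeat(32))
                || m.group(2).equals("0".repeat(16))) {
            throw new IllegalArgumentException("invalid traceparent: " + value);
        }
        return new TraceParent(m.group(1), m.group(2), m.group(3));
    }

    /** Same trace, fresh non-zero span ID: what a downstream processor would use for its own span. */
    TraceParent newChild() {
        long id;
        do {
            id = RANDOM.nextLong();
        } while (id == 0L);
        return new TraceParent(traceId, String.format("%016x", id), flags);
    }

    String format() {
        return "00-" + traceId + "-" + spanId + "-" + flags;
    }

    public static void main(String[] args) {
        // Stand-in for the attributes NiFi attaches to a flowfile.
        Map<String, String> flowFileAttributes = new HashMap<>();
        flowFileAttributes.put("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");

        // Inside a processor: read the incoming context, start a child, write it back.
        TraceParent incoming = TraceParent.parse(flowFileAttributes.get("traceparent"));
        TraceParent outgoing = incoming.newChild();
        flowFileAttributes.put("traceparent", outgoing.format());
        System.out.println(flowFileAttributes.get("traceparent"));
    }
}
```

This matches the "every processor is its own span" workaround mentioned earlier in the thread: the trace continues, but the upstream span itself is never reopened.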

bputt-e avatar Jun 29 '22 22:06 bputt-e

We have the same question in our project. We would like to implement a CustomContextStorage as a solution, but it is not easy to serialize/deserialize the Context into/from JSON or some other format in order to store it in a database, for example.

MarkusAchenbach avatar Jan 10 '23 17:01 MarkusAchenbach