opencensus-java icon indicating copy to clipboard operation
opencensus-java copied to clipboard

Incoherent span timestamp on child_of relationship with Jaeger

Open bdelbosc opened this issue 6 years ago • 3 comments

Please answer these questions before submitting a bug report.

What version of OpenCensus are you using?

0.19.2

What JVM are you using (java -version)?

openjdk version "1.8.0_171" OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-1~deb9u1-b11) OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)

What did you do?

I am tracing some stream processing where a computation A produces a record that is consumed by another computation B, the processing in A and B are traced with a span and the trace context is propagated in the record. Also, there are annotations in spans with a message containing the current time when the record is send and receive like span.addAnnotation("send record done " + Instant.now().toString());

What did you expect to see?

Because there is a causality between A and B, B must not start before A emits the record. Also, the annotations (reported as logs in Jaeger) should be inside the span and match very closely the annotation timestamp (logs timestamp).

What did you see instead?

In some cases, Jaeger displays an overlap of computation A and computation B where B starts before A has written the record. The time annotation in B are outside of the span time range and they are not matching the logs timestamp. The annotation message time seems right because they express the causality

Extract of trace from Computation A

{
 "traceID": "f758186957b4511b5fa0a77d06081ab8",
 "spanID": "621b7ced0aacb159",
 "operationName": "comp.sortBlob",
 "references": [
  {
   "refType": "CHILD_OF",
   "traceID": "f758186957b4511b5fa0a77d06081ab8",
   "spanID": "9da70fc9c0f69850"
  }
 ],
 "startTime": 1550248601411000, // Fr 15 Feb 2019 17:36:41.411000 CET
 "duration": 486115, // or 1550248601897115 ->    17:36:41.897115
 "logs": [
...
   {
    "timestamp": 1550248601895735, // 16:36:41.895735 this match the message value
    "fields": [
     {
       "key": "message",
       "type": "string",
       "value": "send record done 2019-02-15T16:36:41.896Z"
     }
   ]
...
}

Extract of trace from Computation B

{
 "traceID": "f758186957b4511b5fa0a77d06081ab8",
 "spanID": "715b90f36cb48636",
 "flags": 1,
 "operationName": "comp.exposeBlob",
  "references": [
   {
    "refType": "CHILD_OF",
    "traceID": "f758186957b4511b5fa0a77d06081ab8",
    "spanID": "621b7ced0aacb159"
   }
 ],
 "startTime": 1550248601646466, // 17:36:41.646466 this is before the record has been send
 "duration": 15182,             // 17:36:41.661648
 "logs": [
 ...
 {
  "timestamp": 1550248601647394, // 17:36:41.647394 this is impossible the record has been sent later 16:36:41.895735
  "fields": [
   {
     "key": "message",
     "type": "string",
     "value": "read record 2019-02-15T16:36:41.933Z" // this sound the correct time but outside the span range
   }
  ]
}
...

Everything is running on the same machine with a docker compose so there should be no time desynchronisation.

Can this be related to a context propagation problem ?

bdelbosc avatar Feb 15 '19 17:02 bdelbosc

After more investigation, by logging span's timestamp sent by open census, they are correct, but Jaeger renders the span with wrong times (like if the child_of relation means in the middle of the parent span). Furthermore, I have tried to use Zipkin and I cannot reproduce the problem. The problem must be in the Jaeger side.

bdelbosc avatar Feb 18 '19 09:02 bdelbosc

I guess my problem is either related to the following Jaeger in-memory bug: https://github.com/jaegertracing/jaeger/issues/787 either because the span relationship is translated into childOf instead followsFrom

bdelbosc avatar Feb 18 '19 09:02 bdelbosc

So according to Jaeger such relationship between span must be of type followsFrom, I have a working solution with this code:

            spanContext = binaryFormat.fromByteArray(traceContext);
            Link link = Link.fromSpanContext(spanContext, Link.Type.PARENT_LINKED_SPAN);
            Span span = tracer.spanBuilderWithRemoteParent(mySpanName, spanContext).startSpan();
            span.addLink(link);

It works for Jaeger and Zipkin but it looks over complicated for such common pattern...

bdelbosc avatar Feb 18 '19 16:02 bdelbosc