Incoherent span timestamp on child_of relationship with Jaeger
Please answer these questions before submitting a bug report.
What version of OpenCensus are you using?
0.19.2
What JVM are you using (java -version)?
openjdk version "1.8.0_171" OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-1~deb9u1-b11) OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)
What did you do?
I am tracing some stream processing where a computation A produces a record that is consumed by another computation B, the processing in A and B are traced with a span and the trace context is propagated in the record.
Also, there are annotations in spans with a message containing the current time when the record is send and receive like
span.addAnnotation("send record done " + Instant.now().toString());
What did you expect to see?
Because there is a causality between A and B, B must not start before A emits the record. Also, the annotations (reported as logs in Jaeger) should be inside the span and match very closely the annotation timestamp (logs timestamp).
What did you see instead?
In some cases, Jaeger displays an overlap of computation A and computation B where B starts before A has written the record. The time annotation in B are outside of the span time range and they are not matching the logs timestamp. The annotation message time seems right because they express the causality
Extract of trace from Computation A
{
"traceID": "f758186957b4511b5fa0a77d06081ab8",
"spanID": "621b7ced0aacb159",
"operationName": "comp.sortBlob",
"references": [
{
"refType": "CHILD_OF",
"traceID": "f758186957b4511b5fa0a77d06081ab8",
"spanID": "9da70fc9c0f69850"
}
],
"startTime": 1550248601411000, // Fr 15 Feb 2019 17:36:41.411000 CET
"duration": 486115, // or 1550248601897115 -> 17:36:41.897115
"logs": [
...
{
"timestamp": 1550248601895735, // 16:36:41.895735 this match the message value
"fields": [
{
"key": "message",
"type": "string",
"value": "send record done 2019-02-15T16:36:41.896Z"
}
]
...
}
Extract of trace from Computation B
{
"traceID": "f758186957b4511b5fa0a77d06081ab8",
"spanID": "715b90f36cb48636",
"flags": 1,
"operationName": "comp.exposeBlob",
"references": [
{
"refType": "CHILD_OF",
"traceID": "f758186957b4511b5fa0a77d06081ab8",
"spanID": "621b7ced0aacb159"
}
],
"startTime": 1550248601646466, // 17:36:41.646466 this is before the record has been send
"duration": 15182, // 17:36:41.661648
"logs": [
...
{
"timestamp": 1550248601647394, // 17:36:41.647394 this is impossible the record has been sent later 16:36:41.895735
"fields": [
{
"key": "message",
"type": "string",
"value": "read record 2019-02-15T16:36:41.933Z" // this sound the correct time but outside the span range
}
]
}
...
Everything is running on the same machine with a docker compose so there should be no time desynchronisation.
Can this be related to a context propagation problem ?
After more investigation, by logging span's timestamp sent by open census, they are correct, but Jaeger renders the span with wrong times (like if the child_of relation means in the middle of the parent span). Furthermore, I have tried to use Zipkin and I cannot reproduce the problem. The problem must be in the Jaeger side.
I guess my problem is
either related to the following Jaeger in-memory bug:
https://github.com/jaegertracing/jaeger/issues/787
either because the span relationship is translated into childOf instead followsFrom
So according to Jaeger such relationship between span must be of type followsFrom,
I have a working solution with this code:
spanContext = binaryFormat.fromByteArray(traceContext);
Link link = Link.fromSpanContext(spanContext, Link.Type.PARENT_LINKED_SPAN);
Span span = tracer.spanBuilderWithRemoteParent(mySpanName, spanContext).startSpan();
span.addLink(link);
It works for Jaeger and Zipkin but it looks over complicated for such common pattern...