zipkin-ruby icon indicating copy to clipboard operation
zipkin-ruby copied to clipboard

Don't report span.timestamp or duration when you didn't create the span

Open codefromthecrypt opened this issue 9 years ago • 5 comments

Zipkin's RPC span is interesting because it is a shared model. For example, a client creates a span and then the next hop (a server) adds annotations to it. The authoritative owner of the span is the one that's supposed to set timestamp and duration. In the case where it is originated by a client, the client, not the server, should set timestamp and duration.

See https://github.com/openzipkin/openzipkin.github.io/issues/49

Why bother?

If we special-case timestamp and duration reporting, a couple things happen:

  • zipkin duration queries will be accurate because the critical path duration will be the only duration queried for the span
  • Those externally controlling ids will be able to use the duration query
  • Those converting zipkin RPC spans into single-host spans, like Google StackDriver, will have a heuristic to tell if a span is dual host or not.

Do we need to do anything?

I'm not sure if the current implementation is doing this fine or not, but it is probably useful to add tests to make sure it it. Usually, there's very little to do to special-case this, and it always is where B3 headers are read.

Here are tests that should smoke it out

  • client creates a span and propagates it to a server

    • client propagates b3 headers for trace id, span id, and sampled=1
    • server neither reports span.timestamp nor duration
    • client reports span.timestamp and duration
  • server creates a new trace

    • no b3 headers are detected
    • server reports span.timestamp and duration
  • server creates a new trace with externally supplied ids

    • caller propagates b3 headers for trace id, span id, but no sampled header
    • server reports span.timestamp and duration

Does this make sense? Can anyone help translate this into tests for us?

See https://github.com/openzipkin/brave/issues/277#issuecomment-262429392

codefromthecrypt avatar Nov 28 '16 08:11 codefromthecrypt

ps mostly unrelated, but in some other implementations, when we start a span, we log a tick (or otherwise start a timer). When this is set (ex it wouldn't be set if it were a continued server span), we extract a measurement in microseconds as duration. Using this is more reliable than system time, as it never risks moving backwards during a span (ex from NTP adjustments). Not sure best mechanism in ruby, but I've found http://ruby-doc.org/stdlib-2.3.3/libdoc/benchmark/rdoc/Benchmark.html and https://github.com/copiousfreetime/hitimes

codefromthecrypt avatar Nov 28 '16 10:11 codefromthecrypt

I think the third one would not pass and require several changes. The first two it are easy, you got the ids , someone else is owning this, else you own this. The third is weird, "the lack of this optional sampled header in fact means you own this", weird. Current code will not like this

jcarres-mdsol avatar Nov 28 '16 19:11 jcarres-mdsol

The third is weird, "tha lack of this optional sampled header in fact means you own this", weird. Current code will not like this

well, if you think about it, it makes sense. You'd never propagate sampled=1 unless you started the span. lack of a sampled header means "defer sampling decision downstream" which means it hasn't been sampled or reported yet. anyway, we can leave out part 3 for now, just it will eventually come up. when it does, this is how it would be handled.

codefromthecrypt avatar Nov 29 '16 01:11 codefromthecrypt

in other words the sampled header is indeed required for any span that the caller reports to zipkin. I can fix the b3 docs as it seems unclear.

codefromthecrypt avatar Nov 29 '16 01:11 codefromthecrypt

actually it is already there.. https://github.com/openzipkin/b3-propagation#details

codefromthecrypt avatar Nov 29 '16 01:11 codefromthecrypt