Feature (misdirected) request: Mini-ulid library?
I'm sorry for the noise. This is really a request for the creation of a similar but related library, and I'm not sure of a better place to ask.
For very many purposes, such as OTEL TraceIDs, ULIDs (or UUIDv7) are great, and I use this library all over the place.
However, there are a few times when I am constrained to use a different number of bytes (or a different character representation). Specifically, when I am generating OTEL SpanIDs, I wish there were a nice monotonic, lexicographically sortable equivalent.
See W3C Trace context trace-id
This is the ID of the whole trace forest and is used to uniquely identify a distributed trace through a system. It is represented as a 16-byte array, for example, 4bf92f3577b34da6a3ce929d0e0e4736. All bytes as zero (00000000000000000000000000000000) is considered an invalid value.
See W3C Trace context parent-id
This is the ID of this request as known by the caller (in some tracing systems, this is known as the span-id, where a span is the execution of a client request). It is represented as an 8-byte array, for example, 00f067aa0ba902b7
It would be reasonably straightforward to write a Go module that lets you generate []byte identifiers of arbitrary width, with properties similar to ULIDs. That module would not be this module :) so in any case it wouldn't happen here.
But you almost certainly don't want to take this approach when generating trace or span IDs, for a bunch of reasons, including but in no way limited to the impact on uniqueness and sampling. Can you say a bit more about why you want to do this?
Yeah, I was pretty sure this module was not where this should be implemented, but wondered if there was another reasonable place within the oklog universe. 😅
For context, and as you mentioned, when I am running things in production, OpenTelemetry vendors often want you to use their APIs to acquire traceIDs and spanIDs, specifically so they can both guarantee uniqueness and control sampling based on external factors.
However, when I am running locally, I often don't want to rely on network communication with my OpenTelemetry vendor. Instead, I want a lightweight, self-hosted local solution where I sample everything (or at least much more), so I use the stdout exporter with the WithWriter option to write these traces to a local file that I can view in the otel-desktop-viewer.
For these locally generated TraceIDs, I have been using UUIDv7 (16 bytes, or 128 bits), and for the SpanIDs I would like to use something like a Twitter Snowflake ID (8 bytes, or 64 bits). For my use case, I'm not overly concerned with collisions, because they are highly unlikely in local-only use. The Snowflake layout is:
+--------------------------------------------------------------------------+
| 1 Bit Unused | 41 Bit Timestamp | 10 Bit NodeID | 12 Bit Sequence ID |
+--------------------------------------------------------------------------+
Using the default settings, that would still allow for 4096 unique IDs to be generated every millisecond, per Node ID.
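To make the layout concrete, here is a rough sketch of a generator that packs those fields into 8 bytes. The type and function names (`SpanID`, `SpanIDGen`, `NewSpanIDGen`) and the custom epoch are my own assumptions for illustration, not an existing library. When the 12-bit sequence overflows within one millisecond, this sketch borrows the next millisecond rather than blocking, which is a design choice and not necessarily what other Snowflake implementations do.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"sync"
	"time"
)

const (
	tsBits   = 41
	nodeBits = 10
	seqBits  = 12
	maxNode  = 1<<nodeBits - 1
	maxSeq   = 1<<seqBits - 1

	// epochMs is an assumed custom epoch (2020-01-01T00:00:00Z).
	epochMs int64 = 1577836800000
)

// SpanID is an 8-byte identifier; its hex form sorts in generation order.
type SpanID [8]byte

func (id SpanID) String() string { return hex.EncodeToString(id[:]) }

// SpanIDGen is a hypothetical Snowflake-style generator.
type SpanIDGen struct {
	mu     sync.Mutex
	node   uint64
	lastMs int64
	seq    uint64
}

func NewSpanIDGen(node uint64) *SpanIDGen {
	return &SpanIDGen{node: node & maxNode}
}

// Next returns an ID strictly greater than every ID previously returned
// by this generator.
func (g *SpanIDGen) Next() SpanID {
	g.mu.Lock()
	defer g.mu.Unlock()

	ms := time.Now().UnixMilli() - epochMs
	if ms < g.lastMs {
		ms = g.lastMs // clock is behind our last timestamp: hold it to stay monotonic
	}
	if ms == g.lastMs {
		g.seq = (g.seq + 1) & maxSeq
		if g.seq == 0 {
			ms++ // sequence exhausted: borrow the next millisecond
		}
	} else {
		g.seq = 0
	}
	g.lastMs = ms

	v := (uint64(ms)&(1<<tsBits-1))<<(nodeBits+seqBits) | g.node<<seqBits | g.seq
	var id SpanID
	binary.BigEndian.PutUint64(id[:], v)
	return id
}

func main() {
	g := NewSpanIDGen(1)
	for i := 0; i < 3; i++ {
		fmt.Println(g.Next()) // 16 hex characters, each larger than the last
	}
}
```

Because the value is written big-endian, comparing the raw bytes or the hex strings gives the same order as comparing the underlying integers.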
Makes sense, and I totally understand why you'd want to generate your own trace and/or span IDs without calling out to a third-party library or (especially) service. (For the record, I'm pretty sure using OTel's built-in ID generators doesn't require any network communication, though I may not totally understand your use case.) What I still don't quite understand is: when would you ever care about lexicographical sort order of span IDs? AFAIU a span ID is an opaque identifier that references a span whose metadata necessarily includes a start timestamp + duration. That is, it would only ever be "discovered" thru a parent trace ID or span ID, and it would only ever be used to load/read the corresponding span, and never e.g. compared with other span IDs beyond checking for equality. But I guess this is not true for you?
Huh! I'll have to try this out...
BTW, you might also be interested in retrace and trot and how Jon Johnson uses otel.md idiosyncratically, in a radically local way.
when would you ever care about lexicographical sort order of span IDs? AFAIU a span ID is an opaque identifier that references a span whose metadata necessarily includes a start timestamp + duration.
TraceIDs + SpanIDs get passed around all over the place. In an event-oriented flow, or a couple of microservices, being able to see what came first is a big help in reasoning about a situation.
SpanIDs being only 64-bit may require generation coordination or risk collisions, unlike 128-bit TraceIDs using ULID/UUIDv7.
I could see utility in Span IDs having lexicographical sort order (and inherent temporal information) even in non-local, production use, provided uniqueness were guaranteed. This matters most for Span IDs that were not sampled: for these, there is no way to look up the span to discover its start timestamp.
If I have only the IDs, and each ID was generated with a temporal component, then by implication I also have that ID's start time without having to look up the associated metadata.
In local dev, I have found this property of my UUIDv7-generated Trace IDs very useful (e.g., I can filter large numbers of Trace IDs by their start time for a relevant time range).
I'm honestly only speculating that I would take similar advantage of having extra temporal information in Span IDs without resorting to a local OLAP store like ClickHouse or a time series database like InfluxDB.
(I'm completely into the concept of local-only tracing! peterbourgon/trc is a self-described "in-process request tracing" project which is very much adjacent to, if not totally compatible with, the stuff you've linked above.)
TraceIDs + SpanIDs get passed around all over the place. In an event-oriented flow, or a couple of microservices, being able to see what came first is a big help in reasoning about a situation. ... I'm honestly only speculating that I would take similar advantage of having extra temporal information in Span IDs without resorting to a local OLAP store like ClickHouse or a time series database like InfluxDB.
So, my concerns here might be best summarized with the following example: a span ID that embeds some timestamp t1, but which refers to a span with an actual, true, correct start_time attribute that is some totally different timestamp t2. This is always gonna be a possible outcome, and can't just be ignored as an edge case you don't need to worry about. Which means whatever timestamp you infer from the span ID is, necessarily, just a guess, not a fact -- fine as an optimization, but if you want to make reliable/correct decisions about start_times then you must load the actual span entity, you can't just rely on the span ID. Right?
More generally: if you pass around trace IDs and/or span IDs, that's fine, but you can't expect to make any decisions about them, or the entities they refer to, without first hydrating those entities from some source of truth, just by definition. Right?
(I might be misunderstanding something! Please correct me if so!)
In local dev, I have found this property of my UUIDv7-generated Trace IDs very useful (e.g., I can filter large numbers of Trace IDs by their start time for a relevant time range).
If you wanted to filter traces by time range, you wouldn't ever filter "trace IDs" directly, you'd filter against the actual source-of-truth of trace/span data, which would include timestamps and IDs among many different properties of all known trace/span data...
I'm honestly only speculating that I would take similar advantage of having extra temporal information in Span IDs without resorting to a local OLAP store like ClickHouse or a time series database like InfluxDB.
You don't need an OLAP store, or a time series DB, or anything in particular, to look up details about trace and/or span IDs, but you absolutely need some kind of source-of-truth for trace metadata, which you can query, with these kinds of filters, to get trace/span data. This is AFAIK unavoidable?
peterbourgon/trc is a self-described "in-process request tracing" project which is very much adjacent to, if not totally compatible with, the stuff you've linked above.
🤯 So cool!
So, my concerns here might be best summarized with the following example: a span ID that embeds some timestamp t1, but which refers to a span with an actual, true, correct start_time attribute that is some totally different timestamp t2. This is always gonna be a possible outcome, and can't just be ignored as an edge case you don't need to worry about. Which means whatever timestamp you infer from the span ID is, necessarily, just a guess, not a fact -- fine as an optimization, but if you want to make reliable/correct decisions about start_times then you must load the actual span entity, you can't just rely on the span ID. Right?
Oh, I see the confusion. I only meant the case where the source of truth that generates both the SpanID and the Span itself uses the same, guaranteed-identical timestamp for both.
BTW, I'm heading into a national forest for a week, so I'm going offline from here on out for a bit.