spec icon indicating copy to clipboard operation
spec copied to clipboard

Guarantee a minimum number of IDs before overflow of the random component

Open TomMD opened this issue 6 years ago • 5 comments

The actual number of ULIDs I can get in a millisecond might only be 1 (with absurd probability 2^-80). We could guarantee at least 2^79 ULIDs by requiring the first random generated in a millisecond starts with a zero bit (implementors would merely mask it, I'm sure).

This has the negative impact of only 79 bits of randomness instead of 80 so if there are 1e12ULIDs being generated across different devices in the same millisecond then we'll get a collision whp (vs the previous expectation of 7.8e11 ULIDs in one millisecond)

TomMD avatar Jan 09 '20 23:01 TomMD

I think monotonicity guarantees should completely be removed from the spec as they make the ulid() function impure. This can lead to very unpredictable behavior. (e.g. where different calls to the function accidentally do or don't share the same state and modify their value's in dangerous ways). I'd rather recommend the implementations to provide a second ulidMonoton() function and a function to manually increment the bits of the random part if there is a possibility of non monotonicity. What do you think @alizain?

ad-si avatar Jan 12 '20 21:01 ad-si

@TomMD your proposal seems to have the best balance point considering randomness, easy of implementation and safe range to accomodate successive calls. Thought the same today, came to open a similar issue and then saw yours. +1

Bacco avatar May 08 '20 12:05 Bacco

I thought exactly the same when first reading the spec: if you generate a high random value at the beginning of the millisecond you risk to overflow, and it would be sufficient to just mask the topmost bit from the RNG to avoid this. An UID generator that can fail is a problem for a lot of software, and the loss of that bit solves that problem entirely without significantly impacting the randomness. Plus there are two extra bits never used in the base32 representation. It would also make sense to pad the timestamp on the left (129 or 130 bits) and keep one or two bits to count overflows between RAND and TS if needed. But I tend to think that keeping 128 bits is more important than trying not to lose one bit of entropy.

wtarreau avatar Mar 08 '24 07:03 wtarreau