ULIDs containing `ILOU` are parsed as weird timestamps

Open kachick opened this issue 4 years ago • 5 comments

Hi! I'm writing a new Ruby library for handling ULIDs these days, and I'm currently testing examples from other implementations in https://github.com/kachick/ruby-ulid/issues/53.

I found some weird examples in the original repository, reported in https://github.com/ulid/javascript/pull/85.

I then checked this library's parser, because I'm using it in a Go project. It is so useful! 😄

I ran the command-line tool as shown below; the version is https://github.com/oklog/ulid/tree/e7ac4de44d238ff4707cc84b9c98ae471f31e2d1

$ ulid -h
Usage: ulid [-hlqz] [-f <format>] [parameters ...]
 -f, --format=<format>  when parsing, show times in this format: default, rfc3339, unix, ms
 -h, --help             print this help text
 -l, --local            when parsing, show local time instead of UTC
 -q, --quick            when generating, use non-crypto-grade entropy
 -z, --zero             when generating, fix entropy to all-zeroes

$ ulid 01111111111111111111111111
Mon Dec 19 08:09:04.801 UTC 2005

$ ulid 0LLLLLLLLLLLLLLLLLLLLLLLLL # `L` is same as `1` in https://www.crockford.com/base32.html, but returned different value
Tue Aug 02 05:31:50.655 UTC 10889

$ ulid 0UUUUUUUUUUUUUUUUUUUUUUUUU # `U` is invalid in https://www.crockford.com/base32.html, but does not raise error
Tue Aug 02 05:31:50.655 UTC 10889

$ ulid 00000000000000000000000000
Thu Jan 01 00:00:00 UTC 1970

$ ulid 0OOOOOOOOOOOOOOOOOOOOOOOOO # `O` is same as `0` in https://www.crockford.com/base32.html, but returned different value
Tue Aug 02 05:31:50.655 UTC 10889

In my understanding, Crockford's base32 never emits L, I, or O in encoded output, so I think ULID can treat them as invalid values 🤔 ref: https://github.com/ulid/spec/issues/38, https://github.com/kachick/ruby-ulid/issues/57
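The three identical `10889` results above are consistent with a decoder whose lookup table marks unknown bytes with an all-ones sentinel and whose non-strict path never checks for it. The Go sketch below illustrates that hypothesis only; it is not the library's actual code, and `dec`, `buildTable`, and `lenientTimestamp` are hypothetical names. It assumes the input has at least 10 characters.

```go
package main

import (
	"fmt"
	"time"
)

// alphabet is Crockford's base32 encoding alphabet (no I, L, O, U).
const alphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// dec maps an input byte to its 5-bit value; 0xFF marks "not in the alphabet".
// Assumption: the real decoder uses a similar sentinel.
var dec = buildTable()

func buildTable() [256]byte {
	var t [256]byte
	for i := range t {
		t[i] = 0xFF
	}
	for i := 0; i < len(alphabet); i++ {
		c := alphabet[i]
		t[c] = byte(i)
		if c >= 'A' && c <= 'Z' {
			t[c+('a'-'A')] = byte(i) // accept lower case as well
		}
	}
	return t
}

// lenientTimestamp decodes the first 10 characters of s into a 48-bit
// millisecond timestamp without rejecting the 0xFF sentinel.
func lenientTimestamp(s string) uint64 {
	var ts uint64
	for i := 0; i < 10; i++ {
		ts = ts<<5 | uint64(dec[s[i]])
	}
	return ts & (1<<48 - 1)
}

func main() {
	for _, s := range []string{
		"01111111111111111111111111",
		"0LLLLLLLLLLLLLLLLLLLLLLLLL",
		"0UUUUUUUUUUUUUUUUUUUUUUUUU",
		"0OOOOOOOOOOOOOOOOOOOOOOOOO",
	} {
		ms := lenientTimestamp(s)
		fmt.Println(s, ms, time.UnixMilli(int64(ms)).UTC().Format(time.RFC3339Nano))
	}
}
```

Under this assumption every unmapped character contributes all-ones bits, so `L`, `U`, and `O` all collapse to the maximum 48-bit timestamp, while a strict parser would reject the sentinel before combining any bits.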

kachick avatar May 01 '21 13:05 kachick

Interesting. I think the relevant rules are

> When decoding, upper and lower case letters are accepted, and i and l will be treated as 1 and o will be treated as 0. When encoding, only upper case letters are used.

> Hyphens (-) can be inserted into symbol strings. This can partition a string into manageable pieces, improving readability by helping to prevent confusion. Hyphens are ignored during decoding.

I think we are not doing the bold parts.
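For illustration, the decoding rules quoted above could be applied as a normalization pass before parsing. This is a minimal Go sketch, not a proposed patch: `normalizeCrockford` is a hypothetical helper, and it implements only the quoted rules (case folding, `i`/`l` to `1`, `o` to `0`, hyphens dropped) and rejects everything else, including `U`.

```go
package main

import (
	"fmt"
	"strings"
)

// alphabet is Crockford's base32 encoding alphabet (no I, L, O, U).
const alphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// normalizeCrockford rewrites s into canonical upper-case symbols,
// following the decoding rules of Crockford's base32.
func normalizeCrockford(s string) (string, error) {
	var b strings.Builder
	for _, r := range strings.ToUpper(s) {
		switch r {
		case '-':
			continue // hyphens are ignored during decoding
		case 'I', 'L':
			r = '1'
		case 'O':
			r = '0'
		}
		if !strings.ContainsRune(alphabet, r) {
			return "", fmt.Errorf("invalid Crockford base32 character %q", r)
		}
		b.WriteRune(r)
	}
	return b.String(), nil
}

func main() {
	for _, in := range []string{"0lLi-Oo", "0UUU"} {
		out, err := normalizeCrockford(in)
		fmt.Printf("%q -> %q, err=%v\n", in, out, err)
	}
}
```

Keeping normalization separate from bit decoding means the strict parser can stay unchanged and callers opt in to the lenient behaviour explicitly.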

peterbourgon avatar May 03 '21 13:05 peterbourgon

@tsenart Think we can add those things?

peterbourgon avatar May 03 '21 13:05 peterbourgon

Thanks for your comment!

> I think we are not doing the bold parts.

Agreed, and I think ignoring them is the more desirable spec for actual use cases, rather than strictly following the original Crockford's base32. 😅

So I have suggested it in https://github.com/ulid/spec/pull/57 🙏

kachick avatar May 03 '21 18:05 kachick

Ah, yes, and to just make it explicit, you wrote

> Especially when [implementations] accept [the] iIlLoO mapping, as [is suggested in the] original Crockford's base32 decoding spec, Lexicographically sortable is lost

which is a great point 👍 Will wait for the outcome of that other PR...
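A small Go sketch of the sortability point, under the assumption that a decoder accepts the alias mapping: the strings `0L` and `02` are made-up two-symbol examples (not full ULIDs), and `decodeLenient` is a hypothetical helper. It shows string order and decoded order disagreeing.

```go
package main

import (
	"fmt"
	"strings"
)

// alphabet is Crockford's base32 encoding alphabet (no I, L, O, U).
const alphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// decodeLenient decodes a short symbol string to an integer, applying
// the alias mapping (I and L become 1, O becomes 0).
func decodeLenient(s string) (uint64, error) {
	var v uint64
	for _, r := range strings.ToUpper(s) {
		switch r {
		case 'I', 'L':
			r = '1'
		case 'O':
			r = '0'
		}
		idx := strings.IndexRune(alphabet, r)
		if idx < 0 {
			return 0, fmt.Errorf("invalid character %q", r)
		}
		v = v<<5 | uint64(idx)
	}
	return v, nil
}

func main() {
	a, b := "0L", "02"
	va, _ := decodeLenient(a)
	vb, _ := decodeLenient(b)
	// Byte-wise string comparison and decoded values can order differently.
	fmt.Println("string order a < b:", a < b)
	fmt.Println("decoded order a < b:", va < vb)
}
```

Because `L` compares greater than `2` as a byte but decodes to a smaller value, sorting the raw strings no longer matches sorting by decoded value once aliases are accepted.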

peterbourgon avatar May 03 '21 18:05 peterbourgon