ulid icon indicating copy to clipboard operation
ulid copied to clipboard

strict parsing by default

Open shogo82148 opened this issue 1 year ago • 6 comments

If it is after merging https://github.com/oklog/ulid/pull/116, there is no performance difference between Parse and ParseStrict.

goos: darwin
goarch: arm64
pkg: github.com/oklog/ulid/v2
               │   .old.txt   │              .new.txt               │
               │    sec/op    │   sec/op     vs base                │
Parse-10          9.158n ± 1%   6.051n ± 1%  -33.92% (p=0.000 n=10)
ParseStrict-10   14.515n ± 1%   6.106n ± 1%  -57.94% (p=0.000 n=10)
MustParse-10     10.095n ± 0%   7.076n ± 0%  -29.91% (p=0.000 n=10)
geomean           11.03n        6.394n       -42.03%

The reason for having Parse and ParseStrict separately disappears. Perhaps it would be acceptable to perform strict checks in Parse as well?

shogo82148 avatar Apr 07 '24 13:04 shogo82148

The performance might be the same, but what they do under the hood is different:

Consider if you have the following 2 ulid's:

  • 01J473TRMPVAR70FV1MJV8V5RM
  • 01J473TRMPVAR7OFV1MJV8V5RM --> same ulid, just used the invalid base32 alpha O instead of number 0
ulid.Parse(01J473TRMPVAR70FV1MJV8V5RM) is valid.
ulid.ParseStrict(01J473TRMPVAR70FV1MJV8V5RM) is valid.

ulid.Parse(01J473TRMPVAR7OFV1MJV8V5RM) is valid.
ulid.ParseStrict(01J473TRMPVAR7OFV1MJV8V5RM) is invalid.

Test code: https://go.dev/play/p/4FtDFNoUAuk

TheM1984 avatar Aug 01 '24 14:08 TheM1984

That's right.

However, passing an invalid ULID to ulid.Parse is incorrect usage. Users should pass only verified and trusted ULIDs to ulid.Parse.

I believe this change is useful to prevent such mistakes.

shogo82148 avatar Aug 04 '24 07:08 shogo82148

One issue would be that doing that would require to make a v3, in order to stay Backward Compatibility as normal in the Golang community.

The docs statement of the performance is just a warning that if the strictness is not needed, you should just use the normal Parse. For me the naming gives away that there is a deliberate difference in how they work.

I even doubted to make a bug report that ParseStrict is not working properly. Because looking the base32 specs states:

When decoding, ... i and l will be treated as 1 and o will be treated as 0. 

I never made a bug report, since I think this was done on purpose to fill a need for a stricter approach.

That said maybe one of the maintainers could give some more insight in this?

TheM1984 avatar Aug 05 '24 10:08 TheM1984

However, passing an invalid ULID to ulid.Parse is incorrect usage. Users should pass only verified and trusted ULIDs to ulid.Parse.

Calling ulid.Parse is how users verify if a ULID string is or isn't valid, so an invalid string is perfectly correct usage, you just get an error.

peterbourgon avatar Aug 05 '24 12:08 peterbourgon

@peterbourgon understandable, but @shogo82148 is suggesting that ulid.Parse should behave the same as ulid.ParseStrict since the performance penalty is no longer there.

TheM1984 avatar Aug 05 '24 13:08 TheM1984

It raises broader questions, specifically I think the existing Parse is probably too strict, because it doesn't parse i I l L as 1, or o O as 0, etc.

peterbourgon avatar Aug 05 '24 13:08 peterbourgon