Unicode normalization

Open · pwinckles opened this issue 4 years ago · 1 comment

I was thinking about Unicode normalization again. I know that the last time this was discussed (perhaps on Slack), normalization was considered outside the scope of the spec. However, I had a couple of additional thoughts after seeing that the BagIt spec spends time describing the normalization problem, then recommends that implementations tolerate differences in normalization and warn when files differ only by normal form.

  1. Perhaps it would make sense for OCFL validators to produce warnings when files or object IDs differ only in how they are normalized?
  2. Should the spec make any similar recommendations, perhaps in the implementation notes, about tolerating differences in normalization forms? Or is this not desirable behavior?
  3. The spec states "Each version block in each prior inventory file MUST represent the same object state as the corresponding version block in the current inventory file." In the case of logical paths, is it up to the implementation to decide whether this is a byte-for-byte comparison or a normalized comparison? (Edit: noting that digest algorithm changes are supported between versions.)
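To make items 1 and 3 concrete, here is a minimal Python sketch using only the standard library's `unicodedata` module. The function names (`normalization_collisions`, `same_logical_paths`) and the choice of NFC as the comparison form are assumptions for illustration; neither the OCFL spec nor this thread prescribes them. The sketch compares logical paths only and ignores digests, since digest algorithms may change between versions.

```python
import unicodedata
from collections import defaultdict


def nfc(path: str) -> str:
    """Return the NFC (canonical composition) form of a path or object ID."""
    return unicodedata.normalize("NFC", path)


def normalization_collisions(paths):
    """Find names that differ as code-point sequences but match after NFC.

    Hypothetical validator helper: each returned group (sorted) is a set of
    logical paths or object IDs that a validator might warn about (item 1).
    """
    groups = defaultdict(set)
    for p in paths:
        groups[nfc(p)].add(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def same_logical_paths(paths_a, paths_b, normalized=False):
    """Compare the logical paths of two version blocks (item 3).

    normalized=False: exact code-point (byte-for-byte) comparison.
    normalized=True:  compare after NFC; note this silently merges any
                      collisions, so run normalization_collisions first.
    """
    if normalized:
        return {nfc(p) for p in paths_a} == {nfc(p) for p in paths_b}
    return set(paths_a) == set(paths_b)


if __name__ == "__main__":
    composed = "caf\u00e9.txt"     # "é" as a single code point (NFC)
    decomposed = "cafe\u0301.txt"  # "e" + combining acute accent (NFD)
    print(normalization_collisions([composed, decomposed, "other.txt"]))
    print(same_logical_paths([composed], [decomposed]))
    print(same_logical_paths([composed], [decomposed], normalized=True))
```

The two spellings of `café.txt` render identically but are different code-point sequences, so a strict comparison treats them as different paths while the normalized comparison treats them as the same one. That difference is exactly the ambiguity raised in item 3.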

pwinckles · Aug 20 '21 15:08

We think this issue is best discussed in the implementation notes, and we are deferring it to 2.0 because of the complications involved.

rosy1280 · Feb 03 '22 16:02