`collate` gives different results than applying `compare` on `sortKey`
ghci> import qualified Data.Text.ICU as ICU
ghci> let testCompare c a b = (ICU.collate c a b, compare (ICU.sortKey c a) (ICU.sortKey c b))
According to the docs, testCompare c a b should always return a pair of two equal values (i.e. (EQ, EQ), (LT, LT) or (GT, GT)). But this isn't the case. For example:
ghci> let c = ICU.collator ICU.Root
ghci> testCompare c "" "\EOT"
(EQ,LT)
ghci> testCompare c "" "\ETX"
(EQ,LT)
ghci> testCompare c "" "\NUL"
(EQ,LT)
ghci> testCompare c "" "\2205"
(EQ,LT)
ghci> testCompare c "" "\2250"
(EQ,LT)
ghci> testCompare c "" "\2250\ETX\2205"
(EQ,LT)
As far as I can tell, there are a handful of characters (including all of those above) such that Data.ByteString.unpack $ ICU.sortKey c "(char)" gives [1, 1, 0]. The problem manifests when we compare a string made of any number of these characters (such a string also has sort key [1, 1, 0]) to the empty string (sort key []). I haven't seen it in any other situation.
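For anyone wanting to check the sort keys on their own setup, here is a minimal sketch that prints the key bytes for the strings from the session above. It only uses collator, sortKey and Root from Data.Text.ICU as in the ghci transcript; the bytes printed may differ with another ICU version, so no expected output is shown.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.ICU as ICU

main :: IO ()
main = do
  let c = ICU.collator ICU.Root
      -- The empty string plus the strings tried in the report.
      samples = ["", "\EOT", "\ETX", "\NUL", "\2205", "\2250", "\2250\ETX\2205"] :: [T.Text]
  -- Print each string next to the raw bytes of its sort key.
  mapM_ (\t -> putStrLn (show t ++ " -> " ++ show (B.unpack (ICU.sortKey c t)))) samples
```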
(\2250 is U+08CA "arabic small high farsi yeh" and \2205 is U+089D "arabic superscript alef mokhassas". I found these essentially at random. A few others in the vicinity have the same property, like \2251, but \2206 does not. I haven't looked to see if there's any pattern here.)
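To look for a pattern, one could scan a range of code points and list every character for which collate and the sort-key comparison against the empty string disagree. This is a rough sketch; the helper name mismatches and the range U+0000 to U+0FFF are arbitrary choices for illustration, not anything from the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import qualified Data.Text.ICU as ICU
import Numeric (showHex)

-- Hypothetical helper: characters whose one-character string compares
-- differently against "" under collate than under the sort keys.
mismatches :: ICU.Collator -> [Char]
mismatches c =
  [ ch
  | ch <- ['\0' .. '\x0FFF']  -- arbitrary range, widen as needed
  , let t = T.singleton ch
  , ICU.collate c "" t /= compare (ICU.sortKey c "") (ICU.sortKey c t)
  ]

main :: IO ()
main = do
  let c = ICU.collator ICU.Root
  -- Print each offending code point in hex.
  mapM_ (\ch -> putStrLn ("U+" ++ showHex (fromEnum ch) "")) (mismatches c)
```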
I tried a few other collators. collatorWith _ [Strength Secondary] makes the sort key of the non-empty strings [1, 0] instead of [1, 1, 0], but testCompare gives the same results. Changing the base to Locale "en" or adding Numeric True doesn't obviously make a difference.
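The collator variations above can be run side by side with something like the following. It is a sketch that assumes Attribute(..) and Strength(..) can be imported from Data.Text.ICU.Collate; the labels in the list are only for display.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import qualified Data.Text.ICU as ICU
-- Assumption: the attribute constructors are exported from this module.
import Data.Text.ICU.Collate (Attribute(..), Strength(..))

-- Same helper as in the ghci session, with an explicit type.
testCompare :: ICU.Collator -> T.Text -> T.Text -> (Ordering, Ordering)
testCompare c a b = (ICU.collate c a b, compare (ICU.sortKey c a) (ICU.sortKey c b))

-- The collators mentioned in the report, paired with a display label.
collators :: [(String, ICU.Collator)]
collators =
  [ ("root", ICU.collator ICU.Root)
  , ("root, secondary", ICU.collatorWith ICU.Root [Strength Secondary])
  , ("en", ICU.collator (ICU.Locale "en"))
  , ("root, numeric", ICU.collatorWith ICU.Root [Numeric True])
  ]

main :: IO ()
main =
  -- Compare "" against one of the affected characters under each collator.
  mapM_ (\(name, c) -> putStrLn (name ++ ": " ++ show (testCompare c "" "\2250"))) collators
```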
This is with text-icu-0.8.0.2. I can't rule out that this is a bug in ICU itself. I'm not familiar enough with C to test that easily, though I expect I could figure it out. I'm using a version provided by nix. Based on the output of lsof, it seems to be version 72.1: my running GHC has these files open:
/nix/store/x6cq3940a5krcwj0p28y3b6lckxmcfqw-icu4c-72.1/lib/libicudata.so.72.1
/nix/store/x6cq3940a5krcwj0p28y3b6lckxmcfqw-icu4c-72.1/lib/libicui18n.so.72.1
/nix/store/x6cq3940a5krcwj0p28y3b6lckxmcfqw-icu4c-72.1/lib/libicuuc.so.72.1