joni
bugfix: char class casefold for certain chars
When a character's code point is less than or equal to the single-byte limit (0xff) but it takes more than one byte in the current encoding, the case folding code incorrectly put it in the bitset instead of the code range. As a result, with UTF-8 encoding, case folding works incorrectly for characters in the range \u0080 to \u00ff (Latin-1 Supplement).
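The placement rule can be sketched roughly as follows. This is a minimal illustration, not joni's code: `CasefoldPlacement`, `bitsetBeforeFix`, `bitsetAfterFix`, `UTF8_LENGTH` and `LATIN1_LENGTH` are hypothetical names, and the lambdas stand in for whatever "encoded length of a code point" query the encoding really provides. The point is that a code point such as U+00E2 passes a plain `<= 0xff` check, yet needs two bytes in UTF-8, so it belongs in the code range there.

```java
import java.util.function.IntUnaryOperator;

public class CasefoldPlacement {
    // Hypothetical stand-in: bytes needed to encode a code point in UTF-8.
    static final IntUnaryOperator UTF8_LENGTH =
        cp -> cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;

    // Hypothetical stand-in: ISO-8859-1 uses one byte for every code point <= 0xff.
    static final IntUnaryOperator LATIN1_LENGTH = cp -> 1;

    // Check as described before the fix: only the code point value is considered.
    static boolean bitsetBeforeFix(int cp) {
        return cp <= 0xff;
    }

    // Check after the fix: the code point must also fit in a single byte
    // in the current encoding; otherwise it goes into the code range.
    static boolean bitsetAfterFix(int cp, IntUnaryOperator encodedLength) {
        return cp <= 0xff && encodedLength.applyAsInt(cp) == 1;
    }

    public static void main(String[] args) {
        int aCircumflex = 0x00e2; // 'â', Latin-1 Supplement
        System.out.println(bitsetBeforeFix(aCircumflex));              // true (wrong for UTF-8)
        System.out.println(bitsetAfterFix(aCircumflex, UTF8_LENGTH));   // false -> code range
        System.out.println(bitsetAfterFix(aCircumflex, LATIN1_LENGTH)); // true  -> bitset
        System.out.println(bitsetAfterFix('a', UTF8_LENGTH));           // true  -> bitset
    }
}
```

In a single-byte encoding such as ISO-8859-1 the two checks agree; they only diverge for multi-byte encodings like UTF-8, which is where the bug showed up.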
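Before the fix (case-insensitive matching, UTF-8):
- `"\u00c2"` matched against `[\u00e0-\u00e5]` returns false
- `"\u00c2"` matched against `[\u00e2]` returns false
- `"\u00c2"` matched against `\u00e2` returns true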
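For comparison, the JDK's own `java.util.regex` (not joni) can serve as a reference for the expected results: with Unicode-aware case-insensitive matching, all three cases above should return true. This sketch assumes that the JDK's case folding for U+00C2/U+00E2 is an appropriate baseline; `ExpectedCasefold` and `ciMatches` are illustrative names only. Note that `CASE_INSENSITIVE` alone folds only US-ASCII in the JDK, so `UNICODE_CASE` is required here.

```java
import java.util.regex.Pattern;

public class ExpectedCasefold {
    // JDK regex, not joni: UNICODE_CASE extends case folding beyond US-ASCII.
    static boolean ciMatches(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                      .matcher(input)
                      .matches();
    }

    public static void main(String[] args) {
        String subject = "\u00c2"; // capital A with circumflex
        System.out.println(ciMatches("[\u00e0-\u00e5]", subject)); // true (range class)
        System.out.println(ciMatches("[\u00e2]", subject));        // true (single-char class)
        System.out.println(ciMatches("\u00e2", subject));          // true (plain literal)
    }
}
```

The class cases are the ones that returned false before the fix; the plain literal already behaved correctly because it does not go through the character class bitset.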