bugfix: char class casefold for certain chars

Open haozhun opened this issue 10 years ago • 0 comments

When a character is less than or equal to single byte size (0xff), yet it takes more than 1 byte in the current encoding, the case folding code incorrectly put it in bitset instead of code range. As a result, for utf8 encoding, casefold works incorrectly on characters in range \u0080 to \u00ff (latin1 supplement).

Before fix:

"\u00c2" [\u00e0-\u00e5] returns false
"\u00c2" [\u00e2] returns false
"\u00c2" \u00e2 returns true

Mar 20 '15 19:03 haozhun