librime: Treat regular expressions as UTF-32

As shown below, changing - xform/a|ɔ|œ|ɛ|i|u|y/$&ː/ to - xform/[aɔœɛiuy]/$&ː/ or to - xform/([aɔœɛiuy])/$1ː/ makes the program stop working correctly. (Note: aɔœɛiuy are all in Unicode Plane 0.) https://github.com/rime/rime-cantonese/blob/3ee577383e42f74a16e6abc48b6fecfc245c4855/jyut6ping3_ipa.schema.yaml#L55

Apr 06 '21 15:04 graphemecluster

The file is UTF-8, so whatever is inside [] can only be interpreted as UTF-8.
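The difference between byte-wise and code-point-wise matching can be sketched as follows. This is only an illustration using Python's `re` module as a stand-in; it is not librime's regex engine, and the function names `lengthen_by_code_point` and `lengthen_by_byte` are made up for the example. It assumes the engine treats a character class over UTF-8 text as a set of single bytes.

```python
import re

VOWELS = "aɔœɛiuy"
LENGTH_MARK = "\u02d0"  # ː, U+02D0


def lengthen_by_code_point(text: str) -> str:
    """str pattern: every member of the class is one whole code point."""
    return re.sub(f"[{VOWELS}]", lambda m: m.group(0) + LENGTH_MARK, text)


def lengthen_by_byte(text: str) -> bytes:
    """bytes pattern: the class is a set of single bytes, so a multi-byte
    letter such as ɔ (C9 94) is matched one byte at a time."""
    pattern = b"[" + VOWELS.encode("utf-8") + b"]"
    mark = LENGTH_MARK.encode("utf-8")
    return re.sub(pattern, lambda m: m.group(0) + mark, text.encode("utf-8"))


if __name__ == "__main__":
    print(lengthen_by_code_point("sɔ"))
    print(lengthen_by_byte("sɔ"))
```

In the byte-wise version the length mark is inserted after each byte of ɔ separately, so the result is no longer valid UTF-8, while ASCII letters such as a are unaffected. The alternation form a|ɔ|œ|ɛ|i|u|y avoids this because each alternative is a complete byte sequence.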

Apr 06 '21 15:04 LEOYoon-Tsaw

Ah, I see, thanks. Still, interpreting the regex as UTF-32 would be more convenient. (I'm used to JavaScript, so I had assumed it was UTF-16...)

Apr 06 '21 15:04 graphemecluster

So I propose that regular expressions be treated as if in a UTF-32 environment, just like xlit. That would be much more convenient to work with.

Apr 20 '21 19:04 graphemecluster

You can open a pull request if this is important to you. I don't think it matters.

Apr 20 '21 20:04 LEOYoon-Tsaw

Where could I change them?

Apr 20 '21 20:04 graphemecluster

I don't know, but it will certainly require modifying the code.

Apr 20 '21 23:04 LEOYoon-Tsaw

You can simply write them as C-style escapes of their UTF-8 encoding: xform/([a\xC9\x94\xC5\x93\xC9\x9Biuy])/$1ː/ Also, the IPA length mark is ː (\xCB\x90), not ː
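For reference, escapes of this kind can be derived mechanically rather than by hand. The helper below is a hypothetical sketch in Python (the name `utf8_escapes` is invented here); it leaves ASCII characters alone and writes every other character as the `\xNN` bytes of its UTF-8 encoding. Whether a given regex engine accepts such escapes inside a character class is a separate question that this sketch does not answer.

```python
def utf8_escapes(text: str) -> str:
    """Return text with each non-ASCII character replaced by the
    C-style \\xNN escapes of its UTF-8 bytes."""
    parts = []
    for ch in text:
        if ch.isascii():
            parts.append(ch)
        else:
            parts.append("".join(f"\\x{b:02X}" for b in ch.encode("utf-8")))
    return "".join(parts)


if __name__ == "__main__":
    print(utf8_escapes("aɔœɛiuy"))
    print(utf8_escapes("\u02d0"))
```

Here ɔ (U+0254) becomes \xC9\x94, œ (U+0153) becomes \xC5\x93, ɛ (U+025B) becomes \xC9\x9B, and ː (U+02D0) becomes \xCB\x90.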

Oct 17 '23 06:10 groverlynn

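That is an ergonomics issue: would anyone like to see a pile of escape sequences in their own schema? Let's not argue about it; I'll submit a PR myself when I have time.

> Also, the IPA length mark is ː (\xCB\x90), not ː

I don't see any problem; both are U+02D0 MODIFIER LETTER TRIANGULAR COLON.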
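A check like this can be done without relying on how a font renders the glyph. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for the example.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    print(describe("\u02d0"))  # the IPA length mark
    print(describe(":"))       # the ASCII colon, for comparison
```

Two characters are the same if and only if `describe` reports the same code point for both, regardless of how similar or different they look on screen.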

Oct 17 '23 14:10 graphemecluster

(This issue is really old. Looking back, I knew nothing about the librime code at the time...)

Oct 17 '23 14:10 graphemecluster

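> That is an ergonomics issue: would anyone like to see a pile of escape sequences in their own schema? Let's not argue about it; I'll submit a PR myself when I have time.

> > Also, the IPA length mark is ː (\xCB\x90), not ː

> I don't see any problem; both are U+02D0 MODIFIER LETTER TRIANGULAR COLON.

Ah, the two looked different in my font, so I misread them.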

Oct 20 '23 17:10 groverlynn