Treat regular expressions as UTF-32
As shown below, changing - xform/a|ɔ|œ|ɛ|i|u|y/$&ː/ to either - xform/[aɔœɛiuy]/$&ː/ or - xform/([aɔœɛiuy])/$1ː/ breaks the program. (Note: aɔœɛiuy are all in Unicode Plane 0.)
https://github.com/rime/rime-cantonese/blob/3ee577383e42f74a16e6abc48b6fecfc245c4855/jyut6ping3_ipa.schema.yaml#L55
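The file is UTF-8, so whatever is inside [] can only be interpreted as UTF-8.
Ah, I see now, thanks, although interpreting the regex as UTF-32 would be more convenient. (I'm used to JavaScript, which is why I assumed it was UTF-16...)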
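A minimal sketch of why the bracket form breaks, assuming a regex engine that matches byte by byte. Python's `re` on `bytes` is used here only as a stand-in; it is not librime's engine, and the input syllable is a made-up example.

```python
import re

# Hypothetical input syllable, encoded the way a UTF-8 file stores it.
data = "pɔ".encode("utf-8")   # b'p\xc9\x94'
mark = "ː".encode("utf-8")    # b'\xcb\x90'

# Alternation: every alternative is a complete (possibly multi-byte) sequence.
alt = re.compile("a|ɔ|œ|ɛ|i|u|y".encode("utf-8"))
# Character class: a byte-oriented engine sees a set of single bytes,
# so the two bytes of ɔ become two separate class members.
cls = re.compile("[aɔœɛiuy]".encode("utf-8"))


def append_mark(pattern, raw):
    # Rough equivalent of the `$&ː` replacement: whole match plus the length mark.
    return pattern.sub(lambda m: m.group(0) + mark, raw)


alt_out = append_mark(alt, data)
cls_out = append_mark(cls, data)

print(alt_out.decode("utf-8"))  # still valid UTF-8
print(cls_out)                  # the mark is inserted between the two bytes of ɔ
```

In this sketch the class version matches each byte of ɔ separately, so the length mark lands in the middle of the character and the result is no longer valid UTF-8.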
So I propose that regular expressions be treated as UTF-32, just like xlit. That would be much more convenient to work with.
Feel free to open a pull request if this is important to you. I don't think it matters much.
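For comparison, a rough sketch of the proposed behaviour. Python `str` is a sequence of code points, so it serves here as a stand-in for a UTF-32 view of the text; the helper name `lengthen` and the sample syllables are hypothetical, and this is not librime code.

```python
import re

# On str, each member of the class is one code point, not one byte.
VOWELS = re.compile("[aɔœɛiuy]")


def lengthen(syllable: str) -> str:
    # Re-emit the matched vowel followed by the IPA length mark (U+02D0).
    return VOWELS.sub(lambda m: m.group(0) + "\u02d0", syllable)


for s in ("pɔ", "si", "jɛ"):
    print(s, "->", lengthen(s))
```

With code-point matching, the bracket form and the alternation form behave the same way.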
Where could I change them?
I don't know exactly, but it will certainly require modifying the code.
Writing the characters as C-style escapes of their UTF-8 encoding would do:
xform/([a\xC9\x94\xC5\x93\xC9\x9Biuy])/$1ː/
Also, the IPA length mark is ː (\xCB\x90), not ː
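The byte escapes can be derived mechanically rather than by hand. A small sketch, where `utf8_escapes` is a hypothetical helper written for this example:

```python
def utf8_escapes(text: str) -> str:
    """Replace each non-ASCII character with \\xNN escapes of its UTF-8 bytes."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII stays readable
        else:
            out.append("".join(f"\\x{b:02X}" for b in ch.encode("utf-8")))
    return "".join(out)


vowel_class = utf8_escapes("aɔœɛiuy")
length_mark = utf8_escapes("ː")

print(vowel_class)
print(length_mark)
```

Here ɔ (U+0254) encodes to C9 94, œ (U+0153) to C5 93, ɛ (U+025B) to C9 9B, and ː (U+02D0) to CB 90.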
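That's an ergonomics issue: would anyone enjoy seeing a pile of escape sequences in their own schema? Let's not argue about it; I'll send a PR myself when I have time.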
Regarding "Also, the IPA length mark is ː (\xCB\x90), not ː": I don't see any problem there; both are U+02D0 MODIFIER LETTER TRIANGULAR COLON.
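When two glyphs look different only because of the font, checking the code point settles it. A short sketch using Python's standard `unicodedata` module; `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(ch: str) -> str:
    # Code point in U+XXXX form plus the official Unicode character name.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


length_mark = "\u02d0"  # the IPA length mark
ascii_colon = ":"       # the look-alike it is easily confused with

print(describe(length_mark))
print(describe(ascii_colon))
print(length_mark.encode("utf-8"))  # the bytes behind \xCB\x90
```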
(This issue is so old. Looking back, I knew nothing about librime's code at the time...)
Ah, the two looked different in my font, which is why I misread them.