librime

Treat Regular Expression as in UTF-32

Open graphemecluster opened this issue 4 years ago • 10 comments

As shown below, changing - xform/a|ɔ|œ|ɛ|i|u|y/$&ː/ to - xform/[aɔœɛiuy]/$&ː/ or - xform/([aɔœɛiuy])/$1ː/ makes the program stop working correctly. (Note: the characters aɔœɛiuy are all in Unicode Plane 0.) https://github.com/rime/rime-cantonese/blob/3ee577383e42f74a16e6abc48b6fecfc245c4855/jyut6ping3_ipa.schema.yaml#L55

graphemecluster avatar Apr 06 '21 15:04 graphemecluster

The file is UTF-8, so whatever is inside [] can only be interpreted as UTF-8.
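
To illustrate why a character class can fail while alternation works, here is a minimal sketch in Python. It uses Python's `re` on `bytes` as a stand-in for a byte-oriented regex engine; that librime behaves the same way is an assumption based on the explanation above, and `apply_bytewise` is a hypothetical helper, not a librime function.

```python
import re

LENGTH_MARK = "ː".encode("utf-8")  # U+02D0, two bytes in UTF-8


def apply_bytewise(pattern: str, text: str) -> bytes:
    """Match on raw UTF-8 bytes and append the length mark after each match.

    Hypothetical helper: it imitates an engine that sees bytes, not code points.
    """
    compiled = re.compile(pattern.encode("utf-8"))
    return compiled.sub(lambda m: m.group(0) + LENGTH_MARK, text.encode("utf-8"))


# Alternation: each alternative is a complete byte sequence,
# so only whole characters match.
ok = apply_bytewise("a|ɔ|œ|ɛ|i|u|y", "sɔ")

# Character class: the class is a set of single bytes, so the two bytes
# of ɔ (C9 94) match separately and the mark is inserted between them.
broken = apply_bytewise("[aɔœɛiuy]", "sɔ")

print(ok)
print(broken)
```

In this sketch `ok` is still valid UTF-8, while `broken` has the length mark spliced into the middle of ɔ and can no longer be decoded.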

LEOYoon-Tsaw avatar Apr 06 '21 15:04 LEOYoon-Tsaw

Ah, I see, thanks, although interpreting the regex as UTF-32 would be more convenient. (I'm used to JavaScript, which is why I assumed it was UTF-16...)

graphemecluster avatar Apr 06 '21 15:04 graphemecluster

So I propose that regular expressions be treated as if in a UTF-32 environment, just like xlit. That would be much more convenient to work with.
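
As a rough sketch of what matching per code point would look like, Python `str` patterns already behave this way, so the character class from the original report works unchanged. `lengthen` is a hypothetical name used only for illustration; it says nothing about how librime or xlit are implemented.

```python
import re


def lengthen(text: str) -> str:
    """Append the length mark U+02D0 after each listed vowel.

    Python str patterns are matched per code point, so every member of
    the character class is one whole character, whatever its UTF-8 length.
    """
    return re.sub(r"([aɔœɛiuy])", r"\1ː", text)


print(lengthen("sɔ"))
print(lengthen("jyt"))
```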

graphemecluster avatar Apr 20 '21 19:04 graphemecluster

You can open a pull request if this is important to you. I don't think it matters.

LEOYoon-Tsaw avatar Apr 20 '21 20:04 LEOYoon-Tsaw

Where in the code would I change this?

graphemecluster avatar Apr 20 '21 20:04 graphemecluster

I don't know, but it will certainly require modifying the code.

LEOYoon-Tsaw avatar Apr 20 '21 23:04 LEOYoon-Tsaw

Just write the characters as C-style escapes of their UTF-8 encoding: xform/([a\xC9\x94\xC5\x93\xc9\x98iuy])/$1ː/ Also, the IPA length mark is ː (\xCB\x90), not ː
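
For anyone who wants to generate such escapes rather than type them by hand, here is a small Python sketch; `utf8_escape` and `escape_class` are hypothetical helper names. Under this computation ɛ (U+025B) comes out as \xC9\x9B, whereas the \xc9\x98 in the pattern above decodes to U+0258 (ɘ), so that byte may be a typo.

```python
def utf8_escape(ch: str) -> str:
    """Return the \\xNN escape sequence for one character's UTF-8 bytes."""
    return "".join(f"\\x{b:02X}" for b in ch.encode("utf-8"))


def escape_class(chars: str) -> str:
    """Keep ASCII characters literal and escape everything else byte by byte."""
    return "".join(c if c.isascii() else utf8_escape(c) for c in chars)


print(escape_class("aɔœɛiuy"))
print(utf8_escape("ː"))
```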

groverlynn avatar Oct 17 '23 06:10 groverlynn

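That's a matter of ergonomics: does anyone like seeing a pile of escape sequences in their own schema? Let's not argue about it; I'll open a PR myself when I have time.

"Also, the IPA length mark is ː (\xCB\x90), not ː"

I don't see any problem here: both are U+02D0 MODIFIER LETTER TRIANGULAR COLON.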

graphemecluster avatar Oct 17 '23 14:10 graphemecluster

(This issue is so old. Thinking back, I knew nothing about librime's code at the time...)

graphemecluster avatar Oct 17 '23 14:10 graphemecluster

Ah, the font rendered the two so differently that I misread them.

groverlynn avatar Oct 20 '23 17:10 groverlynn