Question Regarding Combined Characters and Regex

Open AlanBurkhart opened this issue 3 years ago • 0 comments

I have my own Regex find-replace dialog that has always worked pretty well, except when a text document contains characters that are stored as more than one code unit. The clock face below, for instance, is a single code point (U+1F55C) but presumably takes two UTF-16 code units. Each such character throws the index of the match off by one character position. In this case I wasn't searching for the offending character itself but for specific text that came after it. For example:

🕜 &#.128348; &#.x1F55C; Clock Face One-thirty

Searching for the ampersand matches the # sign instead. If I paste another clock face character into the line, the search matches the "1". Is there a practical method for dealing with this? (Dots are inserted in the entities above so that they display as text instead of being rendered as characters.)
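
To make the symptom concrete, here is a minimal sketch of the kind of index conversion that might be needed. It assumes (this is a guess, since the post does not name the regex engine or the text control) that one side counts UTF-16 code units while the other counts code points. The helper names `utf16_units`, `utf16_to_codepoint_index`, and `codepoint_to_utf16_index` are hypothetical, and true combining sequences (a base letter plus combining marks) are not handled here.

```python
import re


def utf16_units(ch: str) -> int:
    """Return how many UTF-16 code units one code point occupies (1 or 2)."""
    return 2 if ord(ch) > 0xFFFF else 1


def utf16_to_codepoint_index(text: str, utf16_index: int) -> int:
    """Convert an offset counted in UTF-16 code units to a code point offset.

    An offset that falls in the middle of a surrogate pair is rounded up
    to the next code point.
    """
    units = 0
    for cp_index, ch in enumerate(text):
        if units >= utf16_index:
            return cp_index
        units += utf16_units(ch)
    return len(text)


def codepoint_to_utf16_index(text: str, cp_index: int) -> int:
    """Convert a code point offset to an offset counted in UTF-16 code units."""
    return sum(utf16_units(ch) for ch in text[:cp_index])


# The example line from the question, with the clock face written as an escape.
line = "\U0001F55C &#128348; Clock Face One-thirty"

# Python's re module reports offsets in code points.
match = re.search("&", line)
cp_index = match.start()

# The offset a UTF-16 based engine would report for the same match.
utf16_index = codepoint_to_utf16_index(line, cp_index)

# Using the UTF-16 offset as if it were a code point offset lands one
# character too far (on the "#"); converting it back fixes the position.
wrong_char = line[utf16_index]
right_char = line[utf16_to_codepoint_index(line, utf16_index)]
```

With one clock face in front of the match the two offsets differ by one, and with two clock faces they differ by two, which is consistent with the "#" and "1" mismatches described above. If the mismatch runs in the other direction, the inverse helper would be the one to apply.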

Jun 12 '22 15:06 AlanBurkhart