NVDA braille: variation selectors break cursor positions

Steps to reproduce:

Consider the following line:

⚠️ test

Move the cursor on the word (test) using routing cursors.

Actual behavior:

The cursor is moved to the desired position + 1. Also, ⚠️ takes 2 braille cells.

Expected behavior:

The cursor should be moved to the desired position. Also, the ⚠️ symbol should take only one cell. With WordPad and in form mode in Firefox, the cursor is moved correctly.

System configuration

NVDA installed/portable/running from source:

Installed and portable

NVDA version:

2019.3, 2020.1

Windows version:

10 Insider (64-bit) build 19592.1001

Name and version of other software in use when reproducing the issue:

Notepad, Microsoft Word, Firefox (in browse mode), Run dialog (Windows+R), etc.

Other information about your system:

Other questions

Does the issue still occur after restarting your computer?

Yes

Have you tried any other versions of NVDA? If so, please report their behaviors.

No

If add-ons are disabled, is your problem still occurring?

Yes

Did you try to run the COM registry fixing tool in NVDA menu / tools?

No

CC @leonardder

Apr 08 '20 09:04 AAClause

Could you please test this behavior in NVDA 2019.2?

Apr 08 '20 11:04 LeonarddeR

More details (according to my modest research): ⚠️ is composed of 2 characters: U+26A0 (⚠) and U+FE0F. The issue comes from U+FE0F (variation selector-16). This character belongs to the Variation Selectors Unicode block. It seems that all characters in this block are invisible code points that specify that the preceding character should be displayed with a different presentation (U+FE0F, for instance, requests emoji-style presentation). It also seems that we shouldn't be able to reach these characters with the keyboard (arrow keys, etc.). In the end, all characters in this range (U+FE00 to U+FE0F) break cursor positions. Another example: ⚠︀ = U+26A0 + U+FE00. With NVDA 2019.2, the cursor can be moved onto variation selectors, so cursor positions are not broken.
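The decomposition described above can be checked with Python's standard `unicodedata` module. This is an illustrative sketch only: `is_variation_selector` and `describe` are hypothetical helpers, not NVDA code, and they cover only the U+FE00 to U+FE0F block discussed here (the supplementary selectors U+E0100 to U+E01EF are ignored).

```python
import unicodedata


def is_variation_selector(ch: str) -> bool:
    """Return True for code points in the Variation Selectors block (U+FE00..U+FE0F)."""
    return 0xFE00 <= ord(ch) <= 0xFE0F


def describe(text: str) -> list[tuple[str, str, bool]]:
    """List each code point of text as (U+XXXX, Unicode name, is variation selector)."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), is_variation_selector(ch))
        for ch in text
    ]


warning_emoji = "\u26a0\ufe0f"  # WARNING SIGN + VARIATION SELECTOR-16
warning_other = "\u26a0\ufe00"  # WARNING SIGN + VARIATION SELECTOR-1

for row in describe(warning_emoji) + describe(warning_other):
    print(row)
```

Both strings have a length of 2 in Python, which is why each of them occupies two braille cells when every code point is translated separately.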

Apr 08 '20 17:04 AAClause

This is a difficult one. A quick solution would be: strip variation selectors from braille output. Not something I really like, as basically we're removing output that might be considered relevant for some people.
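A minimal sketch of the "strip variation selectors" idea, assuming the braille code builds its cells from a text string and must map each routing key back to an offset in the original text. `strip_variation_selectors` is a hypothetical name; NVDA's real braille pipeline (liblouis translation plus its own position mappings) is more involved than this.

```python
def strip_variation_selectors(text: str) -> tuple[str, list[int]]:
    """Remove U+FE00..U+FE0F and return (stripped text, offset map).

    offset_map[i] is the offset in the original text of the i-th character
    of the stripped text, so a routing key at stripped position i can be
    translated back to an original offset.
    """
    kept_chars: list[str] = []
    offset_map: list[int] = []
    for original_offset, ch in enumerate(text):
        if 0xFE00 <= ord(ch) <= 0xFE0F:
            continue  # invisible selector: it gets no braille cell
        kept_chars.append(ch)
        offset_map.append(original_offset)
    return "".join(kept_chars), offset_map


line = "\u26a0\ufe0f test"
stripped, offset_map = strip_variation_selectors(line)
# Routing to the "t" of "test" (stripped position 2) maps back to original offset 3.
print(stripped, offset_map[2])
```

Keeping the offset map is what avoids the off-by-one from the report; the drawback mentioned above remains, since the presentation information carried by the selector no longer reaches the braille display.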

Thoughts @michaelDCurran @dkager @Adriani90 @lukaszgo1?

Apr 09 '20 09:04 LeonarddeR

It seems that there are other, more complex signs, e.g. 1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣ 7️⃣ 8️⃣ 9️⃣ 0️⃣ *️⃣ #️⃣. Each of these takes 3 braille cells, since it includes U+FE0F (variation selector-16) and U+20E3 (combining enclosing keycap).
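Each keycap sign is a base character followed by two marks. A rough sketch of grouping such sequences into one unit per visible sign, under a simplifying assumption: any code point with general category Mn, Mc or Me is attached to the preceding character. Variation selectors are category Mn and U+20E3 is category Me, so both are covered. `simple_clusters` is a hypothetical helper, not NVDA code; proper segmentation would follow Unicode UAX #29, which also handles ZWJ sequences, regional indicators and more.

```python
import unicodedata


def simple_clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Simplified rule: a code point whose general category is Mn, Mc or Me
    joins the previous cluster; anything else starts a new cluster.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


keycaps = "1\ufe0f\u20e3 #\ufe0f\u20e3"  # 1️⃣ #️⃣
print(len(keycaps), simple_clusters(keycaps))
```

The string is 7 code points long but yields 3 clusters, which is the gap between "characters" as the user perceives them and code points as braille translation sees them.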

May 07 '20 15:05 AAClause

I've just made a first attempt in the Braille Extender add-on; there's probably a better solution. See https://github.com/Andre9642/BrailleExtender/pull/63. It doesn't work in WordPad or in Firefox (in form mode).

Jun 01 '20 22:06 AAClause

I'm pretty sure that in https://github.com/nvaccess/nvda/pull/16219, @mltony laid the groundwork to fix this. Braille cursor movement should start relying on code points rather than characters. As liblouis uses 32-bit characters internally, the offsets will always match.
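To illustrate one way "characters" and code points diverge: Windows text APIs commonly count offsets in UTF-16 code units, while Python strings (and a 32-bit liblouis build) count code points, so a character outside the Basic Multilingual Plane takes two units in one scheme and one in the other. The helpers below are a hypothetical sketch and do not reproduce what PR 16219's moveToCodepointOffset actually does.

```python
def codepoint_to_utf16_offset(text: str, codepoint_offset: int) -> int:
    """Convert an offset counted in code points to one counted in UTF-16 code units."""
    prefix = text[:codepoint_offset]
    # Every UTF-16 code unit is 2 bytes in the little-endian encoding (no BOM).
    return len(prefix.encode("utf-16-le")) // 2


def utf16_to_codepoint_offset(text: str, utf16_offset: int) -> int:
    """Convert a UTF-16 code-unit offset back to a code point offset.

    An offset that falls between the two halves of a surrogate pair
    rounds up to the next code point.
    """
    units = 0
    for index, ch in enumerate(text):
        if units >= utf16_offset:
            return index
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


sample = "a\U0001F600b"  # "a", GRINNING FACE (outside the BMP), "b"
print(len(sample), codepoint_to_utf16_offset(sample, 2))
```

For ⚠️ both code points are in the BMP, so the two counts agree there; the off-by-one in this report presumably comes from the application treating the sign and its selector as a single character unit, which is the other kind of mismatch that code point-based routing is meant to sidestep.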

May 02 '24 09:05 LeonarddeR

Unfortunately, my work had to be reverted due to an oversight in the code for UIA in Word. I'll try to fix this later this week.

May 14 '24 05:05 LeonarddeR

The complexity here lies in the bullet and number handling added in https://github.com/nvaccess/nvda/pull/8576.

In this PR, @michaelDCurran changed bullet and number handling by adding the bullet/number to the line-prefix field of the format field that precedes the text string. There are several problems with that approach:

It only works reliably when the line has text after the bullet. If not, the bullet is exposed as text anyway.
It doesn't affect retrieving only the text of the TextInfo, so the text property of a TextInfo instance still exposes the bullet.

When the text in fields and the text without fields don't match, it is impossible to tell moveToCodepointOffset where to move to when routing braille.

May 16 '24 17:05 LeonarddeR

@LeonarddeR any update on this?

Jul 17 '24 02:07 SaschaCowley

I think we need feedback from @michaelDCurran on this, particularly https://github.com/nvaccess/nvda/issues/10960#issuecomment-2115810057

Jul 17 '24 06:07 LeonarddeR

If we just expose the bullet / number as normal text, then cursor routing is completely broken as the bullet / number plus the first character are all at character offset 0. We could strip bullets / numbers from textInfo.text for MS Word UIA, which would ensure that textWithFields / text with out fields are equivalent.
The quickest way to do this is to generate textInfo.text from getTextWithFields and just strip the controlFields / formatFields. Does seem a bit costly though. But I'm not too sure how we could accurately strip them otherwise. We need to know that they are semantically bullets / numbers, as opposed to having been literally typed. Or is there now another way we can intercept cursor routing character offsets as they are converted to move calls? We could add a new unit across NVDA specifically for character routing offsets, generally mapping them to characters but handling them differently in MS Word UIA when crossing a bullet / number. Apologies if I have missed something more modern here; I have not looked at braille code for several years :)
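A rough sketch of "generate the text from getTextWithFields and strip the fields", using stand-ins rather than NVDA's real classes. It assumes, as described in this thread, that the field sequence mixes plain strings with field commands and that the bullet lives in a format field's "line-prefix" entry rather than in the strings. `FieldCommand` here is a hypothetical stand-in dataclass and `text_from_fields` a hypothetical helper.

```python
import dataclasses


@dataclasses.dataclass
class FieldCommand:
    """Stand-in for a field command such as "formatChange", "controlStart" or "controlEnd"."""
    command: str
    field: dict = dataclasses.field(default_factory=dict)


def text_from_fields(fields: list) -> str:
    """Keep only the string items, dropping every field command.

    Anything carried inside a field (such as a "line-prefix" bullet) is
    excluded, so the result matches the text that braille shows between fields.
    """
    return "".join(item for item in fields if isinstance(item, str))


fields = [
    FieldCommand("formatChange", {"line-prefix": "\u2022"}),
    "First item",
    FieldCommand("formatChange", {}),
    "\r",
]
print(repr(text_from_fields(fields)))
```

The catch, as noted in the thread, is cost: every access to the text would pay for building the full field sequence instead of fetching plain text.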

Jul 17 '24 07:07 michaelDCurran

Thanks @michaelDCurran

We could strip bullets / numbers from textInfo.text for MS Word UIA, which would ensure that textWithFields / text with out fields are equivalent. The quickest way to do this is to generate textInfo.text from getTextWithFields and just strip the controlFields / formatFields.

I have prototyped this.

Does seem a bit costly though.

I'm afraid that's an understatement. Just doing a select all on 8000 characters already takes around 2.5 seconds on my end.

But I'm not too sure how we could accurately strip them otherwise.

I'm afraid we can't, but as said above, even the current approach isn't really accurate. For example, it only sets the line-prefix for the current line, so the approach doesn't work for multiple lines containing bullets.

Or is there now another way we can intercept cursor routing character offsets as they are converted to move calls?

As the intention is for routing to use moveToCodepointOffset, I guess we should avoid overriding _get_text and instead provide a specific override for moveToCodepointOffset.

Jul 17 '24 13:07 LeonarddeR

I am a bit late to the discussion here, but here is my $0.02. If I understand this thread correctly, the problem is that when moving by character in a bulleted list in MS Word, one textInfos.UNIT_CHARACTER corresponds to three characters in the Python string: the bullet, a space and the actual first character of the line. Thus we can't really move to the beginning of a bulleted list item using the moveToCodepointOffset() function. Here is how I would solve this:

1. Modify moveToCodepointOffset so that, when it fails to find the desired offset, it somehow returns the previous and next offsets it can find in the string, for example as a field in the exception it throws. Or maybe we can add an extra parameter to the function specifying the behavior when the desired offset is not found.
2. When routing braille cursor we fall back to previous findable offset if current offset cannot be found. This way we won't have to tweak getTextWithFields behavior, as this might cause unpredictable side effects. The cursor then won't land on the actual first character of the list item but rather on the bullet character, but I hope that's OK.
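A toy model of the fallback proposed in the list above, under stated assumptions: the text is split into character units as the application reports them (here a Word bullet, its space and the first letter form a single unit, as described above), and an offset inside a unit is snapped to that unit (the previous findable offset) or to the following one. `move_to_codepoint_offset` and its `fallback` parameter are hypothetical and do not mirror the signature of NVDA's real moveToCodepointOffset.

```python
from typing import Optional


def move_to_codepoint_offset(units: list[str], offset: int, fallback: Optional[str] = None) -> int:
    """Return the index of the unit that starts at code point `offset`.

    `units` holds the text of each character unit in order. If `offset`
    falls inside a unit, raise LookupError unless `fallback` is "previous"
    (snap to that unit) or "next" (snap to the following unit; this may
    equal len(units), meaning the end position).
    """
    start = 0
    for index, unit_text in enumerate(units):
        end = start + len(unit_text)
        if offset == start:
            return index
        if start < offset < end:
            if fallback == "previous":
                return index
            if fallback == "next":
                return index + 1
            raise LookupError(f"offset {offset} is inside unit {index}")
        start = end
    raise LookupError(f"offset {offset} is out of range")


# The bullet, the space and "F" share one unit, so offsets 1 and 2 are not findable.
units = ["\u2022 F", "i", "r", "s", "t"]
print(move_to_codepoint_offset(units, 2, fallback="previous"))
```

Snapping to the previous unit puts the cursor on the bullet rather than on the first letter, which is the trade-off accepted in the proposal.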
Sorry if I misunderstood this issue; I don't use braille and don't really understand the details of braille navigation.

Jul 21 '24 00:07 mltony

@mltony wrote:

2. When routing braille cursor we fall back to previous findable offset if current offset cannot be found.

This is what I propose in https://github.com/nvaccess/nvda/pull/16876. I like the idea of adding an extra parameter to moveToCodepointOffset to dictate its behavior when the offset can't be found, but I'm not comfortable enough with that function's code to tweak it that way.

Jul 23 '24 05:07 LeonarddeR