feat(windows): handle CRLF specific to windows
Fixes: #10471
Added a function to replace \r\n with \n that is U+000D U+000A with U+000A. And another to do the reverse, that is insert '\rin front of any\n` characters to support the CRLF for Windows. Supporting unit tests were also added.
This PR took a change after my first commit. Initially, I had nice symmetrical pre_process_context post_process_context pair. Then I realised I needed to deal with the UTF-32 encoding. It was more efficient just to process that and insert the CR at the same time, otherwise we a just adding another parse and making an extra buffer we don't need. I have written comments in the code that when if we remove the action queue then we can then make it more
User Testing
-
TEST_HIEROGLYPHIC_WINDOWS
- Install Hieroglyphic Keyboard from keyman.com into Keyman for Windows.
- Open Wordpad.
- Select the Hieroglyphic keyboard
- In wordpad, type: wdispacebarspacebar
- Expected Output : w𓂞
- Type:backspace
- Expected Output: w
-
TEST_NEWLINE_LF_ONLY Install the PR build of Keyman
- Install the attached newline keyboard newline.zip
- Open Wordpad.
- In wordpad, type: aEnterc
- Expected Output :
Row lf - Open Word
- In wordpad, type: aEnterc
- Expected Output:
Row lf
User Test Results
Test specification and instructions
- ✅ TEST_HIEROGLYPHIC_WINDOWS (PASSED): Tested with the attached PR build (17.0.266-alpha-test-10697) on Windows 11 OS and here is my observation: 1. Opened Wordpad app. 2. Typed 'wdi spacebar spacebar' and verified that it showed the expected output.3. Pressed backspace key and showed the expected output. Seems to be working fine. (notes)
- ✅ TEST_NEWLINE_LF_ONLY (PASSED): Retested as per Ross's suggestion in a WordPad document and here is my observation: 1. Followed the steps as mentioned in the test description. 2. Verified that 'Row If' text appears on the document after pressing the Enter key. (notes)
- ✅ TEST_NEWLINE_CR_ONLY (PASSED): Tested with the attached PR build (keyman version 17.0.285-beta-test-10697) and here is my observation: 1. Opened the 'editpad' website. 2. Created a new document. 3. Followed step (iv) and verified that the text 'ROW If' appeared in the document. Seems to be working as expected. (notes)
Test Artifacts
Test Results
- TEST_HIEROGLYPHIC_WINDOWS (PASSED): Tested with the attached PR build (17.0.266-alpha-test-10697) on Windows 11 OS and here is my observation: 1. Opened Wordpad app. 2. Typed 'wdi spacebar spacebar' and verified that it showed the expected output.3. Pressed backspace key and showed the expected output. Seems to be working fine.
..Result 1
..Result 2
The user tests do not appear to test anything relating to CRLF. Is that something we can do?
The user tests do not appear to test anything relating to CRLF. Is that something we can do?
Correct the current test was good regression to make sure nothing was broken. I was working on using a ldml keyboard with rules for the CRLF test. However, it seemed to fail but the debugger and logging was giving some "interesting" results, and proving a bit tricky to test, as it appears something else can strip the CR. Still investigating.
@mcdurdin I have just been doing some logging around the contexts and finding some surprising results.
- AITIP::ReadContext is returning the string with just
\rand no `\n' . I have followed the call stack to see if this what it gets back from the TSF API call. - Not using AITIP::ReadContext. With the LDML Then reading the cores version of the app context it is not keeping track of new lines at all but include the characters before the enter was pressed.
- A kmn keyboard without ReadContext the context is cleared when the Enter Key is pressed.
Note to self investigate:
GetLeftOfSelectioninkmtip\inserttext.cpp
Using Keyman Developer I was able to create a keyboard with \r\n rules and when testing the the keyboard in Developer in matched those rules. However developer is clearly not the same as Windows application.
And with that I guess this has to move back to draft
Which compliant apps are you testing with? I am guessing some apps will treat each paragraph as a separate context, but others may inline the \r\n.
New thought: ideally, we track what the app gives us in the input context, and give the same pattern back. So we have a function (serious pseudocode ahead warning):
const LBType = { none, '\n', '\r\n', '\r' };
function normalizeLineBreaks(&str): LBType {
// find first line break
m = /(\r\n|\r|\n)/.match(str);
if(!m) {
return none;
}
str = str.replaceAll(/(\r\n|\r|\n)/g, '\n');
return m.firstMatch;
}
function restoreLineBreaks(str, tp:LBType, defaulttp:LBType): string {
if(tp == none) tp = defaulttp; // use platform default for line breaks if context had no line breaks
if(tp == none) return str; // if platform has no default and context had no line breaks, do nothing
return str.replaceAll(/\n/g, tp); // replace line breaks
}
And, this could even be buried in Core, ideally, so that the Engine doesn't need to do this legwork. We could pass defaultLineBreakType in as a parameter to Core...
Seems like too much for 17.0 though!
And, this could even be buried in Core, ideally, so that the Engine doesn't need to do this legwork. We could pass
defaultLineBreakTypein as a parameter to Core...Seems like too much for 17.0 though!
This is putting a more of platform knowledge into the core, but it is ok as it we keep it away from the core logic of the processors. I assume when we use the core context to load the updated app context (stripping out the markers) we also run restoreLineBreaks
This would be more robust to handling any nuances from different platforms and apps. for example would also handle any MacOS apps that also used just \r I would still like to know why the Windows ReadContext has only `\r'.
The platforms would need to make sure to set the platform default line break when loading the ~~keyboard~~ state. This would be straightforward with the core environment array already part of km_core_state_create.
Let's see if things are breaking with CRLF in visible ways in Windows. If not, we can push this whole project to 18.0 and do it properly there.
I have implemented the normalize/ restore algorithm but not in core at this late stage as it would effect all platforms. I have created a issue #10954 to move it to core in version 18.0
Test Results
- TEST_NEWLINE_LF_ONLY (FAILED): Tested with the attached PR build (Keyman ) on Windows 10 OS and here is my observation: 1. Installed NewLine keyboard. 2. Opened WordPad document. 3. Switched to NewLine keyboard. 4. Typed as per the instructions. Verified it is showing the expected result. 5. Opened Microsoft office 365 Word online document. 6. Switched to NewLine Keyboard. 7. Typed as per instructions. Noticed that it is showing only the typed letters. Seems to be an issue.
..in WordPad document
..in Microsoft Office 365 word online document
@bharanidharanj The testing of Word needs to be for an installed desktop version. It can be 365 but still needs to be an installed version. It failed the test because you used the online version in a web browser that doesn't support the Text Services Framework(TSF). Which is needed for this test. Do you have a test machine with Word installed? If not let me know and I will modify the test to suit.
@keymanapp-test-bot retest TEST_NEWLINE_LF_ONLY
@bharanidharanj The testing of Word needs to be for an installed desktop version. It can be 365 but still needs to be an installed version. It failed the test because you used the online version in a web browser that doesn't support the Text Services Framework(TSF). Which is needed for this test. Do you have a test machine with Word installed? If not let me know and I will modify the test to suit.
@keymanapp-test-bot retest TEST_NEWLINE_LF_ONLY
@rc-swag I think it would be better to modify the test to suit. Thanks.
@rc-swag I think it would be better to modify the test to suit. Thanks.
@bharanidharanj I have updated the TEST_NEWLINE_LF_ONLY test to only have Wordpad and added another for TEST_NEWLINE_CR_ONLY with Firefox so please do this tests
@rc-swag I think it would be better to modify the test to suit. Thanks.
@bharanidharanj I have updated the
TEST_NEWLINE_LF_ONLYtest to only have Wordpad and added another forTEST_NEWLINE_CR_ONLYwith Firefox so please do this tests
@rc-swag Okay, Thanks. I will do it.
Test Results
-
TEST_NEWLINE_LF_ONLY (PASSED): Retested as per Ross's suggestion in a WordPad document and here is my observation: 1. Followed the steps as mentioned in the test description. 2. Verified that 'Row If' text appears on the document after pressing the Enter key.
-
TEST_NEWLINE_CR_ONLY (PASSED): Tested with the attached PR build (keyman version 17.0.285-beta-test-10697) and here is my observation: 1. Opened the 'editpad' website. 2. Created a new document. 3. Followed step (iv) and verified that the text 'ROW If' appeared in the document. Seems to be working as expected.
I am somewhat uncomfortable about this PR because it makes significant string handling changes very late in the beta cycle.
Based on this I am going to close this and update the issue #10471 for 18.0 but for the algorithm to be in the core (and to check developer debugger). That is where I ultimately wanted it to live and it would then cover linux and macos also. This was just to get a reference solution over the line for the LDML and version 17. With the buffer in the core truncation I would have done it differently however when I looked at it was just kicking the problem down the road because it had truncation in there anyway around a defined MAX_CONTEXT. Anyway, for the core I can revisit it as it will sit on a different layer.
Finally, I am disappointed I missed the surrogates as that is one of the first edge cases I discovered in the Context::Buf method. when I started.
This will be moved to version 18.0 as it is too late in the beta cycle. The logic will also be contained in the core.
Agree that this can wait until 18.0; if necessary we can back-port to 17.0 stable.