fix(web): enhanced timer for prediction algorithm
Hopefully addresses some aspects of #6985.
This PR is primarily designed to address some CI unit-testing stability issues. Since our browser-based unit testing is handled via an external service, our unit tests may well be run alongside others in a manner that causes context switching. We've had issues arise from context switching before - see #5467 / #5479 - so it's quite possible that some of our prediction instability stems from the same underlying problem; a user often does have background threads and processes running on their mobile device, after all.
Given the similarity between this and the intermittent failures where predictive text produces no results for input that otherwise reliably produces suggestions, if my suspicions are correct, this may actually help mitigate some aspects of #6985.
The inner class ExecutionTimer here is an attempt to determine the predictive-text search's active execution time (that is, "actual time on the processor") by detecting likely context-switching time intervals and ignoring them. If the longest interval observed at any point is at least twice the combined length of the next two longest intervals, we'll call it an outlier. Zero-length intervals will also be ignored, since otherwise even a 1ms interval would automatically be an outlier.
- [1, 1, 4]: the '4' is considered an outlier - it's at least twice the combined length of the two 1s ((1 + 1) * 2 = 4).
- [1, 2, 4]: the '4' isn't - (1 + 2) * 2 = 6 > 4, after all.
Note that excluded outliers aren't considered when determining future outliers. The expectation/assumption is that each measured loop iteration should take a consistent amount of time... unless affected by context switching. To facilitate this, I've set things up so that the timer (as used) avoids tracking time spent in the search's yield state.
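For illustration, here's a minimal sketch of the outlier-filtering idea in TypeScript. This is not the actual ExecutionTimer code - the class name, the `add()`/`activeTimeMs` members, and the `measureSearch()` helper are all hypothetical:

```ts
// Sketch only: accumulates measured interval durations while discarding
// likely context-switching outliers, per the rule described above.
class IntervalOutlierFilter {
  // Durations (in ms) of accepted intervals; rejected outliers are removed.
  private intervals: number[] = [];
  // Sum of accepted intervals - approximates "actual time on the processor".
  private totalMs = 0;

  add(durationMs: number): void {
    if (durationMs <= 0) {
      // Zero-length intervals are ignored; otherwise even a 1ms interval
      // would automatically look like an outlier.
      return;
    }

    this.intervals.push(durationMs);
    this.totalMs += durationMs;

    // Check the three longest accepted intervals so far.
    const sorted = [...this.intervals].sort((a, b) => b - a);
    if (sorted.length >= 3) {
      const [longest, second, third] = sorted;
      // Outlier rule: the longest interval is at least twice the combined
      // length of the next two - e.g. [1, 1, 4] flags the 4, while
      // [1, 2, 4] does not, since (1 + 2) * 2 = 6 > 4.
      if (longest >= 2 * (second + third)) {
        // Treat it as context-switching time: drop it from the total and
        // never consider it in future outlier checks.
        this.totalMs -= longest;
        this.intervals.splice(this.intervals.indexOf(longest), 1);
      }
    }
  }

  get activeTimeMs(): number {
    return this.totalMs;
  }
}

// Usage sketch, assuming a step-based search loop:
function measureSearch(steps: Array<() => void>): number {
  const timer = new IntervalOutlierFilter();
  let lastMark = performance.now();
  for (const step of steps) {
    step();
    const now = performance.now();
    timer.add(now - lastMark);
    // Restarting the measurement here - and likewise after each yield -
    // keeps the search's yield state out of the measured intervals.
    lastMark = now;
  }
  return timer.activeTimeMs;
}
```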
@keymanapp-test-bot skip
I think we should be fine without user testing... though I could see an argument for a mild bit of user testing for predictive text in the mobile apps if someone wanted to push for it.
User Test Results
Test specification and instructions
User tests are not required
Test Artifacts
- Android
- Developer
- iOS
- Keyboards
- Web
- Windows
I've re-run the lm-worker tests multiple times now - the unit test addressed by these changes hasn't failed yet in at least 5 attempts.
...not that this prevented BrowserStack from failing to run against one of the configured test-browser configurations for some of those attempts. 😞
The issue noted in this comment (https://github.com/keymanapp/keyman/pull/7036#issuecomment-1207594363) on the base branch also occurs here, including for the LMLayer's in-browser tests.
> I think we should be fine without user testing... though I could see an argument for a mild bit of user testing for predictive text in the mobile apps if someone wanted to push for it.
It's a functional change internally. I'd like to see some user tests.
I've got no idea what sort of behaviour we are going to see on mobile devices for these scenarios. Sounds like some testing on low-end devices is seriously needed.
What did you have in mind for this? Do we have access to suitable devices for such tests? I assume that we'd need physical devices to test this; after all, this PR's changes are aimed at handling OS behavior that'd be very difficult to emulate directly.
Or... is that why it's been given approval - for a 'test in the wild' on alpha? Not the sort of thing we'd usually do, so unless explicitly stated otherwise, I'll avoid merging for now.
> What did you have in mind for this? Do we have access to suitable devices for such tests? I assume that we'd need physical devices to test this; after all, this PR's changes are aimed at handling OS behavior that'd be very difficult to emulate directly.
Yeah, I am struggling with this. We do have a couple of old devices in the Keyman office there, don't we?
> Or... is that why it's been given approval - for a 'test in the wild' on alpha? Not the sort of thing we'd usually do, so unless explicitly stated otherwise, I'll avoid merging for now.
I'm not sure what else we can do here. So if possible, test on those old devices in the office there. And then go ahead and merge.
(Of course, the alternative is to avoid timing-based approaches altogether and go with a 'number of iterations' approach, where we determine a suitable iteration count based on empirical testing of what would be a typically acceptable response time on some old devices. Then, as devices get faster over time, we can gradually raise the threshold; see the sketch below.)
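A rough sketch of what that alternative might look like, purely for illustration - `MAX_SEARCH_ITERATIONS`, its value, and `stepFn` are all hypothetical, with the real budget to be derived from the empirical device testing described above:

```ts
// Hypothetical iteration budget; the real value would come from empirical
// testing of acceptable response times on old, low-end devices.
const MAX_SEARCH_ITERATIONS = 10000;

// Runs search steps until the search finishes or the budget is exhausted.
// stepFn returns true while more work remains.
function runSearchWithBudget(stepFn: () => boolean): void {
  for (let i = 0; i < MAX_SEARCH_ITERATIONS; i++) {
    if (!stepFn()) {
      return; // search completed within the budget
    }
  }
  // Budget exhausted: stop here and surface the best results found so far.
}
```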
Changes in this pull request will be available for download in Keyman version 16.0.60-alpha