feat(web): provide lexicon probabilities directly on the search path 📚

Open jahorton opened this issue 1 year ago • 1 comments

This PR was originally part of #10973.

To traverse a full lexicon efficiently for dictionary-based wordbreaking, it's best to provide the relevant probability data directly and as cheaply as possible. Fortunately, it's easy to make this O(1) on the lexical model's internal iterator, the LexiconTraversal type. Recomputing it via the model's .predict method would instead take O(log(N)) time.

Note that this provides two different probability value types:

- The probability of each reached entry.
- The probability of the highest-frequency entry either represented by the current node or by any of its descendants.
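
As a rough illustration of how both values can be O(1) to read during traversal, here is a minimal sketch. It assumes a plain trie that caches the two values on each node at insertion time. `TrieNode`, `insertEntry`, `SketchTraversal`, `p`, and `maxP` are hypothetical names chosen for this example and are not taken from the actual LexiconTraversal implementation.

```typescript
// Hypothetical sketch: these names are illustrative, not Keyman's real API.
interface TrieNode {
  children: Map<string, TrieNode>;
  p?: number;   // probability of the entry ending at this node, if any
  maxP: number; // highest entry probability at or below this node
}

function newNode(): TrieNode {
  return { children: new Map(), maxP: 0 };
}

// Caches both values while inserting, so later reads need no recomputation.
function insertEntry(root: TrieNode, word: string, p: number): void {
  let node = root;
  node.maxP = Math.max(node.maxP, p);
  for (const ch of word) {
    let child = node.children.get(ch);
    if (!child) {
      child = newNode();
      node.children.set(ch, child);
    }
    child.maxP = Math.max(child.maxP, p);
    node = child;
  }
  node.p = p;
}

class SketchTraversal {
  constructor(private readonly node: TrieNode) {}

  // Steps to the child reached by one more character, if it exists.
  child(ch: string): SketchTraversal | undefined {
    const next = this.node.children.get(ch);
    return next ? new SketchTraversal(next) : undefined;
  }

  // Both reads are simple field lookups on the current node.
  get p(): number | undefined { return this.node.p; }
  get maxP(): number { return this.node.maxP; }
}

const root = newNode();
insertEntry(root, "the", 0.5);
insertEntry(root, "then", 0.25);
insertEntry(root, "to", 0.125);

const tTrav = new SketchTraversal(root).child("t");
const theTrav = tTrav?.child("h")?.child("e");
const thenTrav = theTrav?.child("n");
```

Here `tTrav` represents no entry of its own, so its `p` is undefined, while its `maxP` reflects "the", the most probable entry beneath it.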

There are uses for this outside of dictionary-based wordbreaking, too. The second value listed above (the best probability reachable from the current node) can be useful for optimizing the correction-search: if a path only produces low-frequency words, we should first consider other paths that could yield higher-frequency words.
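
To make the prioritization idea concrete, the following sketch runs a best-first walk over a trie, using each node's best reachable probability as the priority. It is only an illustration under assumed names (`LexNode`, `buildTrie`, `topEntries`, `bound`); the actual correction-search also weighs edit costs, which are left out here.

```typescript
// Hypothetical node shape; not Keyman's actual LexiconTraversal API.
interface LexNode {
  children: Map<string, LexNode>;
  p?: number;   // probability of the entry ending at this node, if any
  maxP: number; // highest entry probability at or below this node
}

function buildTrie(entries: [string, number][]): LexNode {
  const root: LexNode = { children: new Map(), maxP: 0 };
  for (const [word, p] of entries) {
    let node = root;
    node.maxP = Math.max(node.maxP, p);
    for (const ch of word) {
      let child = node.children.get(ch);
      if (!child) {
        child = { children: new Map(), maxP: 0 };
        node.children.set(ch, child);
      }
      child.maxP = Math.max(child.maxP, p);
      node = child;
    }
    node.p = p;
  }
  return root;
}

type Candidate =
  | { kind: "path"; prefix: string; node: LexNode; bound: number }
  | { kind: "entry"; word: string; bound: number };

// Returns up to `limit` entries, most probable first.
function topEntries(root: LexNode, limit: number): string[] {
  const results: string[] = [];
  const frontier: Candidate[] = [
    { kind: "path", prefix: "", node: root, bound: root.maxP },
  ];
  while (frontier.length > 0 && results.length < limit) {
    // A linear scan keeps the sketch short; a priority queue would scale better.
    let best = 0;
    for (let i = 1; i < frontier.length; i++) {
      if (frontier[i].bound > frontier[best].bound) best = i;
    }
    const next = frontier.splice(best, 1)[0];
    if (next.kind === "entry") {
      results.push(next.word);
      continue;
    }
    const { node, prefix } = next;
    if (node.p !== undefined) {
      frontier.push({ kind: "entry", word: prefix, bound: node.p });
    }
    // Low-maxP subtrees stay in the frontier until nothing better remains.
    node.children.forEach((child, ch) => {
      frontier.push({ kind: "path", prefix: prefix + ch, node: child, bound: child.maxP });
    });
  }
  return results;
}

const ranked = topEntries(
  buildTrie([["the", 0.5], ["to", 0.125], ["then", 0.25], ["a", 0.375]]),
  4,
);
```

Because no entry in a subtree can exceed that subtree's `maxP`, always expanding the highest-bound candidate yields entries in descending probability order without visiting low-frequency branches early.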

There's also notable potential for merging or blending two different models via their LexiconTraversal iterators. Given our upcoming push toward #11872, this would be a great way to reach that goal: create a stand-in model for the OS's dictionary and blend it with the loaded lexical model via traversals.
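
One way such blending might look is sketched below. This is purely an assumption for illustration: the `Traversal` interface, the `ListTraversal` demo stand-in, the `BlendedTraversal` wrapper, and the choice of a fixed linear weight are all hypothetical, and nothing here describes how #11872 will actually be implemented.

```typescript
// Hypothetical interface; the real LexiconTraversal type may differ.
interface Traversal {
  child(ch: string): Traversal | undefined;
  readonly p: number | undefined; // probability of the entry reached, if any
  readonly maxP: number;          // best probability at or below this point
}

// Demo-only stand-in backed by a word list. It scans every entry,
// so unlike a real traversal it is NOT O(1).
class ListTraversal implements Traversal {
  constructor(
    private readonly entries: [string, number][],
    private readonly prefix = "",
  ) {}

  child(ch: string): Traversal | undefined {
    const next = this.prefix + ch;
    return this.entries.some(([w]) => w.indexOf(next) === 0)
      ? new ListTraversal(this.entries, next)
      : undefined;
  }

  get p(): number | undefined {
    const hit = this.entries.filter(([w]) => w === this.prefix)[0];
    return hit ? hit[1] : undefined;
  }

  get maxP(): number {
    return this.entries
      .filter(([w]) => w.indexOf(this.prefix) === 0)
      .reduce((best, [, p]) => Math.max(best, p), 0);
  }
}

// Walks two traversals in lockstep and mixes their values linearly.
class BlendedTraversal implements Traversal {
  constructor(
    private readonly a: Traversal | undefined,
    private readonly b: Traversal | undefined,
    private readonly weightA: number, // model b receives 1 - weightA
  ) {}

  child(ch: string): Traversal | undefined {
    const a = this.a?.child(ch);
    const b = this.b?.child(ch);
    return a || b ? new BlendedTraversal(a, b, this.weightA) : undefined;
  }

  get p(): number | undefined {
    const pa = this.a?.p;
    const pb = this.b?.p;
    if (pa === undefined && pb === undefined) return undefined;
    return this.weightA * (pa ?? 0) + (1 - this.weightA) * (pb ?? 0);
  }

  // An upper bound on the best blended probability below this point,
  // not necessarily the exact maximum.
  get maxP(): number {
    return this.weightA * (this.a?.maxP ?? 0) + (1 - this.weightA) * (this.b?.maxP ?? 0);
  }
}

const loadedModel = new ListTraversal([["the", 0.5], ["then", 0.25]]);
const osDictionary = new ListTraversal([["the", 0.25], ["to", 0.5]]);
const blended = new BlendedTraversal(loadedModel, osDictionary, 0.5);

const blendedT = blended.child("t");
const blendedThe = blendedT?.child("h")?.child("e");
const blendedTo = blendedT?.child("o");
```

Note the design trade-off in `maxP`: each model's best entry may be a different word, so the weighted sum can overestimate the best blended entry. An overestimate still works as a search priority, since it never hides a path that could yield a high-frequency word.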

@keymanapp-test-bot skip

Jun 25 '24 05:06 jahorton

User Test Results

Test specification and instructions

User tests are not required

Test Artifacts

Android
Developer
iOS
- Keyman for iOS (simulator image)
- FirstVoices Keyboards for iOS (simulator image)
- TestFlight internal PR build version - 18.0.60 (0.11868.11420)
Keyboards
- Test Keyboards
Web
- KeymanWeb Test Home
Windows

Jun 25 '24 05:06 keymanapp-test-bot[bot]

Changes in this pull request will be available for download in Keyman version 18.0.70-alpha

Jul 08 '24 18:07 keyman-server