Optimize search algorithm + documentation of prediction.cppm
Added documentation as I try to understand the prediction implementation.
Please also check the items labeled as [Bug?] and see if they are actually bugs.
I'll try optimizing the search in another PR.
Thanks for the review Patrick.
A. It would only be noticeable if the user types a uncommon word straight away. Otherwise it should be equivalent.
B. I have no idea what's going on; it should not be so small. I'll take a look.
C. I haven't given the score much attention, thinking as long as it works, it's fine. Right now it's not constrained to 0-1 because that would be hard to make the cost non-decreasing during the A* search. I'll make changes so the math is more formal.
However, please note that instead of having a 0-1 score, we can simply say the score is the log confidence ranging from -infty to 0. Log probability is adopted in some libraries, including KenLM mentioned somewhere here. Subtracting a number is just multiplying the confidence with something smaller than 1. For example, one edit distance subtracts from the log confidence, which is equivalent to multiplying the confidence by 1/e. I'll probably do something along this line and return an exponent at the end.
D. I'm on Debian so I couldn't build nlptools. I'll ping you to get your setup.
it seems that merging in the latest main has messed up the branch and now changes are shown in the diff which aren't yours, making PR review pretty hard. Could you maybe undo the merge and properly rebase on main?
Hey Patrick,