nlp icon indicating copy to clipboard operation
nlp copied to clipboard

Optimize search algorithm + documentation of prediction.cppm

Open moonbeamcelery opened this issue 2 years ago • 3 comments

Added documentation as I try to understand the prediction implementation. Please also check the items labeled as [Bug?] and see if they are actually bugs.

I'll try optimizing the search in another PR.

moonbeamcelery avatar Oct 19 '23 07:10 moonbeamcelery

Thanks for the review Patrick.

A. It would only be noticeable if the user types a uncommon word straight away. Otherwise it should be equivalent.

B. I have no idea what's going on; it should not be so small. I'll take a look.

C. I haven't given the score much attention, thinking as long as it works, it's fine. Right now it's not constrained to 0-1 because that would be hard to make the cost non-decreasing during the A* search. I'll make changes so the math is more formal.

However, please note that instead of having a 0-1 score, we can simply say the score is the log confidence ranging from -infty to 0. Log probability is adopted in some libraries, including KenLM mentioned somewhere here. Subtracting a number is just multiplying the confidence with something smaller than 1. For example, one edit distance subtracts from the log confidence, which is equivalent to multiplying the confidence by 1/e. I'll probably do something along this line and return an exponent at the end.

D. I'm on Debian so I couldn't build nlptools. I'll ping you to get your setup.

moonbeamcelery avatar Nov 07 '23 04:11 moonbeamcelery

it seems that merging in the latest main has messed up the branch and now changes are shown in the diff which aren't yours, making PR review pretty hard. Could you maybe undo the merge and properly rebase on main?

patrickgold avatar Dec 11 '23 16:12 patrickgold

Hey Patrick,

moonbeamcelery avatar Jan 29 '24 05:01 moonbeamcelery