Wrong application of the Winkler penalty
https://github.com/feature23/StringSimilarity.NET/blob/b1b6399a2322496d113f47a22d623780f63675e2/src/F23.StringSimilarity/JaroWinkler.cs#L106
Problem 1
The Math.Min(JW_COEF, 1.0 / mtp[THREE]) is not necessary as it should just be JW_COEF - Jaro-Winkler does not specify replacing the scaling factor with 1/(length of longest string).
For example, if a string is 11 characters long, and JW_COEF = 0.1, then instead of 0.1, 1/11 is used.
Problem 2
The mtp[2] (prefix) is the Matches function is not capped at 4 - and it needs to be capped when being applied to final calculation.
Solution
I would like to suggest the below calculation as a solution:
jw = j + ((JW_COEF * Math.Min(mtp[2], 4)) * (1 - j))
I have cross referenced this with written papers and implementation in other languages such as R. (https://journal.r-project.org/archive/2014-1/loo.pdf)
Hi @Ehsan-F, thanks for submitting this!
This project is actually a port of an upstream project written in Java, so we do not want to deviate from how they are doing it. Would you mind filing a bug with the upstream project? Then, if they deem this to be correct and get it fixed, we'll incorporate the changes. I believe the line in question upstream is: https://github.com/tdebatty/java-string-similarity/blob/master/src/main/java/info/debatty/java/stringsimilarity/JaroWinkler.java#L87
Thanks!
Closing this issue until such a time as a fix is implemented upstream.