Levenshtein: Improve the correctness check

Open PEZ opened this issue 1 year ago • 5 comments

I'm reminded by this issue:

  • #328

That we need to do better correctness checks on the output of the Levenshtein programs. We probably need to update the programs to output more of what they have found out. My first idea (which may be crap) is that in addition to checking the number of comparisons and the minimum distance, we should also check the maximum distance.
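A minimal sketch of what reporting the extra value could look like, in Python. This is not taken from any program in the repository: the names `levenshtein` and `summarize`, the result keys, and the choice to compare each unordered pair once are all assumptions made for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance using the standard two-row dynamic-programming scheme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def summarize(words):
    """Hypothetical summary: comparison count plus min and max distance.

    Assumes at least two words; each unordered pair is compared once.
    """
    distances = [levenshtein(a, b) for a, b in combinations(words, 2)]
    return {
        "comparisons": len(distances),
        "min": min(distances),
        "max": max(distances),
    }


if __name__ == "__main__":
    print(summarize(["abc", "abd", "xyz"]))
```

A checker could then compare all three values against a known-good reference run rather than only the count and the minimum.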

PEZ avatar Jan 09 '25 21:01 PEZ

Let the benchmark fail if the maximum distance is incorrect? That would make diagnosis easier. Printing arbitrary results does not seem to be a reliable way to monitor correctness.

Update: This would require an input format change.

basilevs avatar Jan 12 '25 16:01 basilevs

We do the current check on the output, so we can do that for max distance too.

PEZ avatar Jan 12 '25 19:01 PEZ

By passing in the input data as command line arguments, aren't you just measuring how efficient that particular language is at parsing command line arguments?

I don't believe that parsing hundreds of command line arguments is a common thing to do, or that language creators specifically optimise for it, so you're potentially adding that overhead. Passing in a file and opening it (which is a common thing to do) would remove the command line parser from the equation.

dre2004 avatar Jan 14 '25 12:01 dre2004

I don't see how adding file reading and parsing would be better than having the strings passed on the command line.

That said, I agree that it would be good if we could isolate the actual benchmark better. There's a suggestion to that end here:

  • https://github.com/bddicken/languages/issues/341

PEZ avatar Jan 14 '25 12:01 PEZ

For the correctness check in this case, a sum of all the calculated distances sounds like it could catch many potential errors.
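To illustrate why a sum helps, here is a small Python sketch with made-up distance lists (they are not output from any benchmark program, and `fingerprint` is a hypothetical helper). A single wrong distance can leave the count, minimum, and maximum unchanged while still changing the sum.

```python
def fingerprint(distances):
    """Hypothetical summary tuple: (count, min, max, sum)."""
    return (len(distances), min(distances), max(distances), sum(distances))


# Illustrative data only: the second list has one wrong value (3 became 4),
# but its count, minimum, and maximum are identical to the reference.
reference = [1, 3, 3, 2, 4]
buggy = [1, 3, 4, 2, 4]

ref_fp = fingerprint(reference)
bug_fp = fingerprint(buggy)

print("count/min/max match:", ref_fp[:3] == bug_fp[:3])
print("sums match:", ref_fp[3] == bug_fp[3])
```

A sum is still not a complete check: two errors that cancel each other out would go unnoticed, so it works best alongside the other values rather than as a replacement for them.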

Janmm14 avatar Jan 15 '25 20:01 Janmm14