diff-match-patch
bug in diff_cleanupMerge() in C#
I tried to implement the line-mode diff described at https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs in C#.
This is the code that I used (copied from the existing private List<Diff> diff_lineMode() method):
public List<Diff> diff_lineMode(string text1, string text2)
{
    // Scan the text on a line-by-line basis first.
    var a = diff_linesToChars(text1, text2);
    var lineText1 = (string)a[0];
    var lineText2 = (string)a[1];
    var lineArray = (List<string>)a[2];
    var diffs = diff_main(lineText1, lineText2, false);
    // Convert the diff back to original text.
    diff_charsToLines(diffs, lineArray);
    // Eliminate freak matches (e.g. blank lines)
    diff_cleanupSemantic(diffs);
    return diffs;
}
But when I compared these texts...
text1 = "
country|description\r\n
CN|CHINA\r\n
PH|PHILIPPINES\r\n
JP|JAPAN\r\n
UK|UNITED KINGDOM\r\n
USA|U.S.A.\r\n
ZA|SOUTH AFRICA
"
text2 = "
country|description\r\n
CN|REPUBLIC OF CHINA\r\n
PH|REPUBLIC OF THE PHILIPPINES\r\n
JP|JAPAN\r\n
UK|U.K.\r\n
USA|UNITED STATES OF AMERICA\r\n
ZA|S. AFRICA
"
The result is this:

The result should be something like this (except for the JP|JAPAN part which should be EQUAL):

I traced the code and found that the error appears after diff_cleanupMerge() is called inside diff_cleanupSemantic(), at these lines:
// Normalize the diff.
if (changes)
{
    diff_cleanupMerge(diffs);
}
Perhaps there is a bug in the diff_cleanupMerge() method when used to compare texts line by line.
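To illustrate what I suspect is going on, here is a minimal sketch. It is my own illustration, not the library's code, and the names MergeSketch, CommonPrefix, CommonSuffix and Factor are hypothetical. It assumes that diff_cleanupMerge() factors common prefixes and suffixes out of an adjacent delete/insert pair character by character. If that is right, then once diff_charsToLines() has restored the real text, a changed line can be split in the middle instead of staying a whole-line delete plus a whole-line insert.

```csharp
using System;

public static class MergeSketch
{
    // Length of the longest common prefix of a and b.
    public static int CommonPrefix(string a, string b)
    {
        int n = Math.Min(a.Length, b.Length);
        int i = 0;
        while (i < n && a[i] == b[i]) i++;
        return i;
    }

    // Length of the longest common suffix of a and b.
    public static int CommonSuffix(string a, string b)
    {
        int n = Math.Min(a.Length, b.Length);
        int i = 0;
        while (i < n && a[a.Length - 1 - i] == b[b.Length - 1 - i]) i++;
        return i;
    }

    // Pull the shared prefix and suffix out of a delete/insert pair.
    // The prefix is removed first so that prefix and suffix cannot overlap.
    public static (string Before, string Deleted, string Inserted, string After)
        Factor(string deleted, string inserted)
    {
        int p = CommonPrefix(deleted, inserted);
        string before = deleted.Substring(0, p);
        deleted = deleted.Substring(p);
        inserted = inserted.Substring(p);

        int s = CommonSuffix(deleted, inserted);
        string after = deleted.Substring(deleted.Length - s);
        deleted = deleted.Substring(0, deleted.Length - s);
        inserted = inserted.Substring(0, inserted.Length - s);
        return (before, deleted, inserted, after);
    }

    public static void Main()
    {
        // One changed line from the texts above.
        var r = Factor("CN|CHINA\r\n", "CN|REPUBLIC OF CHINA\r\n");
        Console.WriteLine("EQUAL  \"" + r.Before + "\"");    // EQUAL  "CN|"
        Console.WriteLine("DELETE \"" + r.Deleted + "\"");   // DELETE ""
        Console.WriteLine("INSERT \"" + r.Inserted + "\"");  // INSERT "REPUBLIC OF "
        Console.WriteLine("EQUAL  \"" + r.After.Replace("\r\n", "\\r\\n") + "\""); // EQUAL  "CHINA\r\n"
    }
}
```

In this sketch the whole-line change collapses into a mid-line insert of "REPUBLIC OF ", which is no longer a line-by-line result. That looks similar to what I am seeing, but I have not confirmed that this is the actual cause.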
Thanks for creating this tool, and thank you to whoever fixes the bug :)