
bug in diff_cleanupMerge() in C#

Open jboyflaga opened this issue 7 years ago • 0 comments

I tried to implement the line-mode diff described at https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs in C#.

This is the code that I used (copied from the existing private List<Diff> diff_lineMode() method):

public List<Diff> diff_lineMode(string text1, string text2)
{
	// Scan the text on a line-by-line basis first.
	var a = diff_linesToChars(text1, text2);
	var lineText1 = (string)a[0];
	var lineText2 = (string)a[1];
	var lineArray = (List<string>)a[2];
	var diffs = diff_main(lineText1, lineText2, false);

	// Convert the diff back to original text.
	diff_charsToLines(diffs, lineArray);

	// Eliminate freak matches (e.g. blank lines)
	diff_cleanupSemantic(diffs);

	return diffs;
}

But when I compared these texts...

text1 = "
country|description\r\n
CN|CHINA\r\n
PH|PHILIPPINES\r\n
JP|JAPAN\r\n
UK|UNITED KINGDOM\r\n
USA|U.S.A.\r\n
ZA|SOUTH AFRICA
"
text2 = "
country|description\r\n
CN|REPUBLIC OF CHINA\r\n
PH|REPUBLIC OF THE PHILIPPINES\r\n
JP|JAPAN\r\n
UK|U.K.\r\n
USA|UNITED STATES OF AMERICA\r\n
ZA|S. AFRICA
"

The result is this:

image

The result should be something like this (except for the JP|JAPAN part which should be EQUAL):

image

I traced the code and found that the error appears after diff_cleanupMerge() is called inside diff_cleanupSemantic(), at these lines:

// Normalize the diff.
if (changes)
{
	diff_cleanupMerge(diffs);
}

Perhaps there is a bug in the diff_cleanupMerge() method when used to compare texts line by line.
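
One possible explanation, offered as an assumption rather than a confirmed diagnosis: diff_cleanupMerge() factors common leading and trailing characters out of an adjacent delete/insert pair and moves them into the neighbouring equalities. That works at character granularity, so once diff_charsToLines() has restored the real text, whole-line edits can be split in the middle of a line. The sketch below imitates that factoring step on one pair of lines from the example; PrefixFactoringDemo, Factor, CommonPrefixLength and CommonSuffixLength are hypothetical names for illustration, not the library's API.

```csharp
using System;

public static class PrefixFactoringDemo
{
    // Number of leading characters shared by a and b.
    public static int CommonPrefixLength(string a, string b)
    {
        int n = Math.Min(a.Length, b.Length);
        int i = 0;
        while (i < n && a[i] == b[i]) i++;
        return i;
    }

    // Number of trailing characters shared by a and b.
    public static int CommonSuffixLength(string a, string b)
    {
        int n = Math.Min(a.Length, b.Length);
        int i = 0;
        while (i < n && a[a.Length - 1 - i] == b[b.Length - 1 - i]) i++;
        return i;
    }

    // Splits a delete/insert pair into { prefix, deleted, inserted, suffix },
    // stripping the shared prefix first and then the shared suffix.
    // Assumption: this mirrors, in simplified form, what a character-level
    // merge cleanup does to a replaced line.
    public static string[] Factor(string deleted, string inserted)
    {
        int p = CommonPrefixLength(deleted, inserted);
        string prefix = deleted.Substring(0, p);
        deleted = deleted.Substring(p);
        inserted = inserted.Substring(p);

        int s = CommonSuffixLength(deleted, inserted);
        string suffix = deleted.Substring(deleted.Length - s);
        deleted = deleted.Substring(0, deleted.Length - s);
        inserted = inserted.Substring(0, inserted.Length - s);

        return new[] { prefix, deleted, inserted, suffix };
    }
}
```

Calling PrefixFactoringDemo.Factor("CN|CHINA\r\n", "CN|REPUBLIC OF CHINA\r\n") yields the prefix "CN|", an empty deletion, the insertion "REPUBLIC OF " and the suffix "CHINA\r\n". The whole-line replacement has turned into a mid-line insertion, which would be consistent with the output no longer being aligned to lines. If that is the cause, skipping diff_cleanupSemantic() after diff_charsToLines() may keep the diff line-based, though I have not verified this.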

Thanks for creating this tool, and thank you to whoever fixes the bug :)

jboyflaga, Jan 07 '19 05:01