Implement an advanced fuzzy diffing feature for interdiff

Open kerneltoast opened this issue 2 months ago • 4 comments

Description

This implements a --fuzzy option to make interdiff perform a fuzzy comparison between two diffs. This is very helpful, for example, for comparing a backport patch to its upstream source patch to assist a human reviewer in verifying the correctness of the backport.

The fuzzy diffing process is complex and works by:

  • Generating a new patch file in which each hunk is split into smaller hunks, so that multiple deltas (+/- lines) spaced apart by context lines within a single hunk end up in separate hunks; this increases the number of deltas that can be applied successfully with fuzz
  • Applying the rewritten p1 patch to p2's original file, and the rewritten p2 patch to p1's original file; the original files aren't ever merged
  • Relocating the patched hunks in p1's original file only, so that they align with their respective locations in the other file, based on the line offset that patch reports for each hunk it applies successfully
  • Squashing unline gaps of fewer than max_context*2 lines between hunks in the patched files, which hides unknown contextual information that is irrelevant for comparing the two diffs and also improves hunk alignment between the two patched files
  • Diffing the two patched files as usual
  • Rewriting the hunks in the diff output to exclude unlines from the unified diff, even splitting up hunks to remove unlines present in the middle of a hunk, while also adjusting the @@ line to compensate for the change in line offsets
  • Emitting the rewritten diff output while interleaving rejected hunks from both p1 and p2 in the output in order by line number, with a comment on the @@ line indicating when an emitted hunk is a rejected hunk
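
The first step above, splitting hunks so that each delta gets its own hunk, can be sketched roughly as follows. This is a minimal Python illustration and not the code from src/interdiff.c: the `Hunk` class, `split_hunk`, and the policy of dividing the context gap between two neighbouring deltas roughly in half are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hunk:
    """Hypothetical container for one unified-diff hunk."""
    old_start: int
    new_start: int
    lines: List[str]  # body lines, each starting with ' ', '+' or '-'

    def header(self) -> str:
        # Note: this ignores the unified-diff special case for zero-length ranges.
        old_len = sum(1 for l in self.lines if l[0] in " -")
        new_len = sum(1 for l in self.lines if l[0] in " +")
        return f"@@ -{self.old_start},{old_len} +{self.new_start},{new_len} @@"


def split_hunk(hunk: Hunk, max_context: int = 3) -> List[Hunk]:
    """Split a hunk into one sub-hunk per contiguous run of +/- lines."""
    # Record the old/new line number at which every body line sits.
    positions = []
    old_no, new_no = hunk.old_start, hunk.new_start
    for line in hunk.lines:
        positions.append((old_no, new_no))
        if line[0] in " -":
            old_no += 1
        if line[0] in " +":
            new_no += 1

    # Find maximal runs of consecutive +/- lines (the deltas).
    n = len(hunk.lines)
    clusters = []
    i = 0
    while i < n:
        if hunk.lines[i][0] == " ":
            i += 1
            continue
        j = i
        while j < n and hunk.lines[j][0] != " ":
            j += 1
        clusters.append((i, j))
        i = j

    result = []
    for k, (start, end) in enumerate(clusters):
        prev_end = clusters[k - 1][1] if k > 0 else 0
        next_start = clusters[k + 1][0] if k + 1 < len(clusters) else n
        gap_before = start - prev_end
        gap_after = next_start - end
        # Assumed policy: a gap shared by two deltas is divided between them,
        # so sub-hunks never overlap; each side keeps at most max_context lines.
        if k > 0:
            pre = min(max_context, gap_before - (gap_before + 1) // 2)
        else:
            pre = min(max_context, gap_before)
        if k + 1 < len(clusters):
            post = min(max_context, (gap_after + 1) // 2)
        else:
            post = min(max_context, gap_after)
        first = start - pre
        old_start, new_start = positions[first]
        result.append(Hunk(old_start, new_start, hunk.lines[first:end + post]))
    return result
```

For a hunk with two deltas separated by eight context lines, `split_hunk` returns two sub-hunks with freshly computed `@@` ranges, so a failure to apply one delta no longer rejects the other.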

This also involves working around some bugs in patch itself encountered along the way, such as occasionally inaccurate line offsets printed out and spurious fuzzing in certain cases that involve hunks with an unequal number of pre-context and post-context lines.
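
To illustrate where those line offsets come from, here is a rough Python sketch that extracts the per-hunk status, fuzz, and offset from the messages GNU patch prints. The `HunkResult` type, the `parse_patch_output` helper, and the sample text are hypothetical; the sketch trusts the reported values as-is, whereas the PR notes that they are occasionally inaccurate and need working around.

```python
import re
from typing import List, NamedTuple, Optional

# Matches lines such as:
#   Hunk #1 succeeded at 12 (offset 2 lines).
#   Hunk #2 succeeded at 40 with fuzz 2 (offset -1 lines).
#   Hunk #3 FAILED at 90.
HUNK_RE = re.compile(
    r"^Hunk #(?P<num>\d+) (?P<status>succeeded|FAILED) at (?P<line>\d+)"
    r"(?: with fuzz (?P<fuzz>\d+))?"
    r"(?: \(offset (?P<offset>-?\d+) lines?\))?\.$"
)


class HunkResult(NamedTuple):
    """Hypothetical record of what patch reported for one hunk."""
    num: int
    applied: bool
    line: int
    fuzz: int
    offset: int


def parse_patch_output(text: str) -> List[HunkResult]:
    """Collect every 'Hunk #N ...' message; other lines are ignored."""
    results = []
    for raw in text.splitlines():
        m = HUNK_RE.match(raw.strip())
        if m is None:
            continue
        results.append(HunkResult(
            num=int(m.group("num")),
            applied=m.group("status") == "succeeded",
            line=int(m.group("line")),
            fuzz=int(m.group("fuzz") or 0),
            offset=int(m.group("offset") or 0),
        ))
    return results


# Illustrative input only; the file name and numbers are made up.
SAMPLE = """patching file foo.c
Hunk #1 succeeded at 12 (offset 2 lines).
Hunk #2 succeeded at 40 with fuzz 2 (offset -1 lines).
Hunk #3 FAILED at 90.
1 out of 3 hunks FAILED -- saving rejects to file foo.c.rej
"""
```

Calling `parse_patch_output(SAMPLE)` yields three records: two applied hunks with offsets 2 and -1, and one rejected hunk. A relocation step could then shift each applied hunk by its reported offset, and the rejected hunk would be emitted separately.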

The end result of all of this is a minimal set of real differences in the context lines of each hunk between the user's provided diffs. Even when fuzzing results in a faulty patch, the context differences are shown so there is never a risk of any real deltas getting hidden due to fuzzing.

By default, the fuzz factor is whatever patch itself uses by default. To change it, append =N to --fuzzy, where N is the maximum number of context lines that patch is allowed to fuzz.
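
As a rough illustration of the option's behaviour, the following Python sketch parses `--fuzzy` and `--fuzzy=N` and maps the result onto GNU patch's `--fuzz=N` flag. The helper names `parse_fuzzy_option` and `patch_fuzz_args` are hypothetical, and how the PR actually passes the value to patch is an assumption here.

```python
from typing import List, Optional, Tuple


def parse_fuzzy_option(argv: List[str]) -> Tuple[bool, Optional[int]]:
    """Return (enabled, fuzz); fuzz is None when patch's default applies."""
    enabled, fuzz = False, None
    for arg in argv:
        if arg == "--fuzzy":
            enabled, fuzz = True, None
        elif arg.startswith("--fuzzy="):
            value = arg[len("--fuzzy="):]
            if not value.isdigit():
                raise ValueError(f"invalid fuzz factor: {value!r}")
            enabled, fuzz = True, int(value)
    return enabled, fuzz


def patch_fuzz_args(fuzz: Optional[int]) -> List[str]:
    """Extra arguments for patch; empty means 'keep patch's own default'."""
    # Assumption: the fuzz factor is forwarded through patch's --fuzz=N flag.
    return [] if fuzz is None else [f"--fuzz={fuzz}"]
```

With this sketch, `parse_fuzzy_option(["--fuzzy=3"])` enables fuzzy mode with a fuzz factor of 3, while a bare `--fuzzy` leaves the factor unset so that patch falls back to its own default.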

Testing

This was tested on several complex Linux kernel patches to compare the backported version of a patch to its original upstream version. This PR also comes with a few basic fuzzy diffing tests integrated into the test infrastructure.

kerneltoast avatar Nov 14 '25 06:11 kerneltoast

Codecov Report

:x: Patch coverage is 93.77358% with 33 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 86.76%. Comparing base (487f3e8) to head (60a60b3).

Files with missing lines | Patch % | Lines
src/interdiff.c | 93.77% | 33 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #156      +/-   ##
==========================================
+ Coverage   86.47%   86.76%   +0.29%     
==========================================
  Files          15       15              
  Lines        8176     8567     +391     
  Branches     1643     1755     +112     
==========================================
+ Hits         7070     7433     +363     
- Misses       1106     1134      +28     
Flag | Coverage Δ
unittests | 86.76% <93.77%> (+0.29%) :arrow_up:


codecov[bot] avatar Nov 14 '25 07:11 codecov[bot]

@twaugh Please take a look at this when you can, thanks!

kerneltoast avatar Nov 14 '25 08:11 kerneltoast

All checks are passing now with a lot more test coverage added.

kerneltoast avatar Nov 14 '25 19:11 kerneltoast

@twaugh I updated this PR with several fixes and additional tests last week, and everything is finalized beyond a shadow of a doubt at this point. Would love to land this in patchutils!

kerneltoast avatar Nov 24 '25 19:11 kerneltoast