Adler32RollingChecksumV2 seems to give bad results

Open • Spavid04 opened this issue 3 years ago • 0 comments

Description

When using the V2 rolling checksum algorithm, files that are identical or only very slightly different produce huge deltas: the entire new file ends up in the delta.

Environment

  • repo freshly cloned from the current master branch (commit d87ee313dfd8e48fe96dc66a6b08af12751e06c1)
  • VS2022 17.2.6 on Windows 10 x64

I had to make a small code change so the command line app would use the V2 algorithm by default:

diff --git a/source/Octodiff/Core/SupportedAlgorithms.cs b/source/Octodiff/Core/SupportedAlgorithms.cs
index 2cc2aa5..5552f13 100644
--- a/source/Octodiff/Core/SupportedAlgorithms.cs
+++ b/source/Octodiff/Core/SupportedAlgorithms.cs
@@ -52,7 +52,7 @@ namespace Octodiff.Core
 
         public virtual IRollingChecksum Default()
         {
-            return Adler32Rolling();
+            return Adler32Rolling(true);
         }
 
         public virtual IRollingChecksum Create(string algorithm)

Steps to reproduce

  • grab a random binary file; my test was kernel32.dll from windows\system32
  • create 2 copies of it: copy1.dll and copy2.dll
  • modify copy2.dll very slightly; I simply changed the first byte from 'M' to 'A'
  • run octodiff to create the deltas:
    • Octodiff.exe signature kernel32.dll signature.bin
    • Octodiff.exe delta signature.bin copy1.dll delta1.bin
    • Octodiff.exe delta signature.bin copy2.dll delta2.bin
  • observe how the delta files are very "not delta-y"
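
For anyone who wants to script the steps above, here is a minimal Python sketch. It only prepares the two copies and builds the three command lines exactly as listed in the steps. The function names (`prepare_copies`, `octodiff_commands`, `delta_ratio`) are hypothetical helpers, not part of Octodiff, and the `exe` default assumes `Octodiff.exe` is reachable from the working directory.

```python
import os
import shutil


def prepare_copies(original, workdir):
    """Create copy1 (identical) and copy2 (first byte overwritten with 'A')."""
    ext = os.path.splitext(original)[1]
    copy1 = os.path.join(workdir, "copy1" + ext)
    copy2 = os.path.join(workdir, "copy2" + ext)
    shutil.copyfile(original, copy1)
    shutil.copyfile(original, copy2)
    # "r+b" opens for in-place update, so only the byte at offset 0 changes.
    with open(copy2, "r+b") as f:
        f.write(b"A")
    return copy1, copy2


def octodiff_commands(original, copy1, copy2, exe="Octodiff.exe"):
    """Build the three invocations from the reproduction steps (assumed exe path)."""
    return [
        [exe, "signature", original, "signature.bin"],
        [exe, "delta", "signature.bin", copy1, "delta1.bin"],
        [exe, "delta", "signature.bin", copy2, "delta2.bin"],
    ]


def delta_ratio(delta_path, new_file_path):
    """Size of the delta relative to the new file; close to 1.0 means 'not delta-y'."""
    return os.path.getsize(delta_path) / os.path.getsize(new_file_path)
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. Afterwards, `delta_ratio("delta1.bin", copy1)` gives a quick number to compare between the V1 and V2 builds.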

Other notes

The V1 version of the algorithm produces small delta files, as expected.
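
As background for narrowing this down, a rolling checksum is only useful if the value obtained by rolling the window forward one byte equals the value computed from scratch over the new window. If that property does not hold, no chunk of the new file would match the signature, which would look like the symptom described here. The sketch below checks the property for textbook Adler-32 (modulus 65521, as implemented by zlib). It is not Octodiff's code, and whether Octodiff's V1 or V2 implementation follows this exact definition is not established in this report. All function names are hypothetical.

```python
import zlib

MOD = 65521  # largest prime below 2**16, the modulus of standard Adler-32


def adler32_parts(window):
    """Compute the (a, b) halves of standard Adler-32 from scratch."""
    a, b = 1, 0
    for byte in window:
        a = (a + byte) % MOD
        b = (b + a) % MOD
    return a, b


def roll(a, b, window_len, outgoing, incoming):
    """Slide the window one byte: drop `outgoing`, append `incoming`."""
    a = (a - outgoing + incoming) % MOD
    # The outgoing byte was counted window_len times in b; the new window's
    # byte sum (a - 1, since a starts at 1) is then added back.
    b = (b - window_len * outgoing + a - 1) % MOD
    return a, b


def rolling_matches_fresh(data, window_len):
    """True if every rolled checksum equals zlib's from-scratch checksum."""
    if len(data) < window_len:
        return True
    a, b = adler32_parts(data[:window_len])
    for start in range(len(data) - window_len + 1):
        if start > 0:
            a, b = roll(a, b, window_len,
                        data[start - 1], data[start + window_len - 1])
        if ((b << 16) | a) != zlib.adler32(data[start:start + window_len]):
            return False
    return True
```

An equivalent check written against Octodiff's own `IRollingChecksum` implementations (rolled value versus freshly calculated value at each offset) could show whether V2 drifts after the first window.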

Spavid04 • Jul 21 '22 23:07