diffutils

Parallel diff and cmp on binary files?

Open • pauschuu opened this issue 8 months ago • 1 comment

I just had a revelation:

$ time b3sum dreamshaper_8\ \(1\).safetensors dreamshaper_8.safetensors
771c807db56dbfc33feda5638d920f6c507db971da44772ee44a08dc38c3b437  dreamshaper_8 (1).safetensors
771c807db56dbfc33feda5638d920f6c507db971da44772ee44a08dc38c3b437  dreamshaper_8.safetensors

real    0m0.172s
user    0m2.193s
sys     0m0.423s


$ time cmp dreamshaper_8\ \(1\).safetensors dreamshaper_8.safetensors

real    0m0.596s
user    0m0.183s
sys     0m0.411s

$ time diff dreamshaper_8\ \(1\).safetensors dreamshaper_8.safetensors

real    0m0.509s
user    0m0.079s
sys     0m0.428s

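As you can see, even though the b3sum method has an additional cost (calculating a hash), it is roughly 3x faster in wall-clock time (0.172s versus 0.596s for cmp and 0.509s for diff). Its user time (2.193s) is far higher than its real time, which shows it is spreading the work across multiple cores.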

Wouldn't it be a good improvement to bring parallelism to some of the tools like diff and cmp? Maybe with a new (not-standardized) option? Maybe by default because why not?

I guess diff has a special code path once it is sure that it's just a binary file, right? So in that code path it wouldn't be much of a problem to parallelize it.
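To make the idea concrete, here is a minimal sketch of a chunked, parallel equality check. It is written in Python purely for illustration and is not how diff or cmp are implemented. The names `files_equal_parallel`, `_chunk_equal` and `CHUNK_SIZE` are hypothetical, and the chunk size is an untuned guess. It also assumes both inputs are regular files. In CPython, threads mostly overlap the file reads rather than the byte comparison itself, so a native implementation would be needed to see gains like the b3sum numbers above.

```python
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 8 * 1024 * 1024  # hypothetical default, not tuned


def _chunk_equal(path_a, path_b, offset, length):
    """Compare one byte range of both files; each call opens its own handles."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        fa.seek(offset)
        fb.seek(offset)
        return fa.read(length) == fb.read(length)


def files_equal_parallel(path_a, path_b, chunk_size=CHUNK_SIZE, workers=None):
    """Return True if both files have identical contents."""
    size = os.path.getsize(path_a)
    if size != os.path.getsize(path_b):
        return False  # files of different sizes can never be identical

    # One task per chunk; an empty file yields no tasks and compares equal.
    offsets = range(0, size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda off: _chunk_equal(path_a, path_b, off, chunk_size), offsets
        )
        # all() stops consuming at the first mismatch, but chunks that were
        # already queued still run before the pool shuts down.
        return all(results)
```

Unlike cmp, this only answers "same or different" and does not report the offset of the first differing byte. That would need the per-chunk results to carry a position.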

This could be pushed even further when comparing directories: the individual files could be diffed in parallel.
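For the directory case, a rough sketch of per-file parallelism could look like the following. It is again Python for illustration only, using the standard library's `filecmp.cmp`. The function name `differing_files` is hypothetical, and the sketch only looks at regular files in the top level of both directories (no recursion, no reporting of files that exist on one side only).

```python
import filecmp
import os
from concurrent.futures import ThreadPoolExecutor


def differing_files(dir_a, dir_b, workers=None):
    """Return sorted names of files present in both directories whose contents differ."""
    common = sorted(
        name
        for name in os.listdir(dir_a)
        if os.path.isfile(os.path.join(dir_a, name))
        and os.path.isfile(os.path.join(dir_b, name))
    )

    def same(name):
        # shallow=False forces a content comparison instead of trusting os.stat() data.
        return filecmp.cmp(
            os.path.join(dir_a, name), os.path.join(dir_b, name), shallow=False
        )

    # Each file pair is an independent task, so the pairs can be compared concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(same, common))
    return [name for name, equal in zip(common, flags) if not equal]
```

Because each file pair is independent, this kind of parallelism needs no coordination between tasks, which is what makes the directory case attractive.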

Come on, it's 2025! :)

pauschuu, May 31 '25 09:05