
Files with equal hashes collide

Open • octylFractal opened this issue 10 years ago • 9 comments

When two files have the same hash, they are both treated as the same file. Hash collisions are rare, but this case may still need to be handled.

octylFractal, Mar 31 '15 20:03

This is a problem, though I'm not sure how to fix it elegantly.

sk89q, Apr 04 '15 06:04

I think something like objects/<existing hash-based layout>/<name> would fix it. That's close to what Gradle does.

octylFractal, Apr 04 '15 06:04

You mean ab/cd/abcd.../name.ext?

sk89q, Apr 04 '15 06:04

Yes.

octylFractal, Apr 04 '15 06:04
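A minimal sketch of the ab/cd/abcd.../name.ext layout, assuming SHA-1 as the content hash and an objects/ root directory (neither is fixed in the thread). The function name object_path is hypothetical. Because the file name becomes the last path component, two files with equal hashes but different names land in different folders.

```python
import hashlib
from pathlib import PurePosixPath


def object_path(data: bytes, name: str) -> str:
    """Build objects/ab/cd/<full digest>/<name> for a file's bytes and name.

    Assumptions: SHA-1 is the content hash and "objects" is the root folder.
    """
    digest = hashlib.sha1(data).hexdigest()
    # The first two pairs of hex digits become two levels of prefix folders,
    # so no single folder ends up holding every object.
    return str(PurePosixPath("objects", digest[:2], digest[2:4], digest, name))
```

For example, object_path(b"hello", "readme.txt") yields objects/aa/f4/aaf4.../readme.txt, since the SHA-1 of b"hello" starts with aaf4.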

That's a lot of folders, though I guess it doesn't really matter.

Technically, this could still result in a collision, too...

sk89q, Apr 04 '15 06:04

Yes, but I don't think files with different contents that share both a hash and a name are very common. You could use two hashes:

hash1 = 'abcd'
hash2 = '1f2f'
folderName = hash1[:2] + '/' + hash2[:2] + '/' + hash1 + '-' + hash2 + '/' + file.name

(I'm pretty sure the chance of two full-length hashes colliding at the same time is effectively zero, so you could do without the file name then.)

octylFractal, Apr 04 '15 06:04
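The two-hash layout can be made concrete as below. SHA-1 and MD5 are assumed for hash1 and hash2 (the comment only uses the placeholders 'abcd' and '1f2f'), and double_hash_path is a hypothetical name. Passing name=None leaves the file name out, as the parenthetical suggests.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Optional


def double_hash_path(data: bytes, name: Optional[str] = None) -> str:
    """Build hash1[:2]/hash2[:2]/hash1-hash2[/name] from two digests.

    Assumption: hash1 is SHA-1 and hash2 is MD5 of the same bytes.
    """
    hash1 = hashlib.sha1(data).hexdigest()
    hash2 = hashlib.md5(data).hexdigest()
    parts = [hash1[:2], hash2[:2], f"{hash1}-{hash2}"]
    if name is not None:
        # With both full digests in the path, the name is optional.
        parts.append(name)
    return str(PurePosixPath(*parts))
```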

I might just make it configurable, e.g. --hash "sha1[:2]/md5[:2]/sha1/md5" or something similar.

sk89q, Apr 04 '15 06:04

I like that.

octylFractal, Apr 04 '15 06:04
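One possible reading of the proposed --hash pattern, as a sketch only. The option was never specified beyond the example string, so the syntax here (each "/"-separated segment names a hashlib algorithm, optionally followed by a Python-style [start:stop] slice) is an assumption, and render_hash_pattern is a hypothetical helper.

```python
import hashlib
import re

# One path segment: an algorithm name such as "sha1" or "md5",
# optionally followed by a slice such as "[:2]" or "[2:4]".
SEGMENT = re.compile(r"^([a-z0-9_]+)(?:\[(\d*):(\d*)\])?$")


def render_hash_pattern(pattern: str, data: bytes) -> str:
    """Expand a pattern like 'sha1[:2]/md5[:2]/sha1/md5' for the given bytes.

    Hypothetical syntax modelled on the proposed --hash option.
    """
    digests = {}  # algorithm name -> hex digest, so each is computed once
    out = []
    for segment in pattern.split("/"):
        match = SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"bad segment: {segment!r}")
        algo, start, stop = match.groups()
        if algo not in digests:
            # hashlib.new raises ValueError for unknown algorithm names.
            digests[algo] = hashlib.new(algo, data).hexdigest()
        digest = digests[algo]
        if start is None:
            # No slice given: use the full digest.
            out.append(digest)
        else:
            # Empty bounds mean "from the start" / "to the end", as in Python.
            lo = int(start) if start else None
            hi = int(stop) if stop else None
            out.append(digest[lo:hi])
    return "/".join(out)
```

With this reading, "sha1[:2]/md5[:2]/sha1/md5" produces the same shape as the two-hash layout (minus the "-" join), and switching layouts is a matter of changing the pattern rather than the code.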

Perhaps the builder could detect two files with the same hash and use a double hash for those instead. For files that get updated but still have the same hash, the key could be hash(file hash + file date/time + file size).

dj3520, Feb 19 '16 00:02
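A sketch of the last suggestion under stated assumptions: file names are unique within a build, SHA-1 is the primary hash, MD5 is the second hash appended only for files whose primary hash is shared, and SHA-1 is also the outer hash in version_key. Entry, storage_keys and version_key are hypothetical names. The primary parameter exists only so a deliberately weak hash can be swapped in to exercise the collision branch, since a real SHA-1 collision is not something to generate on the fly.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Entry:
    name: str  # assumed unique within one build
    data: bytes


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def storage_keys(entries: List[Entry],
                 primary: Callable[[bytes], str] = sha1_hex) -> Dict[str, str]:
    """Map each entry name to a storage key.

    A file whose primary hash is unique keeps it as its key. When several
    files share a primary hash, each gets an MD5 digest (an assumption)
    appended, so files with different contents get different keys.
    """
    groups: Dict[str, List[Entry]] = defaultdict(list)
    for entry in entries:
        groups[primary(entry.data)].append(entry)

    keys = {}
    for first, group in groups.items():
        for entry in group:
            if len(group) > 1:
                keys[entry.name] = first + "-" + hashlib.md5(entry.data).hexdigest()
            else:
                keys[entry.name] = first
    return keys


def version_key(file_hash: str, mtime: int, size: int) -> str:
    """hash(file hash + file date/time + file size).

    The ':' separators are an addition so adjacent fields cannot run
    together (e.g. mtime 1 + size 23 vs. mtime 12 + size 3).
    """
    return sha1_hex(f"{file_hash}:{mtime}:{size}".encode())
```

Identical files still share a key under this scheme, because their second digests match too. One trade-off: a file's key depends on what else is in the same build, so adding a colliding file later changes the key of the file already stored.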