funflow icon indicating copy to clipboard operation
funflow copied to clipboard

generic cas-hashable contentHash result change if you move your type to another module / package

Open guibou opened this issue 4 years ago • 3 comments

Describe the bug

The result of for contentHash depends on the name of module and package where the type is defined, see for example the instance for [a]:

https://github.com/tweag/funflow/blob/master/cas/hashable/src/Data/CAS/ContentHashable.hs#L385

It depends on contentHashUpdate_fingerprint which takes the module name and package name into account.

It leads to a change of hash if you move your type to another module / package.

This is super surprising.

To Reproduce

Create a type such as:

data MyType = MyType Int
   deriving (ContentHashable m, Generic)

and call contentHash ([MyType 5])

Move it to different module or different package and see the different hash.

Expected behavior

Either it should be documented it should not depend on the package or module name.

Additional context

Note that running the compiled code or running the code from GHCi will also change the hash, because in GHCi, package name is 'main', when it is different in a compiled code.

guibou avatar Dec 30 '21 21:12 guibou

At the same time, the code of https://github.com/tweag/funflow/blob/master/cas/hashable/src/Data/CAS/ContentHashable.hs#L466-L468 looks suspicious, it does the same hash 3 time on datatypeName

guibou avatar Dec 30 '21 22:12 guibou

Actually, the same thing is happening if you change some base library.

For example contentHash (mempty :: Data.HashMap.Strict.HashMap () ()) returns a different result on two different GHC environment. I cannot really detail because I don't understand yet why.

The only thing I'm sure is that it is due to the typeRepFingerprint (typeOf (mempty :: HashMap () ())) which is different.

guibou avatar Sep 01 '23 08:09 guibou

The branch https://github.com/guibou/funflow/tree/stable_unordered-containers-hash-cas-hashable contains two commits to fix this problem:

  • One is fixing the typeRepFingerprint and just replace it by (). This is discutable.
  • One is fixing the dubious code at https://github.com/tweag/funflow/blob/master/cas/hashable/src/Data/CAS/ContentHashable.hs#L466-L468 and removes the usage of the module/package names in the "fingerprint".

Both solutions are discutable. I'm available to discuss them, and open proper MR if we agree on the direction to use.

However, removing cas-hashable from our codebase took me less time than fixing this (I'm now using a one line function which compute the sha256 sum of the output of cereal serialization, which is enough for my needs and should be more robust and easier to maintain), so I won't pushing toward a fix here unless you ask / continue the discussion.

guibou avatar Sep 01 '23 10:09 guibou