generic cas-hashable contentHash result change if you move your type to another module / package
Describe the bug
The result of for contentHash depends on the name of module and package where the type is defined, see for example the instance for [a]:
https://github.com/tweag/funflow/blob/master/cas/hashable/src/Data/CAS/ContentHashable.hs#L385
It depends on contentHashUpdate_fingerprint which takes the module name and package name into account.
It leads to a change of hash if you move your type to another module / package.
This is super surprising.
To Reproduce
Create a type such as:
data MyType = MyType Int
deriving (ContentHashable m, Generic)
and call contentHash ([MyType 5])
Move it to different module or different package and see the different hash.
Expected behavior
Either it should be documented it should not depend on the package or module name.
Additional context
Note that running the compiled code or running the code from GHCi will also change the hash, because in GHCi, package name is 'main', when it is different in a compiled code.
At the same time, the code of https://github.com/tweag/funflow/blob/master/cas/hashable/src/Data/CAS/ContentHashable.hs#L466-L468 looks suspicious, it does the same hash 3 time on datatypeName
Actually, the same thing is happening if you change some base library.
For example contentHash (mempty :: Data.HashMap.Strict.HashMap () ()) returns a different result on two different GHC environment. I cannot really detail because I don't understand yet why.
The only thing I'm sure is that it is due to the typeRepFingerprint (typeOf (mempty :: HashMap () ())) which is different.
The branch https://github.com/guibou/funflow/tree/stable_unordered-containers-hash-cas-hashable contains two commits to fix this problem:
- One is fixing the
typeRepFingerprintand just replace it by(). This is discutable. - One is fixing the dubious code at https://github.com/tweag/funflow/blob/master/cas/hashable/src/Data/CAS/ContentHashable.hs#L466-L468 and removes the usage of the module/package names in the "fingerprint".
Both solutions are discutable. I'm available to discuss them, and open proper MR if we agree on the direction to use.
However, removing cas-hashable from our codebase took me less time than fixing this (I'm now using a one line function which compute the sha256 sum of the output of cereal serialization, which is enough for my needs and should be more robust and easier to maintain), so I won't pushing toward a fix here unless you ask / continue the discussion.