Recommendation for short hash characters
Like Git has 7, what is the recommendation here?
@alok87
I hope this comment helps.
I think there IS a way to shorten the hash, but this package has no "recommendation" for a short hash length the way Git has for commit IDs.
There are three reasons:
- Pigeonhole principle (the shorter the hash, the more often it collides).
- hashstructure uses a non-cryptographic hash.
- The package aims to detect object change, not to identify an object.
Git uses a cryptographic hash algorithm, SHA-1, whose 160-bit output is much less likely to collide than the shorter outputs of non-cryptographic hash algorithms such as CRC, simple checksums, and FNV.
Cryptographic hash algorithms are slower but good at detecting tampering, so they are useful for identifying data.
Non-cryptographic hash algorithms, on the other hand, are fast and good at detecting changes to the same object, but very bad at detecting tampering, so they are poorly suited for use as identifiers.
I believe this package aims simply to detect object change, not to detect tampering. And the shorter the hash gets, the less confidence it gives. So I don't think there can ever be a "recommendation".
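To put rough numbers on "the shorter the hash, the less confidence", here is a minimal sketch of the birthday approximation. It assumes the hash values are uniformly distributed, which is an idealization and not something this package guarantees, and `collisionProbability` is a hypothetical helper, not part of hashstructure.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability returns the approximate probability that at least two
// of n hash values collide when each value carries `bits` bits.
// It uses the birthday approximation 1 - exp(-n(n-1) / 2^(bits+1)) and
// assumes uniformly distributed hash values (an idealization).
func collisionProbability(n float64, bits int) float64 {
	space := math.Pow(2, float64(bits))
	// -Expm1(-x) equals 1 - exp(-x) and stays accurate for tiny x.
	return -math.Expm1(-n * (n - 1) / (2 * space))
}

func main() {
	// 7 hex characters carry 28 bits, 7 Base64 characters carry 42 bits,
	// and the full uint64 value carries 64 bits.
	for _, bits := range []int{28, 42, 64} {
		for _, n := range []float64{1e3, 1e5, 1e7} {
			fmt.Printf("bits=%2d objects=%.0e p=%.3g\n",
				bits, n, collisionProbability(n, bits))
		}
	}
}
```

Whatever length you pick, the acceptable probability depends on how many objects you expect to hash and on what a collision would cost you.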
If you want to use the value returned by hashstructure.Hash() as an ID, the way Git does with commit IDs, a best-effort approach, at your own risk, would be:
- Encode the value from hashstructure to Base64.
- Use the first N characters.
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"

	"github.com/mitchellh/hashstructure/v2"
)

func main() {
	type ComplexStruct struct {
		Foo  string
		Bar  uint
		Buzz map[string]interface{}
	}

	v := ComplexStruct{
		Foo: "foo",
		Bar: 64,
		Buzz: map[string]interface{}{
			"beep":  true,
			"sound": "bell",
		},
	}

	hashRaw, err := hashstructure.Hash(v, hashstructure.FormatV2, nil)
	if err != nil {
		panic(err)
	}

	// Base16 (hex), zero-padded to 16 characters so slicing is always safe
	hashHex := fmt.Sprintf("%016x", hashRaw)

	// Base64 of the 8 bytes in little-endian order
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, hashRaw)
	hashBase64 := base64.StdEncoding.EncodeToString(b)

	fmt.Println("Raw :", hashRaw, "(DEC, Base10)")
	fmt.Println("Base16:", hashHex, "(HEX)")
	fmt.Println("Base64:", hashBase64)
	fmt.Println("Short :", hashHex[0:7], "(Base16)")
	fmt.Println("Short :", hashBase64[0:7], "(Base64)")
}

// Output:
// Raw : 16126471403938159312 (DEC, Base10)
// Base16: dfccbc4cd83a9ad0 (HEX)
// Base64: 0Jo62Ey8zN8=
// Short : dfccbc4 (Base16)
// Short : 0Jo62Ey (Base64)
- https://play.golang.org/p/b4nvrum6oSc @ Go Playground
That said, setting aside backward compatibility of HashOptions.Hasher, and thinking about use cases such as handling JSON data from an API or ETag-like usage, I agree that having cryptographic hash algorithms as an option would be good. SHA3 and Blake3 would be nice.
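Until something like that exists, one workaround for the JSON or ETag-like case is to hash the serialized bytes yourself. The sketch below is not a hashstructure feature: `shortID` is a hypothetical helper, and SHA-256 stands in for SHA3 or Blake3 only to keep the example inside long-established standard-library packages. It also assumes the JSON encoding of the value is a good enough canonical form for your data.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// shortID is a hypothetical helper: it serializes v to JSON, hashes the bytes
// with SHA-256, and returns the first n hex characters of the digest.
func shortID(v interface{}, n int) (string, error) {
	// encoding/json emits map keys in sorted order, so equal maps
	// serialize to the same bytes.
	data, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	digest := hex.EncodeToString(sum[:]) // always 64 hex characters
	if n < 0 || n > len(digest) {
		n = len(digest)
	}
	return digest[:n], nil
}

func main() {
	v := map[string]interface{}{"beep": true, "sound": "bell"}
	id, err := shortID(v, 7)
	if err != nil {
		panic(err)
	}
	fmt.Println("Short:", id)
}
```

Truncating still shrinks the output space, so the pigeonhole argument above applies to the shortened ID no matter which algorithm produced it.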