minhash
How does the code work?
Hi, I've recently been learning to use this code. Can I think of it as the following three steps? First, we use Lucene to generate sets of terms from the text. Then we use a family of hash functions to obtain the minhash values. Finally, we reduce each minhash value to b bits. That is why we end up with a minhash of num * hashbit bits. Am I right? I would appreciate a reply to this question.
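To make the question concrete, the three steps described above can be sketched roughly as follows. This is only a minimal illustration in Python, not the repository's actual code: plain whitespace splitting stands in for the Lucene analysis step, the hash family is an assumed `(a * x + b) mod p` family, and every function name here (`tokenize`, `make_hash_family`, `minhash`, `b_bit_signature`) is hypothetical.

```python
import hashlib
import random

# Assumption: a large prime modulus for the (a * x + b) mod p hash family.
MERSENNE_PRIME = (1 << 61) - 1


def tokenize(text):
    """Step 1 stand-in: whitespace splitting instead of a Lucene analyzer."""
    return set(text.lower().split())


def token_hash(token):
    """Map a token to a stable 32-bit integer (MD5 is used only for mixing)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def make_hash_family(num, seed=0):
    """Draw `num` (a, b) pairs, one per hash function in the family."""
    rng = random.Random(seed)
    return [
        (rng.randrange(1, MERSENNE_PRIME), rng.randrange(0, MERSENNE_PRIME))
        for _ in range(num)
    ]


def minhash(tokens, family):
    """Step 2: for each hash function, keep the minimum value over the set."""
    values = [token_hash(t) for t in tokens]
    if not values:
        raise ValueError("cannot compute a minhash of an empty set")
    return [min((a * v + b) % MERSENNE_PRIME for v in values) for a, b in family]


def b_bit_signature(minhashes, hash_bit):
    """Step 3: keep the lowest `hash_bit` bits of each value and concatenate."""
    mask = (1 << hash_bit) - 1
    signature = 0
    for m in minhashes:
        signature = (signature << hash_bit) | (m & mask)
    return signature


num, hash_bit = 8, 2
family = make_hash_family(num)
signature = b_bit_signature(minhash(tokenize("the quick brown fox"), family), hash_bit)
# The signature fits in num * hash_bit bits (16 bits with these settings).
print(format(signature, "0{}b".format(num * hash_bit)))
```

With `num` hash functions and `hash_bit` bits kept per function, the concatenated signature is `num * hash_bit` bits long, which matches the size described in the question.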