minhash
This provides tools for the b-bit MinHash algorithm.
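As a rough illustration of what b-bit MinHash computes, here is a minimal, self-contained sketch for b = 1. It is not this library's API: the class `BBitMinHashSketch`, its methods, the choice of 128 hash functions, and the seeded hash are all hypothetical stand-ins chosen for the example. For each seed it keeps only the lowest bit of the minimum hash over the token set, then compares two signatures bit by bit.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BBitMinHashSketch {
    // Assumption for this sketch: 128 hash functions, 1 bit kept per function.
    static final int NUM_HASHES = 128;

    // Hypothetical seeded 64-bit hash (FNV-1a style loop plus a mixing finalizer).
    static long hash(String token, int seed) {
        long h = 0xcbf29ce484222325L ^ (seed * 0x9E3779B97F4A7C15L);
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // For each seed, take the minimum hash over all tokens and keep its lowest bit.
    static byte[] signature(Set<String> tokens) {
        byte[] sig = new byte[NUM_HASHES / 8];
        for (int i = 0; i < NUM_HASHES; i++) {
            long min = Long.MAX_VALUE;
            for (String t : tokens) {
                min = Math.min(min, hash(t, i));
            }
            if ((min & 1L) != 0) {
                sig[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        return sig;
    }

    // Fraction of bit positions on which the two signatures agree.
    static float matchFraction(byte[] a, byte[] b) {
        int differing = 0;
        for (int i = 0; i < a.length; i++) {
            differing += Integer.bitCount((a[i] ^ b[i]) & 0xff);
        }
        return 1.0f - (float) differing / (a.length * 8);
    }

    // With 1 bit per hash, unrelated sets still agree on about half the bits,
    // so a simplified correction is J ~ 2 * matchFraction - 1 (clamped at 0).
    static float estimateJaccard(float matchFraction) {
        return Math.max(0f, 2f * matchFraction - 1f);
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("the", "quick", "brown", "fox"));
        Set<String> b = new HashSet<>(Arrays.asList("the", "quick", "red", "fox"));
        float match = matchFraction(signature(a), signature(b));
        System.out.println("raw match fraction: " + match);
        System.out.println("estimated Jaccard:  " + estimateJaccard(match));
    }
}
```

Because only one bit per hash is kept, two unrelated token sets are expected to agree on roughly half of the bits by chance. Under this sketch's assumptions, a raw match fraction near 0.5 therefore corresponds to an estimated Jaccard similarity near 0. Whether a given library reports the raw or the corrected value is something to check in its own documentation.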
Hi, I've been learning to use this code recently. Can I think of this code as the following three steps? First, we use Lucene to generate text sets. Then we use...
For the code ` String text = "新冠疫苗效果不错"; byte[] minhash = calculateMinHash(text); String text1 = "每天吃饭呀哈哈哈"; byte[] minhash1 = calculateMinHash(text1); float score1 = MinHash.compare(minhash, minhash1);` the result is "0.546875". Your README...
What goes in "..." for Tokenizer in the example you wrote in README.md? I would like to replicate the example but I am confused there. Thanks.