simhash
A Python implementation of the Simhash algorithm
Add the possibility to concatenate simhashes to make a larger one. This way one can make a sort of "signature" where multiple simhashes are combined into one. Also made the...
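As a rough illustration of the concatenation idea, each simhash can be treated as a fixed-width integer and the values packed together with shifts. This is a minimal pure-Python sketch, not the library's API: the helper names `concat_fingerprints` and `split_fingerprint` and the default 64-bit width are assumptions made for the example.

```python
def concat_fingerprints(values, width=64):
    """Pack fixed-width fingerprints into one integer.

    Hypothetical helper: the first value ends up in the highest bits.
    """
    mask = (1 << width) - 1
    combined = 0
    for value in values:
        if value < 0 or value > mask:
            raise ValueError("fingerprint does not fit in %d bits" % width)
        # Make room for the next fingerprint, then place it in the low bits.
        combined = (combined << width) | value
    return combined


def split_fingerprint(combined, count, width=64):
    """Recover the `count` original fingerprints from a packed value."""
    mask = (1 << width) - 1
    return [(combined >> (width * i)) & mask for i in reversed(range(count))]
```

Because every part keeps its own bit range, the Hamming distance between two packed signatures is the sum of the distances between the corresponding parts.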
gmpy2 2.1.2 has been released and provides wheels for multiple platforms.
#44 introduces gmpy2 dependency, which causes major issues downstream. Error: `src/gmpy.h:252:20: fatal error: mpfr.h: No such file or directory` gmpy2 doesn't seem to be pip installable, at least not on...
I customized it as shown below:

```
ans = PriorityQueue()
for key in self.get_keys(simhash):
    dups = self.bucket[key]
    self.log.debug('key:%s', key)
    if len(dups) > 200:
        self.log.warning('Big bucket found. key:%s, len:%s', key, len(dups))
...
```
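The snippet above is cut off, so what it does with the queue is not visible. One plausible reading is that the `PriorityQueue` ranks candidate duplicates by Hamming distance so the nearest ones come out first. The sketch below shows that idea in isolation, under that assumption; `closest_candidates` and its `k` and `limit` parameters are hypothetical names and are not part of the library.

```python
from queue import PriorityQueue


def hamming_distance(a, b):
    """Count the bits that differ between two integer fingerprints."""
    return bin(a ^ b).count("1")


def closest_candidates(query, candidates, k=3, limit=10):
    """Return up to `limit` (distance, obj_id) pairs within `k` bits of `query`.

    `candidates` is an iterable of (obj_id, fingerprint) pairs, standing in
    for the contents of one or more index buckets (an assumption).
    """
    ranked = PriorityQueue()
    for obj_id, fingerprint in candidates:
        distance = hamming_distance(query, fingerprint)
        if distance <= k:
            # Tuples sort by distance first, so the nearest entry pops first.
            ranked.put((distance, obj_id))

    result = []
    while not ranked.empty() and len(result) < limit:
        result.append(ranked.get())
    return result
```

Capping the result with `limit` keeps the output small even when a bucket is very large, which is the situation the warning in the snippet is logging.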
Getting an `OverflowError` with Python 3.12 on Ubuntu when trying to calculate the Simhash for a few domains, for example `wikipedia.org`:

```
simhash==2.1.2
numpy==2.3.0
```

```
Traceback (most recent call last):...
```