simhash
OverflowError: Python integer 346 out of bounds for uint8
I'm getting an `OverflowError` with Python 3.12 on Ubuntu when calculating a Simhash for a few domains, for example wikipedia.org.
simhash==2.1.2
numpy==2.3.0
```
Traceback (most recent call last):
  File "grouping.py", line 115, in fingerprint
    Simhash(html)
    ^^^^^^^^^^^^^
  File "env/lib/python3.12/site-packages/simhash/__init__.py", line 79, in __init__
    self.build_by_text(unicode(value))
  File "env/lib/python3.12/site-packages/simhash/__init__.py", line 107, in build_by_text
    return self.build_by_features(features)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "env/lib/python3.12/site-packages/simhash/__init__.py", line 136, in build_by_features
    sums.append(self._bitarray_from_bytes(h) * w)
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
OverflowError: Python integer 609 out of bounds for uint8
```
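The failing frame multiplies the result of `_bitarray_from_bytes(h)` by the feature weight `w`. The error message suggests that result is a `uint8` array. Under NumPy 2's promotion rules (NEP 50), a Python int that does not fit in the array's dtype, such as 346 or 609, raises `OverflowError` instead of upcasting the way NumPy 1.x did. Below is a minimal sketch that reproduces that expression in isolation and tries one possible workaround: widening the bits to `int64` before multiplying. `bits_from_bytes`, `weighted_bits`, and the 2-byte hash value are hypothetical stand-ins, not simhash's actual code.

```python
import numpy as np

def bits_from_bytes(b: bytes) -> np.ndarray:
    # Hypothetical stand-in for simhash's _bitarray_from_bytes (assumption):
    # unpack each byte into eight 0/1 values, giving a uint8 array.
    return np.unpackbits(np.frombuffer(b, dtype=np.uint8))

def weighted_bits(b: bytes, w: int) -> np.ndarray:
    # Possible workaround: widen to int64 first, so the weight never
    # has to be represented as a uint8 value.
    return bits_from_bytes(b).astype(np.int64) * w

h = b"\x80\x01"  # hypothetical 16-bit hash value
w = 609          # weight from the OverflowError in the traceback

try:
    bits_from_bytes(h) * w  # same shape of expression as the failing frame
    print("no error (NumPy 1.x value-based casting)")
except OverflowError as exc:
    print("OverflowError:", exc)

print(weighted_bits(h, w).tolist())
```

With NumPy 2.x the `try` branch should hit the same `OverflowError` as the traceback, while the widened version returns 609 at the two set bit positions and 0 elsewhere.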