
some issues with code including python versions

Open shoaib-intro opened this issue 3 years ago • 1 comments

  1. `doc` needs to be converted to a string before the `re` step, so that a non-string input does not raise an error:

         doc = str(doc).lower()  # convert to string to handle non-string input

  2. When converting the MD5 digest to binary, the token has to be encoded explicitly, otherwise hashing raises an encoding error:

         h = bin(int(md5(token.encode('utf-8')).hexdigest(), 16))  # encode to bytes before hashing

  3. Dictionary iteration has to be updated for Python 3, where `iteritems()` from Python 2 is replaced by `items()`:

         for _, token in token_dict.items():  # instead of token_dict.iteritems()

if __name__ == '__main__':
    # Just for demonstration
    doc = data  # `data` is a placeholder for the input, given as {doc_id, doc}
    binary_hash = simhash(doc)
    print(binary_hash)
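
The three changes listed above can be sketched together in one small Python 3 function. This is only an illustration under stated assumptions, not the library's actual code: `simhash_sketch`, the `\w+` tokenizer, the position-keyed `token_dict`, and the 128-bit width are all hypothetical choices made for the example.

```python
import re
from hashlib import md5

HASH_BITS = 128  # an MD5 digest is 128 bits wide


def simhash_sketch(doc):
    """Hypothetical stand-in for the library's simhash(); returns a bit string."""
    # Fix 1: coerce to str so non-string input does not break the regex step.
    doc = str(doc).lower()
    tokens = re.findall(r"\w+", doc)
    # Assumed structure: position -> token, loosely mirroring token_dict in the issue.
    token_dict = dict(enumerate(tokens))

    weights = [0] * HASH_BITS
    # Fix 3: items() replaces Python 2's iteritems().
    for _, token in token_dict.items():
        # Fix 2: md5() requires bytes in Python 3, so encode the token first.
        digest = int(md5(token.encode("utf-8")).hexdigest(), 16)
        for i in range(HASH_BITS):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (digest >> i) & 1 else -1

    # Bit i of the fingerprint is set when the summed vote is positive.
    fingerprint = sum(1 << i for i, w in enumerate(weights) if w > 0)
    return format(fingerprint, "0{}b".format(HASH_BITS))
```

Because of the `str(doc)` coercion, a call such as `simhash_sketch({"doc_id": 1, "doc": "some text"})` or `simhash_sketch(12345)` runs without a type error, which is the situation item 1 describes.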

shoaib-intro, Mar 28 '22 10:03

Hello @shoaib-intro. I am happy to review and merge a PR if you want to port the library to Python 3.

memosstilvi, Aug 14 '22 11:08