lsh
Some problems.
Could I import this module as part of a course project?
In addition, when I tested it, I found what seem to be some problems in this module, so I rewrote the _get_sig method as follows:

    def _get_sig(self, shingle_vec, num_perms):
        """ recoded version of _get_sig """
        sig = [self._sbucket_size] * num_perms
        keys = sorted(shingle_vec.keys())
        for r in keys:
            #logging.debug('r=%d', r)
            h = np.array([hash((r, mask)) % self._sbucket_size for mask in self._memomask])
            #logging.debug('h=%s', h)
            for i in range(num_perms):
                if (h[i] < sig[i]):
                    sig[i] = h[i]
        #logging.debug('mhash=%s', sig)
        return sig
Also, I do not think naming shingles in increasing order instead of in random order is a good idea.
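For anyone who wants to try the idea outside the class, here is a minimal standalone sketch of the same min-hash step. It is only an illustration under my own assumptions: the function name `minhash_signature`, the way the masks are generated, and the bucket size are all made up for this example and are not taken from the module.

```python
import random


def minhash_signature(shingle_ids, masks, bucket_size):
    """Return one minimum hash value per mask over all shingle ids.

    Hypothetical standalone helper, not part of the lsh module.
    """
    # Start every slot at bucket_size, which is larger than any hash value.
    sig = [bucket_size] * len(masks)
    for r in shingle_ids:
        for i, mask in enumerate(masks):
            # Hash the (shingle id, mask) pair into the range [0, bucket_size).
            h = hash((r, mask)) % bucket_size
            if h < sig[i]:
                sig[i] = h
    return sig


# Assumed setup: 8 random integer masks and a bucket size of 2**20.
rng = random.Random(0)
masks = [rng.getrandbits(32) for _ in range(8)]
sig_a = minhash_signature({1, 5, 9}, masks, 2**20)
sig_b = minhash_signature([9, 5, 1], masks, 2**20)
```

Since each slot keeps a minimum, the order in which the shingle ids are visited should not change the result, so `sig_a` and `sig_b` come out equal here.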