Rupesh Kumar
Hi @ekzhu, I would like to work on this. I have already built something similar for my use-case where I have to deduplicate a huge corpus of almost 100M documents....
I am planning to work on this project in my free time, so I have a few questions: 1. Do we need to add it as a class method attached to the...
Hi @ekzhu. Let me make sure I understand this. When @hsicsa suggests merging identical MinHashLSH objects, it means combining all the keys from similar objects into one object, and if the same...
I have taken a look at it, and I wanted to understand which initialization parameters we are going to compare. I have implemented `__eq__` and compared the type, threshold, and...
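To make the question concrete, here is a minimal sketch of an `__eq__` that compares initialization parameters. It uses a hypothetical stand-in class, `ToyLSH`, rather than the real MinHashLSH, and the `num_perm` field is an assumption for illustration; only the type and threshold comparisons come from the comment above.

```python
class ToyLSH:
    """Hypothetical stand-in for an LSH index; not the datasketch class."""

    def __init__(self, threshold=0.9, num_perm=128):
        # `num_perm` is an assumed parameter, used here only for illustration.
        self.threshold = threshold
        self.num_perm = num_perm

    def __eq__(self, other):
        # Objects of a different type are not comparable; returning
        # NotImplemented lets Python fall back to its default handling.
        if type(self) is not type(other):
            return NotImplemented
        # Two indexes are "identical" when their init parameters match.
        return (self.threshold, self.num_perm) == (other.threshold, other.num_perm)

    # Defining __eq__ disables the default __hash__, so restore one
    # that is consistent with the equality check above.
    def __hash__(self):
        return hash((type(self), self.threshold, self.num_perm))


a = ToyLSH(threshold=0.5, num_perm=128)
b = ToyLSH(threshold=0.5, num_perm=128)
c = ToyLSH(threshold=0.6, num_perm=128)
same_params = a == b
different_threshold = a == c
```

Whether the stored keys should also take part in the comparison is a separate design question that this sketch leaves open.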
Okay, so let me make sure I understand. There are actually two keys: one that is given by the user while inserting a MinHash into the MinHashLSH object along with the MinHash object, and...
@ekzhu I have taken a look at the issue, and it seems that DataStax has already raised this. Sharing the ticket: [cassandra-driver for Python 3.12 Linux is compiled without libev support](https://datastax-oss.atlassian.net/jira/software/c/projects/PYTHON/issues/?filter=allopenissues)
We might have to use asyncio since asyncore has been deprecated. But even if we implement it, I think it will be a short-term fix, because once the issue...
@ekzhu I have raised a PR. I have tried to cover all the points that we had discussed. Please feel free to engage and let me know if any modification...
@ekzhu I have resolved the conversations. Can you please go through the changes? Let me know if any other changes are needed. Thank you.
> Could you also add "3.12" to the tested python version:
>
> https://github.com/ekzhu/datasketch/blob/master/.github/workflows/test.yml#L11

This is also done.