dedupe

:id: A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

Results 106 dedupe issues

I have been using dedupe 2.0.6. Recently I ran into the KeyError [issue](https://github.com/dedupeio/dedupe/issues/986) with a dataset of 78,598 records. After I upgraded to version 2.0.17, the KeyError issue has been...

Hello Dedupe Team, Firstly, love the product! This is less of a bug and more of a question/how-to. I need to enforce a hard rule, i.e. when field1 == field1 then...

Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.7.0 to 2.8.1. Release notes Sourced from pypa/cibuildwheel's releases. v2.8.1 🐛 Fix a bug when building CPython 3.8 wheels on an Apple Silicon machine where testing would...

dependencies
github_actions

The project changed to an incompatible GPL license, and it would be nice to have control over binary wheel releases. It is not actively developed, so we would not be losing out...

I compute the pairwise scores for some data, and pass these scores to clustering. If my scores contain any 0s and if connected_components requires filtering, then we go into an...

See each commit individually; nothing functional changes. EDIT: there still shouldn't be any functional changes, though this has gotten much more substantial.

The major barrier to efficient parallelization of blocking is the interprocess communication of the records. Similarly, much of the potential benefit of parallelization of scoring is lost because of...

From https://github.com/dedupeio/dedupe/issues/1045#issuecomment-1149052541 Why do we have the distinction between Static and non-Static classes? Is it to prevent re-training an already trained model? I don't think this needs to be enforced...

Currently, you choose whether or not to use index predicates by passing the `index_predicates` flag in `prepare_training()`. This has some drawbacks - Indexing happens regardless, in a previous step. Slow....