datacompy
Make duplicate handling better
You can currently get at the duplicate rows like this:
`comp.df1_unq_rows[comp.df1_unq_rows['acct_id'].isin(comp.intersect_rows['acct_id'])]`
The compare report only says `Any duplicates on match values: Yes`. It could also include things like:
- Count of duplicates (i.e. rows that weren't matched)
- More information in docs about how duplicates are picked (explain the algorithm)
- A shortcut on the class to get at the duplicates (maybe just wrapping the expression above?), or some way to point a discarded duplicate at the corresponding record it could have matched with
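
To make the count and the shortcut concrete, here is a rough sketch in plain pandas. It is not part of datacompy: `discarded_duplicates` is a hypothetical helper name, and the two toy frames only stand in for `comp.df1_unq_rows` and `comp.intersect_rows`, assuming `acct_id` is the join column.

```python
import pandas as pd


def discarded_duplicates(unq_rows: pd.DataFrame,
                         intersect_rows: pd.DataFrame,
                         key: str) -> pd.DataFrame:
    """Hypothetical helper: rows that ended up unmatched even though
    another row with the same key value was matched."""
    return unq_rows[unq_rows[key].isin(intersect_rows[key])]


# Toy stand-ins (assumptions) for comp.df1_unq_rows and comp.intersect_rows.
df1_unq_rows = pd.DataFrame(
    {"acct_id": [1, 3], "name": ["dup of 1", "only in df1"]}
)
intersect_rows = pd.DataFrame(
    {"acct_id": [1, 2], "name_df1": ["a", "b"], "name_df2": ["a", "b"]}
)

# The shortcut: same expression as above, wrapped in a function.
dupes = discarded_duplicates(df1_unq_rows, intersect_rows, "acct_id")

# The count that the report could print.
dupe_count = len(dupes)

# Point each discarded duplicate at the matched record sharing its key.
paired = dupes.merge(intersect_rows, on="acct_id", how="left")
```

Something like `dupe_count` could go in the report next to the existing Yes/No line, and `paired` is one possible shape for linking a discarded duplicate to the record it could have matched with.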