
Make duplicate handling better

Open theianrobertson opened this issue 7 years ago • 0 comments

You can get at duplicate rows like:

comp.df1_unq_rows[comp.df1_unq_rows['acct_id'].isin(comp.intersect_rows['acct_id'])]

The compare report just says "Any duplicates on match values: Yes". It could include more, for example:

  • Count of duplicates (i.e. that weren't matched)
  • More information in docs about how duplicates are picked (explain the algorithm)
  • Shortcut on the class to get at the duplicates (like just shortcut the above?) Or maybe some way to point a discarded duplicate at the corresponding record it could have matched with?
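As a rough illustration of the first and third bullets, here is a minimal sketch in plain pandas. It does not use datacompy's internals: the helper names `duplicate_key_rows` and `count_extra_duplicates` are hypothetical, and treating "every row after the first per join key" as the unmatched duplicates is an assumption, not a description of how datacompy actually picks matches.

```python
import pandas as pd


def duplicate_key_rows(df: pd.DataFrame, join_columns: list) -> pd.DataFrame:
    """Hypothetical helper: return every row whose join key occurs more than once."""
    # keep=False flags all members of a duplicated group, not only the repeats.
    mask = df.duplicated(subset=join_columns, keep=False)
    return df[mask]


def count_extra_duplicates(df: pd.DataFrame, join_columns: list) -> int:
    """Hypothetical helper: count rows beyond the first for each join key.

    Assumption: these are the rows that could not be matched one-to-one.
    """
    # The default keep="first" flags only the second and later occurrences.
    return int(df.duplicated(subset=join_columns).sum())


# Toy data: acct_id 1 appears twice, acct_id 3 appears three times.
df1 = pd.DataFrame(
    {
        "acct_id": [1, 1, 2, 3, 3, 3],
        "amount": [10.0, 11.0, 20.0, 30.0, 31.0, 32.0],
    }
)

dupes = duplicate_key_rows(df1, ["acct_id"])
extra = count_extra_duplicates(df1, ["acct_id"])

print(dupes)
print("rows beyond the first per key:", extra)
```

A shortcut on the class could expose something like `dupes` directly, and the report could print a number like `extra` next to the existing yes/no line.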

theianrobertson, Mar 29 '18 13:03