Exact match on certain column

Open shamahutoto opened this issue 4 years ago • 4 comments

Hi, is there a way to make sure that one column is an exact match?

shamahutoto, Sep 26 '21 23:09

Yes, block on it.

aalexandersson, Sep 27 '21 12:09

@shamahutoto Since there are various types of blocking, I should have been more precise:

Exact blocking on a variable (column), for example gender, makes sure that the variable is an exact match.

It is useful to think of record linkage as a process: you do the blocking first, before the actual record linkage. Typically you use the blockData() function for the blocking. Please provide an example if you still need help. The main GitHub page for fastLink (https://github.com/kosukeimai/fastLink) gives an example.
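To make the idea concrete, here is a minimal base R sketch of what exact blocking does conceptually. It is not fastLink's implementation: the toy data frames, the column names, and the `candidate_pairs` object are all made up for illustration. In practice, the thread's suggestion is to let `blockData()` build the blocks for you.

```r
# Toy data frames; every column name and value here is hypothetical.
dfA <- data.frame(
  id     = 1:4,
  name   = c("ann", "bob", "cara", "dan"),
  gender = c("F", "M", "F", "M"),
  stringsAsFactors = FALSE
)
dfB <- data.frame(
  id     = 101:103,
  name   = c("anne", "bob", "dana"),
  gender = c("F", "M", "F"),
  stringsAsFactors = FALSE
)

# Exact blocking: group row indices by the blocking column in each data set.
blocks_A <- split(seq_len(nrow(dfA)), dfA$gender)
blocks_B <- split(seq_len(nrow(dfB)), dfB$gender)

# Only values present in both data sets form a usable block.
shared <- intersect(names(blocks_A), names(blocks_B))

# Candidate pairs are built within each block only,
# so every pair agrees exactly on gender by construction.
candidate_pairs <- do.call(rbind, lapply(shared, function(g) {
  pairs <- expand.grid(rowA = blocks_A[[g]], rowB = blocks_B[[g]])
  pairs$gender <- g
  pairs
}))

# The linkage step would then be run separately on
# dfA[blocks_A[[g]], ] and dfB[blocks_B[[g]], ] for each shared value g.
```

Because pairs are only ever formed inside a block, no later step can link two records that differ on the blocking column.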

Disclaimer: I am a regular user, not a developer.

aalexandersson, Oct 01 '21 01:10

Hi @shamahutoto,

As @aalexandersson mentioned, you can block on a certain variable. Note also that every variable you pass to fastLink that is not listed in stringdist.match or numeric.match is compared using exact matching.
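As a rough illustration of the difference between an exact comparison and a string-distance comparison on a single field, here is a small base R sketch. The helper names `exact_agree` and `fuzzy_agree` are hypothetical and not part of fastLink, and edit distance from `utils::adist()` only stands in for whatever string-distance measure is actually used.

```r
# Hypothetical helpers for illustration; not fastLink functions.
exact_agree <- function(a, b) {
  # TRUE only when both values are present and identical
  !is.na(a) & !is.na(b) & a == b
}

fuzzy_agree <- function(a, b, max_dist = 1) {
  # Agreement if the edit distance is at most max_dist (threshold is illustrative)
  d <- mapply(function(x, y) utils::adist(x, y)[1, 1], a, b)
  unname(d <= max_dist)
}

dob_A <- c("1980-01-02", "1980-01-02")
dob_B <- c("1980-01-02", "1980-01-03")

exact_result <- exact_agree(dob_A, dob_B)  # second pair differs by one character
fuzzy_result <- fuzzy_agree(dob_A, dob_B)  # tolerant of that one-character difference
```

Note that a comparison like this only yields an agree/disagree indicator for pairs that are already being compared. Restricting which pairs get compared at all is what blocking does.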

Hope this helps! If anything, let us know.

All my best,

Ted

tedenamorado, Oct 09 '21 04:10

Two years later, but just to be sure, @tedenamorado: does this mean that if I don't list individuals' birth dates in either stringdist.match or numeric.match, the algorithm will only try to match individuals (from the two data frames) who have the same date of birth?

In that sense, is it the same thing as doing an exact block on the date of birth and then running the algorithm on the result? Or did I miss something?

itsmevictor, Jun 15 '23 12:06