fuzzyjoin

Confusion regarding by vs. multi_by and match_fun vs. multi_match_fun

Open · ahcyip opened this issue on Jun 6, 2017 · 0 comments

In addition to providing examples of match_fun (#22), it looks like match_fun gets used as multi_match_fun if only a single match_fun is given and there are multiple columns in the by argument? The documentation says: "If only one function is given it is used on all column pairs."

If so, then multi_by and multi_match_fun seem confusing and redundant to me.

I also see this note in the documentation: "Note that as of now, you cannot give both match_fun and multi_match_fun - you can either compare each column individually or compare all of them." Perhaps multi_by and multi_match_fun should be removed in the future?

Basically, the following definitions seem redundant, and I can't tell what the differences are:

by
Columns of each to join

match_fun
Vectorized function given two columns, returning TRUE or FALSE as to whether they are a match. Can be a list of functions one for each pair of columns specified in by (if a named list, it uses the names in x). If only one function is given it is used on all column pairs.

multi_by
Columns to join, where all columns will be used to test matches together

multi_match_fun
Function to use for testing matches, performed on all columns in each data frame simultaneously
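To make the two definitions above concrete, here is a minimal Python sketch of the two matching strategies they describe. It is not fuzzyjoin's R code or API: the function names `per_column_join`, `multi_column_join`, `close`, `exact` and `within_radius` are hypothetical, and it assumes that with per-column matching a row pair is kept only when every column pair matches (results combined with AND), while the joint strategy hands all columns of both rows to one function at once.

```python
# Minimal sketch contrasting two matching strategies. This is NOT fuzzyjoin's
# R implementation or API; it only mimics the semantics described in the docs.
# Assumption: with per-column matching, a row pair is kept only if EVERY
# column pair matches (results are combined with AND).
from itertools import product
from math import hypot


def per_column_join(x, y, by, match_fun):
    """Return (i, j) index pairs where every column pair in `by` matches.

    `by` is a list of (x_column, y_column) tuples. `match_fun` is either one
    function, reused for every column pair, or a list with one per pair.
    """
    funs = match_fun if isinstance(match_fun, list) else [match_fun] * len(by)
    if len(funs) != len(by):
        raise ValueError("need one match function per column pair")
    pairs = []
    for (i, row_x), (j, row_y) in product(enumerate(x), enumerate(y)):
        # Each function only ever sees one column from each side.
        if all(f(row_x[cx], row_y[cy]) for f, (cx, cy) in zip(funs, by)):
            pairs.append((i, j))
    return pairs


def multi_column_join(x, y, multi_by, multi_match_fun):
    """Return (i, j) index pairs where a single function, given all
    `multi_by` columns of both rows at once, reports a match."""
    pairs = []
    for (i, row_x), (j, row_y) in product(enumerate(x), enumerate(y)):
        vx = [row_x[cx] for cx, _ in multi_by]
        vy = [row_y[cy] for _, cy in multi_by]
        if multi_match_fun(vx, vy):
            pairs.append((i, j))
    return pairs


def close(u, v):
    """Per-column test: the two values differ by at most 1."""
    return abs(u - v) <= 1


def exact(u, v):
    """Per-column test: the two values are equal."""
    return u == v


def within_radius(vx, vy):
    """Joint test: Euclidean distance across all columns is at most 1."""
    return hypot(*(p - q for p, q in zip(vx, vy))) <= 1


x = [{"a": 0.0, "b": 0.0}]
y = [{"a": 0.9, "b": 0.9}, {"a": 0.5, "b": 0.0}]
by = [("a", "a"), ("b", "b")]

print(per_column_join(x, y, by, close))            # [(0, 0), (0, 1)]
print(per_column_join(x, y, by, [close, exact]))   # [(0, 1)]
print(multi_column_join(x, y, by, within_radius))  # [(0, 1)]
```

Both rows of y pass the per-column `close` test, because each coordinate is within 1 of x, but only the second row lies within Euclidean distance 1 of x. A combined condition like this (a circle rather than a square window) cannot be written as an AND of independent per-column tests, which appears to be the case multi_match_fun is meant for; fuzzyjoin's own `geo_join`, for instance, seems to rely on it to compare latitude and longitude together.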

Posted by ahcyip on Jun 6, 2017 at 03:06