fuzzyjoin
Naming `distance_col` when matching along multiple variables
I'm experimenting with matching along n variables (e.g., x1 and x2) and want to keep track of the distance for each variable (distance_col = "distance"). This works, but the resulting data frame contains n + 1 distance columns: one per variable, named with the variable as a prefix (e.g., x1.distance), plus the original distance column, which contains only NAs. It would be nice if that all-NA column were dropped automatically.
library(tidyverse)
library(fuzzyjoin)

ex_1 <- tibble(
  x1 = c("how", "now", "brown", "cow"),
  x2 = c("what", "do", "I", "know")
)

ex_2 <- tibble(
  x1 = c("hw", "nw", "brwn", "cw"),
  x2 = c("wht", "d", "I", "knw")
)

stringdist_inner_join(ex_1, ex_2, by = c("x1", "x2"),
                      method = "lv",
                      distance_col = "distance")
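Until this is handled inside the package, a possible workaround is to drop the empty column after the join. The following is only a sketch: it assumes the behavior described above (an unprefixed `distance` column containing nothing but `NA`) and a dplyr version that supports `where()` (1.0.0 or later). The names `joined` and `joined_clean` are introduced here for illustration.

```r
library(dplyr)
library(fuzzyjoin)

ex_1 <- tibble(
  x1 = c("how", "now", "brown", "cow"),
  x2 = c("what", "do", "I", "know")
)

ex_2 <- tibble(
  x1 = c("hw", "nw", "brwn", "cw"),
  x2 = c("wht", "d", "I", "knw")
)

joined <- stringdist_inner_join(ex_1, ex_2, by = c("x1", "x2"),
                                method = "lv",
                                distance_col = "distance")

# Keep only columns that hold at least one non-NA value.
# Assumption: the unprefixed `distance` column is entirely NA, as reported
# above, so it is removed while x1.distance and x2.distance are kept.
joined_clean <- joined %>%
  select(where(~ !all(is.na(.x))))

names(joined_clean)
```

Note that this filter removes every all-NA column, not just `distance`. If other columns could legitimately be all `NA`, `select(joined, -any_of("distance"))` targets the one column by name instead.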