validate
data.table operations within validate
Hello,
I am trying to use data.table [i, j, by] operations inside my validator. Based on this comment,
https://github.com/data-cleaning/validate/issues/55#issuecomment-205220681 , this was not supported in 2016. Is that still the case?
Here's an example of the kind of operation I'm trying to do:
- melt the data table
- then subset it with an [i, j, by] statement
- then run the check on the result
## MELT POC
library(data.table)
# example from https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reshape.html
s1 <- "family_id age_mother dob_child1 dob_child2 dob_child3
1 30 1998-11-26 2000-01-29 NA
2 27 1996-06-22 NA NA
3 26 2002-07-11 2004-04-05 2007-09-02
4 32 2004-10-10 2009-08-27 2012-07-21
5 29 2000-12-05 2005-02-28 NA"
DT <- fread(s1)
DT.m1 = melt(DT, id.vars = c("family_id", "age_mother"),
             measure.vars = c("dob_child1", "dob_child2", "dob_child3"))
## Run the validation without validator package
melt(
  DT,
  id.vars = c("family_id", "age_mother"),
  measure.vars = c("dob_child1", "dob_child2", "dob_child3")
)[
  variable == "dob_child1",
][['age_mother']] > 30
# validator
library(validate)
working_validator <- validator(
  melt(.,
    id.vars = c("family_id", "age_mother"),
    measure.vars = c("dob_child1", "dob_child2", "dob_child3")
  )[['age_mother']] > 30
)
working_res <- confront(DT, working_validator)
summary(working_res)
non_working_validator <- validator(
  melt(.,
    id.vars = c("family_id", "age_mother"),
    measure.vars = c("dob_child1", "dob_child2", "dob_child3")
  )[
    variable == "dob_child1",
  ][['age_mother']] > 30
)
non_working_res <- confront(DT, non_working_validator)
non_working_res$._error
I'm aware that in this example I could run age_mother > 30 & variable == "dob_child1" on the original data or use subset, but I'd like to enable more complex data.table workflows in general.
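For reference, here is a minimal sketch of the fallback I would rather avoid: do the melt and the [i, j, by] subset with data.table before calling confront, so the rule itself is a plain column comparison. It assumes it is acceptable to reshape the data outside the validator. The names first_child, simple_validator and simple_res are made up for illustration, and the table is rebuilt by hand from the same example values.

```r
library(data.table)
library(validate)

# Same example data as above, built directly instead of via fread()
DT <- data.table(
  family_id  = 1:5,
  age_mother = c(30L, 27L, 26L, 32L, 29L),
  dob_child1 = c("1998-11-26", "1996-06-22", "2002-07-11", "2004-10-10", "2000-12-05"),
  dob_child2 = c("2000-01-29", NA, "2004-04-05", "2009-08-27", "2005-02-28"),
  dob_child3 = c(NA, NA, "2007-09-02", "2012-07-21", NA)
)

# Reshape and subset with data.table first, outside of validate
# (first_child is a hypothetical name for the intermediate table)
first_child <- melt(
  DT,
  id.vars = c("family_id", "age_mother"),
  measure.vars = c("dob_child1", "dob_child2", "dob_child3")
)[variable == "dob_child1"]

# The rule is now an ordinary comparison on a column of the prepared table
simple_validator <- validator(age_mother > 30)
simple_res <- confront(first_child, simple_validator)
summary(simple_res)
```

This keeps all data.table logic outside the rule set, which is exactly the limitation I'm asking about: the reshaping step is no longer part of the validator and has to be repeated by whoever runs the checks.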