[Feature]: validate column name existence
Guidelines
- [X] I agree to follow this project's Contributing Guidelines.
Description
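A modified version of the README example, in which WRONG_COLUMN_NAME (a column that does not exist in mtcars) is added to the first validate_cols() call: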
library(magrittr)
library(data.validator)
report <- data_validation_report()
validate(mtcars, name = "Verifying cars dataset") %>%
validate_if(drat > 0, description = "Column drat has only positive values") %>%
validate_cols(in_set(c(0, 2)), WRONG_COLUMN_NAME, vs, am, description = "vs and am values equal 0 or 2 only") %>%
validate_cols(within_n_sds(1), mpg, description = "mpg within 1 sds") %>%
validate_rows(num_row_NAs, within_bounds(0, 2), vs, am, mpg, description = "not too many NAs in rows") %>%
validate_rows(maha_dist, within_n_mads(10), everything(), description = "maha dist within 10 mads") %>%
add_results(report)
print(report)
The error:
Error in `dplyr::select()` at assertr/R/assertions.R:102:2:
! Can't subset columns that don't exist.
✖ Column `WRONG_COLUMN_NAME` doesn't exist.
As far as I can tell, if the input table lacks a column that one of the validations refers to, the whole validate() chain stops with an error (raised by dplyr::select() inside assertr) instead of producing a report stating that validation failed because required columns are missing.
Problem
There is no check that the validated columns exist in the provided data.frame, so the column-existence check has to be placed outside the report-generating workflow. As a result, feedback to the user is split across at least two validations, (1) a check for the required columns and (2) the validation report, instead of a single all-encompassing validation report.
Proposed Solution
Support assertr::has_all_names() checks in the validation report, so that missing columns appear as failed validations rather than aborting the chain. If this is already possible, please add an example to the package README.
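A rough sketch of what this might look like with the existing functions. It rests on two unconfirmed assumptions: that validate_if() hands its expression to assertr::verify() (the pattern the assertr documentation uses for has_all_names()), and that validate_cols() passes its column arguments through to dplyr::select(), so a tidyselect helper such as any_of() is accepted. The column list is illustrative.

```r
library(magrittr)
library(dplyr)           # re-exports the any_of() selection helper
library(assertr)         # has_all_names(), in_set(), within_n_sds()
library(data.validator)

report <- data_validation_report()

validate(mtcars, name = "Verifying cars dataset") %>%
  # Record missing columns as a failed check instead of aborting the chain.
  # Assumption: validate_if() evaluates this expression via assertr::verify().
  validate_if(
    has_all_names("WRONG_COLUMN_NAME", "vs", "am"),
    description = "Required columns WRONG_COLUMN_NAME, vs and am exist"
  ) %>%
  # any_of() silently drops names that are not in the data, so this check
  # still runs on vs and am. Assumption: validate_cols() forwards its
  # column arguments to dplyr::select(), as assertr::assert() does.
  validate_cols(
    in_set(c(0, 2)), any_of(c("WRONG_COLUMN_NAME", "vs", "am")),
    description = "vs and am values equal 0 or 2 only"
  ) %>%
  validate_cols(within_n_sds(1), mpg, description = "mpg within 1 sds") %>%
  add_results(report)

print(report)
```

The drawback of this pattern is that the required column names have to be listed twice, once for the existence check and once for the value checks, which is part of why a built-in existence check would be more convenient.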
Alternatives Considered
I currently check that the required columns exist before running data.validator, and report any missing columns to the user via shiny::showNotification().
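The workaround described above could be sketched in base R roughly as follows. missing_columns() and column_check_message() are illustrative helpers, not functions from data.validator or assertr. The Shiny call appears only as a comment, because shiny::showNotification() is meant to be called from inside a running app.

```r
# Hypothetical helper (not part of data.validator or assertr): returns the
# required column names that are absent from `data`, in the order given.
missing_columns <- function(data, required) {
  setdiff(required, names(data))
}

# Hypothetical wrapper: returns a user-facing message, or NULL when every
# required column is present and the data.validator chain can safely run.
column_check_message <- function(data, required) {
  missing <- missing_columns(data, required)
  if (length(missing) == 0) {
    return(NULL)
  }
  paste("Missing required columns:", paste(missing, collapse = ", "))
}

msg <- column_check_message(mtcars, c("WRONG_COLUMN_NAME", "vs", "am", "mpg"))
print(msg)
# [1] "Missing required columns: WRONG_COLUMN_NAME"

# Inside a running Shiny app the message could be shown with, e.g.:
# shiny::showNotification(msg, type = "error")
```

This is exactly the second, separate feedback channel the issue would like to fold into the single validation report.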