[Bug]: validate_if with regex not working as expected
Guidelines
- [X] I agree to follow this project's Contributing Guidelines.
Project Version
0.1.2
Platform and OS Version
macOS 13.3
Existing Issues
No response
What happened?
My validation function:
#' Validate whether the table column contains nucleotide strings
is_nuc = function(val, col_name){
  msg = glue::glue('"{x}" column is a nucleotide sequence', x={{col_name}})
  validate_if(val, grepl('^[ACGTURYKMSWBHDV]+$', {{col_name}}, perl=TRUE),
              description = msg)
}
The validation workflow:
report = data_validation_report()
read.delim(infile) %>%
  validate(name = "Verifying samples table") %>%
  is_nuc("TARGET_COLUMN") %>%
  add_results(report)
render_semantic_report_ui(get_results(report))
Example values in the TARGET_COLUMN of the data.frame:
"ATTCGTCC" "GCCTAATG" "GAGTCAAA" "AGACGTGG" "GACGGGAG" "AGTAAAGA"
If I use ^.+$, the validation passes, but it fails when I use ^[A-Z]+$.
All of the string values in the table column consist only of [ATGC]+, so I don't see why ^[A-Z]+$ and ^[ACGTURYKMSWBHDV]+$ fail.
Steps to reproduce
See above
Expected behavior
See above
Attachments
No response
Screenshots or Videos
No response
Additional Information
No response
I'm guessing that the issue is due to incorrect non-standard evaluation in:
validate_if(val, grepl('^[ACGTURYKMSWBHDV]+$', {{col_name}}, perl=TRUE),
            description = msg)
...which results in the column name being evaluated instead of the column values.
I'm not sure how to fix my code. An example would be appreciated.
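The guess above can be checked in base R, without data.validator. The sketch below assumes that {{col_name}} ends up as the plain string "TARGET_COLUMN" inside grepl() (an assumption about the evaluation, not something confirmed from the package source). Under that assumption, the three patterns give exactly the pass/fail pattern reported: the underscore, and letters such as E, O, L and N, make the column name itself fail the stricter patterns. The variable names here are illustrative only.

```r
# The column name as passed to the validation function.
col_name <- "TARGET_COLUMN"

# The example column values from the report.
col_values <- c("ATTCGTCC", "GCCTAATG", "GAGTCAAA",
                "AGACGTGG", "GACGGGAG", "AGTAAAGA")

# Each pattern applied to the column NAME (the suspected behaviour).
on_name <- c(
  any_char   = grepl("^.+$", col_name, perl = TRUE),
  upper_only = grepl("^[A-Z]+$", col_name, perl = TRUE),
  nucleotide = grepl("^[ACGTURYKMSWBHDV]+$", col_name, perl = TRUE)
)

# The nucleotide pattern applied to the column VALUES (the intended behaviour).
on_values <- grepl("^[ACGTURYKMSWBHDV]+$", col_values, perl = TRUE)

print(on_name)
print(all(on_values))
```

Only the first pattern matches the column name, while every column value matches the nucleotide pattern, which is consistent with the name being tested instead of the values.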
ChatGPT4 eventually led me to a working function:
is_nuc = function(val, col_name){
  expr = bquote(grepl('^[ACGTURYKMSWBHDV]+$', .(val)[[.(col_name)]], perl=TRUE))
  validate_if(val, eval(expr))
}
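For anyone wondering why this version works, here is a minimal base R sketch of the bquote() step on its own, leaving validate_if() out. The samples data frame is a hypothetical stand-in for the table read from infile. bquote() replaces each .() with its value, so the expression indexes the actual column rather than testing the column name.

```r
# Hypothetical stand-in for the samples table read from `infile`.
samples <- data.frame(
  TARGET_COLUMN = c("ATTCGTCC", "GCCTAATG", "GAGTCAAA"),
  stringsAsFactors = FALSE
)
col_name <- "TARGET_COLUMN"

# bquote() substitutes the values wrapped in .() into the quoted call,
# so the call refers to the column values, not the column name.
expr <- bquote(
  grepl("^[ACGTURYKMSWBHDV]+$", .(samples)[[.(col_name)]], perl = TRUE)
)

# Evaluating the call yields one logical value per row.
result <- eval(expr)
print(result)
```

Because eval(expr) is computed before it is handed on, the second argument in the working function is already a logical vector with one entry per row, so no column lookup by name is left for the validator to do.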