data.validator

[Bug]: validate_if with regex not working as expected

Open • nick-youngblut opened this issue 2 years ago • 2 comments

Guidelines

  • [X] I agree to follow this project's Contributing Guidelines.

Project Version

0.1.2

Platform and OS Version

macOS 13.3

Existing Issues

No response

What happened?

My validation function:

#' validate whether the table column contains nucleotide strings
is_nuc = function(val, col_name){
  msg = glue::glue('"{x}" column is a nucleotide sequence', x={{col_name}})
  validate_if(val, grepl('^[ACGTURYKMSWBHDV]+$', {{col_name}}, perl=TRUE),
              description = msg)
}

The validation workflow:

library(data.validator)
library(magrittr)  # provides %>%

report = data_validation_report()
read.delim(infile) %>%
  validate(name = "Verifying samples table") %>%
  is_nuc("TARGET_COLUMN") %>%
  add_results(report)

render_semantic_report_ui(get_results(report))

Example values in the TARGET_COLUMN of the data.frame:

"ATTCGTCC" "GCCTAATG" "GAGTCAAA" "AGACGTGG" "GACGGGAG" "AGTAAAGA"

If I use ^.+$ as the pattern, the validation passes, but it fails when I use ^[A-Z]+$.

All of the string values in the table column consist only of [ATGC]+, so I don't see why ^[A-Z]+$ and ^[ACGTURYKMSWBHDV]+$ fail.
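A quick base-R check, independent of data.validator, shows one way to get exactly this pass/fail pattern: run the three regexes against the column name as a plain string instead of against the column values. The data frame `df` is a hypothetical stand-in for `read.delim(infile)`, built from the example values above.

```r
# Hypothetical stand-in for read.delim(infile), using the example values above
df <- data.frame(TARGET_COLUMN = c("ATTCGTCC", "GCCTAATG", "GAGTCAAA",
                                   "AGACGTGG", "GACGGGAG", "AGTAAAGA"))
col_name <- "TARGET_COLUMN"
nuc_pattern <- "^[ACGTURYKMSWBHDV]+$"

# Matching against the literal string "TARGET_COLUMN"
on_name <- c(
  any_char   = grepl("^.+$", col_name, perl = TRUE),      # any non-empty string matches
  upper_only = grepl("^[A-Z]+$", col_name, perl = TRUE),  # "_" is not in [A-Z]
  nucleotide = grepl(nuc_pattern, col_name, perl = TRUE)  # "E", "O", "L", "N", "_" are not in the class
)
print(on_name)

# Matching against the column values instead: every element is made of A/C/G/T
on_values <- grepl(nuc_pattern, df[[col_name]], perl = TRUE)
print(on_values)
```

This reproduces the reported behavior (only ^.+$ passes) without any of the column's values being involved.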

Steps to reproduce

See above

Expected behavior

See above

Attachments

No response

Screenshots or Videos

No response

Additional Information

No response

nick-youngblut, Sep 15 '23 16:09

I'm guessing that the issue is due to incorrect non-standard evaluation in:

validate_if(val, grepl('^[ACGTURYKMSWBHDV]+$', {{col_name}}, perl=TRUE), 
              description = msg)

...which results in the column name being evaluated instead of the column values.

I'm not sure how to fix my code. An example would be appreciated.

nick-youngblut, Sep 15 '23 16:09

ChatGPT-4 eventually led me to a working function:

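is_nuc = function(val, col_name){
  # bquote() splices the data frame (val) and the column name (col_name) into the call,
  # so grepl() is applied to the column's values rather than to the name string
  expr = bquote(grepl('^[ACGTURYKMSWBHDV]+$', .(val)[[.(col_name)]], perl=TRUE))
  validate_if(val, eval(expr))
}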
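The sketch below (base R only) shows the expression that bquote() builds. `.(x)` is replaced by the current value of `x` when the call is constructed, so the column lookup uses the real name "TARGET_COLUMN". The two-row `df` is hypothetical, and unlike the function above it splices only `col_name` (not the whole data frame via `.(val)`) so the printed call stays readable.

```r
# Hypothetical two-row stand-in for the samples table
df <- data.frame(TARGET_COLUMN = c("ATTCGTCC", "GCCTAATG"))
col_name <- "TARGET_COLUMN"

# .(col_name) is replaced by the value "TARGET_COLUMN" when the call is built
expr <- bquote(grepl("^[ACGTURYKMSWBHDV]+$", df[[.(col_name)]], perl = TRUE))
print(expr)

# Evaluating the call tests the column values, giving one result per row
result <- eval(expr)
print(result)
```

The same splicing step appears to be what lets the working is_nuc() hand actual column values to grepl() instead of the name string.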

nick-youngblut, Sep 15 '23 16:09