rtables icon indicating copy to clipboard operation
rtables copied to clipboard

`.path_to_pos()` fails when `path` contains regular expression symbols

Open rmao6 opened this issue 5 months ago • 3 comments

The code at https://github.com/insightsengineering/rtables/blob/main/R/tt_pos_and_access.R#L995 can't match paths that contain regular expression symbols like +. For example:

pathregex <- "gear + carb"
pathstrs <- c("gear + carb`3 + 1", "gear + carb`3 + 2")
grep(pathregex, pathstrs)

# integer(0)

Adding fixed = TRUE to grep() seems a solution, but it would affect the function of character *.

See also https://github.com/johnsonandjohnson/junco/issues/100

rmao6 avatar Nov 24 '25 15:11 rmao6

@gmbecker Can we just escape "+" inside escape_name_padding i.e.
ret <- gsub("+", "\\+", ret, fixed = TRUE)

Running the code snippet below (no dependency on junco) without the change gives me:

Image

But with the change gives me no errors.

library(rtables)
library(dplyr)

new_scorefun <- function(colpath) {
  paths <- NULL 
  function(tt) {
    paths <<- col_paths(tt)
    unlist(cell_values(tt, colpath = colpath), use.names = FALSE)[1]
  }  
}

data <- mtcars
data <- data %>%
  mutate(`gear + carb` = paste(gear, "+", carb))

data$cyl <- as.factor(data$cyl)
data$hp <- as.factor(data$hp)
data$`gear + carb` <- as.factor(data$`gear + carb`)

lyt <- rtables::basic_table(show_colcounts = TRUE) %>%
  rtables::split_cols_by("gear + carb") %>%
  rtables::add_overall_col("All Cars") %>%
  rtables::analyze("hp")
table <- rtables::build_table(lyt = lyt, df = data)

sorted_table <- rtables::sort_at_path(table,
  c("root", "hp"),
  scorefun = new_scorefun(colpath = c("gear + carb"))
)

nikolas-burkoff avatar Dec 04 '25 14:12 nikolas-burkoff

Hmm, escape_name_padding is intended to cover names that are modified in the sibling-name uniqification process. I don't recall if it is even called on names that don't require uniqification (and I'm not fully convinced it should be).

Also even if it is applied universally, accounting for "+" would just fix this specific issue, it would still break for any other symbol that is "regex-active" (other than [ and ], which it already escapes). And once people start creating syntactically invalid column names, who knows what they'll eventually want to put in there, so we'd need to escape every regex-active character to be completely safe, which I'm not super keen on doing.

gmbecker avatar Dec 04 '25 18:12 gmbecker

Thanks for the info and context 👍

nikolas-burkoff avatar Dec 05 '25 09:12 nikolas-burkoff