totally remove empty subgroup without post trimming/pruning

Open imazubi opened this issue 4 years ago • 0 comments

Hi @gmbecker , see the following scenario:

Create a dummy dataset

library(rtables)
library(dplyr)  # provides %>%, mutate() and case_when() used below

data <- data.frame(
  subj = paste0("subj", seq(1, 6)),
  paramcd = c(rep("a", 3), rep("b", 3)),
  abn = rep(c("low", "normal", "high"), 2),
  stringsAsFactors = TRUE)

I want to create a table (without using trim_rows()/prune_table()) that shows, for the low and high directions only (not for normal), the number of subjects with at least one parameter measurement (whether "low", "normal" or "high").

map <- unique(
  data[data$abn != "normal", ]
) %>%
  lapply(as.character) %>%
  as.data.frame()


s_fun <- function(var = "subj",
                  .spl_context) {
  first_row <- .spl_context[1, ]
  subj <- first_row$full_parent_df[[1]][["subj"]]
  n <- length(unique(subj))
  list(n = n)
}

MY QUERY: First, let's update the data frame so that param "a" does not have any abnormalities.

data2 <- data
data2$abn[data$paramcd == "a" & data$abn != "normal"] <- "normal"
data2

If I do not want the table to show params without any abnormalities (paramcd "a"), how can I achieve this without trim_rows()/prune_table()?

If I create an ad hoc map by deleting the "a" records, the table is still not correct, because the "n" values for "b" come out wrong. In any case this is not a good approach, since the map has to be adjusted by dropping every param with zero abnormalities.

map2 <- map[map$paramcd != "a", ]
basic_table() %>%
  split_rows_by("paramcd", split_fun = trim_levels_to_map(map2)) %>%
  split_rows_by("abn") %>%
  analyze(vars = "subj", afun = make_afun(s_fun)) %>%
  build_table(df = data2)

The creation of map2 is a manual process.
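For illustration only, the manual step could be automated along these lines. This is a minimal base R sketch, not a confirmed rtables recipe: `map_auto` is a name made up here, and unlike `map2` above it keeps only the two split columns (`paramcd`, `abn`), which is an assumption about what the map needs to contain. It does not address the concern about the "n" values.

```r
# Dummy data from this issue, with all param "a" records set to "normal"
data2 <- data.frame(
  subj = paste0("subj", seq(1, 6)),
  paramcd = c(rep("a", 3), rep("b", 3)),
  abn = c(rep("normal", 3), "low", "normal", "high"),
  stringsAsFactors = TRUE
)

# Keep only the (paramcd, abn) combinations that are actually abnormal.
# Params with no abnormal record (here "a") drop out automatically.
abnormal <- data2[data2$abn != "normal", c("paramcd", "abn")]
map_auto <- unique(abnormal)

# Convert the factor columns to character, as done for `map` above
map_auto[] <- lapply(map_auto, as.character)
rownames(map_auto) <- NULL
map_auto
```

Such a data frame might then be tried in place of `map2` in the `trim_levels_to_map()` call, although I have not confirmed that this gives the desired table.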

Is it possible to do this by using trim_levels_in_group?
Suppose we have data as the following, can we prune the tree after the split?

   subj paramcd    abn
1 subj1       a normal
2 subj2       a normal
3 subj3       a normal
4 subj4       b    low
5 subj5       b normal
6 subj6       b   high

We would like to create the following table:

     all obs
----------------
b
 low
   n       3
 high
   n       3

Conceptually, we would like to split by rows, paramcd at the first level, and abn at the second level


    /\
   /  \
  a    b
  |   /|\

But since normal is not needed, this tree is effectively pruned as

   
   |
   b
  / \

I also tried the following approach:

data3 <- data2 %>% mutate(
  abn2 = factor(case_when(
    abn == "low" ~ "low",
    abn == "high" ~ "high",
    TRUE ~ ""
  ), 
  levels = c("low", "high")
  )
)

This does not achieve the goal: I still get an empty row for "a".

basic_table() %>% 
  split_rows_by("paramcd", split_fun = trim_levels_in_group("abn2", drop_outlevs = TRUE)) %>% 
  split_rows_by("abn2") %>% 
  analyze(vars = "subj", afun = make_afun(s_fun)) %>%
  build_table(df = data3)


    all obs
----------------
  a               
  b               
    low           
      n       3   
    high          
      n       3

Effectively, the tree looks like the following:


  /\
 /  \
a    b
    / \

The key point is that I cannot drop rows other than "low" or "high" from the analysis dataset, because they are needed to obtain the correct "n" values.
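To make that constraint concrete, here is a small base R sketch (no rtables calls; `n_all` and `n_abn` are names made up for this illustration) that counts unique subjects per parameter with and without the "normal" records:

```r
# Dummy data from this issue, with all param "a" records set to "normal"
data2 <- data.frame(
  subj = paste0("subj", seq(1, 6)),
  paramcd = c(rep("a", 3), rep("b", 3)),
  abn = c(rep("normal", 3), "low", "normal", "high"),
  stringsAsFactors = TRUE
)

# Count unique subjects in a vector
count_subj <- function(x) length(unique(x))

# Subjects with at least one measurement, per parameter (all records kept)
n_all <- tapply(as.character(data2$subj), data2$paramcd, count_subj)

# Same count after dropping the "normal" records first
abn_only <- data2[data2$abn != "normal", ]
n_abn <- tapply(as.character(abn_only$subj), abn_only$paramcd, count_subj)

n_all
n_abn
```

With the full data, param "b" has 3 subjects; after dropping the "normal" rows it has only 2, which is why filtering the dataset up front is not an option.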

@shajoezhu

Nov 16 '21 14:11 imazubi