rtables Can sorting be done to tables constructed via rbind?

Hi @gmbecker, I'm trying to create a table in two parts and then sort the result, but the sort doesn't seem to work. See the example below.

I split the table in two because the layout must be able to display content rows with 0s, which doesn't work in a single layout because analyze must always return some content. As an alternative I would use dummy levels and then prune them away, but I was curious to hear your thoughts on whether the rbind approach is even feasible.

library(rtables)

df1 <- data.frame(
  class = factor(c("D", "A", "A")),
  term = factor(c("dd", "aa", "aa")),
  id = c(1, 2, 3)
)

t1 <- basic_table() %>%
  split_rows_by("class", child_labels = "visible", split_fun = trim_levels_in_group("term")) %>%
  summarize_row_groups("id", label_fstr = "(n)") %>%
  analyze("term") %>%
  build_table(df1)

df2 <- data.frame(
  class = factor(levels = c("B")),
  id = numeric()
)

t2 <- basic_table() %>%
  split_rows_by("class", child_labels = "visible") %>%
  summarize_row_groups("id", label_fstr = "(n)") %>%
  build_table(df2)

col_info(t2) <- col_info(t1)
t3 <- rbind(t2, t1)

> t3
          all obs 
------------------
B                 
  (n)    0 (NaN%) 
A                 
  (n)    2 (66.7%)
    aa       2    
D                 
  (n)    1 (33.3%)
    dd       1    
> sort_at_path(t3, path = c("rbind_rnoot", "*", "class"), scorefun = cont_n_allcols, decreasing = T) 
          all obs 
------------------
B                 
  (n)    0 (NaN%) 
A                 
  (n)    2 (66.7%)
    aa       2    
D                 
  (n)    1 (33.3%)
    dd       1

I want to see a result that looks like this:

          all obs 
------------------
A                 
  (n)    2 (66.7%)
    aa       2    
D                 
  (n)    1 (33.3%)
    dd       1  
B                 
  (n)    0 (NaN%)

Oct 04 '21 20:10 anajens

@anajens, right, so this is another one of those "we oversimplified for the sake of API and now it does something consistent but not what the user wants" situations, along with something that may, arguably, be a bug.

The issue is that despite what row_paths will (currently) tell you, the B subtable is not a sibling of the A and D subtables. The class table containing B is a sibling of the class table containing both A and D, because those class tables are what were rbound together.

Sorting is only supported within the (sub)table appearing at a path. It cannot ever change whose parent a particular subtable or row is. That means it could in principle put B above A and D, thus achieving what your exact reprex wants to do, but it could not, even in principle, ever make the order (D, B, A) appear in the table.

Now as to the (possible/arguable) bug, it isn't great that rbind allowed itself to create a tree structure where two siblings have identical names. It arguably should have realized that was going to happen and changed the name of the second sibling to force uniqueness. I say arguably, because that is the name the table being rbound had for itself, but the argument is pretty strong, because a lot of things probably won't work, or won't work as desired, much like when an R list has non-unique names.
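
To make the list analogy concrete, here is a minimal base R sketch (no rtables involved; the element values are made up for illustration). It only shows how name-based lookup behaves when two siblings share a name, which is the kind of ambiguity a path lookup on the rbound table could run into.

```r
# Base R only: a list whose two elements share the name "class".
# The values "first" and "second" are illustrative placeholders.
x <- list(class = "first", class = "second")

# Name-based lookup stops at the first matching name.
by_name <- x[["class"]]

# The second element still exists, but is only reachable by position.
by_pos <- x[[2]]

print(by_name)
print(by_pos)
```

The second "class" element is never returned by name, so anything that navigates by name silently ignores it.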

Also, "rbind_rnoot" is hilarious. That's a typo; it was once, and should have stayed, "rbind_root".

The pruning-based workaround, which may or may not be the same as what you meant, would be:

df1 <- data.frame(
  class = factor(c("D", "A", "A"), levels = c("D", "A", "B")),
  term = factor(c("dd", "aa", "aa")),
  id = c(1, 2, 3)
)

t1 <- basic_table() %>%
  split_rows_by("class", child_labels = "visible", split_fun = trim_levels_in_group("term", drop_outlevs=FALSE)) %>%
  summarize_row_groups("id", label_fstr = "(n)") %>%
  analyze("term") %>%
  build_table(df1)

trim_rows(t1, function(tr) is(tr, "DataRow") && length(unlist(cell_values(tr))) == 0)

Note we are trimming here, instead of pruning, because pruning explicitly removes a parent once all of its children have been pruned, which is the exact behavior you don't want. Trimming doesn't care at all about anything but the row itself. So the criteria are a) be a DataRow (this excludes all content rows) and b) length(unlist(cell_values(tr))) == 0, which translates to all cell values being of length 0 (e.g., NULL or list()).

Also note the drop_outlevs = FALSE in the trim_levels_in_group call, which allows you to not manufacture data as long as the level list on the factor in question is complete.

Oct 04 '21 21:10 gmbecker

Thanks @gmbecker for the quick reply!

I think it's good if we talk about the rbind issue tomorrow. In a nutshell, I think it's an odd approach to take. I was just testing to see what was possible and got an odd result so I wanted to check with you. I would probably be in favour of being more strict here.

Thanks for the hint with drop_outlevs. I forgot that was an option but that's very helpful. Before the trimming step I do see something unexpected from the analysis function. Why is the "simple_analysis" row showing up?

> t1
                       all obs 
-------------------------------
D                              
  (n)                 1 (33.3%)
    dd                    1    
A                              
  (n)                 2 (66.7%)
    aa                    2    
B                              
  (n)                  0 (0%)  
    simple_analysis

Oct 04 '21 21:10 anajens

That is the name of the default analysis function, and thus the default analysis row name. It's an artifact of the factor method getting the row names from the names of the table, which it expects to be of length > 0. When it gets a table of length 0, instead of making one row per count in the table, it makes a single row with nothing in it and thus doesn't know what to name it. As such, it falls back to the default name, i.e. the name/symbol of the analysis function.
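
The length-0 case can be seen in base R alone. This is a rough sketch, assuming the term factor inside the empty B group ends up with no observations and no levels; it does not call rtables and only illustrates why there are no names to turn into row labels.

```r
# Base R only. A factor with no observations and no levels, used here as
# an assumed stand-in for `term` within the empty B group.
empty_term <- factor(character(0))
tab <- table(empty_term)

length(tab)         # number of counts in the table
length(names(tab))  # number of names available for labelling rows

# For comparison, a factor with data yields one named count per level.
tab_a <- table(factor(c("aa", "aa")))
names(tab_a)
```

With no counts and no names, there is nothing to derive per-level row labels from, which is consistent with the single fallback-named row shown above.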

Oct 04 '21 21:10 gmbecker