rtables icon indicating copy to clipboard operation
rtables copied to clipboard

Listing Design Work

Open anajens opened this issue 4 years ago • 9 comments

Some functionality needs to be added to rtables to support creation of listings with an easy to use interface.

Here is a basic prototype of a simple listing using existing rtables functionality:

library(rtables)

anl <- ex_adsl[1:10, ]
anl$cat_var <- paste(anl$COUNTRY, anl$STRATA1, anl$STRATA2, sep = "/")

cfun_list <- list(
  lc_2 = function(df, labelstr = "") {
    in_rows(x = df$SEX, .formats = "xx", .labels = as.character(df$USUBJID))
  },
  weight = function(df, labelstr = "") {
    in_rows(x = df$BMRKR1,  .formats = "xx.xx", .labels = as.character(df$USUBJID))
  },
  ethnic = function(df, labelstr = "") {
    in_rows(x = df$cat_var,  .formats = "xx", .labels = as.character(df$USUBJID))
  }
)

basic_table() %>%
  split_rows_by("ARM") %>%
  split_cols_by_multivar(
    vars = c("SEX", "BMRKR1", "cat_var"),
    varlabels = c("SEX", "Bimoarker Status", "Country/Strata 1/Strata 2")
  ) %>%
  split_rows_by("USUBJID") %>%
  summarize_row_groups(cfun = cfun_list) %>%
  build_table(anl)

Result:

                          SEX   Bimoarker Status   Country/Strata 1/Strata 2
----------------------------------------------------------------------------
A: Drug X                                                                   
  AB12345-CHN-3-id-128     M         14.42                 CHN/C/S2         
  AB12345-USA-1-id-45      F          0.46                 USA/C/S1         
B: Placebo                                                                  
  AB12345-CHN-11-id-220    F         10.26                 CHN/B/S2         
  AB12345-CHN-7-id-267     M          6.21                 CHN/C/S1         
  AB12345-USA-1-id-261     F          2.86                 USA/C/S1         
  AB12345-CHN-1-id-307     M          4.57                 CHN/C/S1         
C: Combination                                                              
  AB12345-CHN-15-id-262    M          4.06                 CHN/C/S1         
  AB12345-RUS-3-id-378     F          2.8                  RUS/A/S1         
  AB12345-CHN-15-id-201    M          6.91                 CHN/C/S2         
  AB12345-NGA-11-id-173    F           5                   NGA/C/S2    

anajens avatar Mar 29 '21 17:03 anajens

@gmbecker , just wanted to share below some nice code developed by Liming Li for the internal EnalbleRF package to help with the design work in rtables:

Example use:

as_listing(ex_adsl[1:10, ], index_col = c("ARM", "USUBJID"), cols = c("SEX", "AGE", "RACE"))
as_listing(ex_adae[1:10, ], index_col = c("ARM", "USUBJID", "AEDECOD"), cols = c("ASTDY", "AENDY", "AREL"))

Output demographic listing:

Description of Planned Arm                                         
  Unique Subject Identifier   Sex   Age             Race           
-------------------------------------------------------------------
A: Drug X                                                          
  AB12345-CHN-3-id-128                                             
                               M    32              ASIAN          
  AB12345-USA-1-id-45                                              
                               F    34              ASIAN          
B: Placebo                                                         
  AB12345-CHN-1-id-307                                             
                               M    24              ASIAN          
  AB12345-CHN-11-id-220                                            
                               F    26              ASIAN          
  AB12345-CHN-7-id-267                                             
                               M    40              ASIAN          
  AB12345-USA-1-id-261                                             
                               F    32              ASIAN          
C: Combination                                                     
  AB12345-CHN-15-id-201                                            
                               M    49              ASIAN          
  AB12345-CHN-15-id-262                                            
                               M    35    BLACK OR AFRICAN AMERICAN
  AB12345-NGA-11-id-173                                            
                               F    24    BLACK OR AFRICAN AMERICAN
  AB12345-RUS-3-id-378                                             
                               F    30              ASIAN          


Output AE listing:

Description of Planned Arm                                                                                
  Unique Subject Identifier                                                                               
    Dictionary-Derived Term   Analysis Start Relative Day   Analysis End Relative Day   Analysis Causality
----------------------------------------------------------------------------------------------------------
A: Drug X                                                                                                 
  AB12345-BRA-1-id-134                                                                                    
    dcd A.1.1.1.2                                                                                         
                                          497                          725                      N         
                                          608                          674                      N         
    dcd B.2.1.2.1                                                                                         
                                          251                          518                      N         
    dcd D.1.1.4.2                                                                                         
                                          304                          559                      N         
C: Combination                                                                                            
  AB12345-BRA-1-id-141                                                                                    
    dcd A.1.1.1.1                                                                                         
                                          493                          516                      N         
                                          635                          678                      N         
    dcd A.1.1.1.2                                                                                         
                                          600                          693                      N         
    dcd B.2.1.2.1                                                                                         
                                          259                          266                      N         
    dcd D.1.1.1.1                                                                                         
                                          730                          731                      N         
    dcd D.2.1.5.3                                                                                         
                                          303                          723                      Y    
as_listing <- function(df, cols = NULL, index_col = NULL, formats = NULL, level = 0L) {
  df <- df[, c(cols, index_col), drop = FALSE]
  if (is.null(formats)) {
    formats <- lapply(cols, function(x) NULL)
  }
  nc <- length(cols)
  if (nc == 0) {
    stop("listing should have non-empty `cols` argument")
  }
  if (length(index_col) == 0) {
    body <- lapply(seq_len(nrow(df)), function(i){
      r <- df[i, cols, drop = FALSE]
      cells <- mapply(rcell, x = r, format = formats, SIMPLIFY = FALSE, MoreArgs = list(label = ""))
      rrowl(NULL, cells)
    })
    
    head <- lapply(seq_len(level), 
                   function(l) {
                     if (l == level) {
                       rrowl(NULL, rtables::var_labels(df[, cols, drop = FALSE], fill = TRUE))
                     } else {
                       rrowl(NULL, rep(c(' ', '  '), nc)[seq_len(nc)])
                     }
                   }
    )
    .lst <- lapply(head, function(x) unlist(row_values(x)))
    fullcolinfo <- manual_cols(.lst = .lst)
    fullbusiness <- names(collect_leaves(coltree(fullcolinfo)))
    vals <- lapply(head, function(x) unlist(row_values(x)))
    cspans <- lapply(head, rtables:::row_cspans)
    repvals <- mapply(function(v, csp) rep(v, times = csp), v = vals, csp = cspans, SIMPLIFY = FALSE)
    wanted <- rtables:::paste_em_n(repvals, length(repvals))
    col_per_nc <- length(fullbusiness) / nc
    fullbusiness_split <- split(fullbusiness, rep(seq_len(nc), rep(col_per_nc, nc)))
    wantcols <- mapply(match, wanted, fullbusiness_split, USE.NAMES = FALSE) + seq(0, nc - 1) * col_per_nc 
    header <- rtables:::subset_cols(fullcolinfo, wantcols)
    return(rtable(header = header, body))
  } else {
    findex <- index_col[1]
    dfs <- split(df, df[[findex]], drop = TRUE)
    tbls <- mapply(as_listing, df = dfs, MoreArgs = list(cols = cols, index_col = index_col[-1], formats = formats, level = level + 1L))
    tbl_with_label <- lapply(names(tbls), function(nm) {
      colinfo <- tbls[[nm]]@col_info
      vtbl <- new(
        "TableTree", 
        content = ElementaryTable(kids = list(), name = nm, cinfo = colinfo),
        children = list(tbls[[nm]]),
        name = nm,
        level = level + 1L,
        labelrow = LabelRow(0L, nm, cinfo = colinfo),
        col_info = colinfo,
        format = NULL,
        indent_modifier = 0L)
    })
    ret <- rbindl_rtables(tbl_with_label, check_headers = TRUE)
    top_left_text <- rtables::var_labels(df[, index_col, drop = FALSE], fill = TRUE)
    top_left_indented <- sprintf("%s%s", stringr::str_dup("  ", seq_len(length(top_left_text)) - 1), top_left_text)
    top_left(ret) <- top_left_indented
    return(ret)
  }
}

anajens avatar Sep 07 '21 16:09 anajens

@anajens @waddella

I have two similar but importantly different listings apis that I've mocked up (only locally for now).

They both hinge on incrementally building up a single multi-var split in the column space and simultaneously a corresponding analyze_colvars split or group row group summary. The core difference is that add_listing_col takes the analyze_colvars appraoch, and thus is cleaner from a DOM perspective, but less perfectly reccreates the desired output.

add_listing_col2 goes the rowgroup summary route, which means everything in the table is going to be content rows. We'll need to ensure this doesn't break other things, but it "should" be allowed.

Note that in both cases, somewhat unfortunately, the listings columns must be added after splitting by USUBJID because the analyze/summarize they incrementally build up needs to e nested within/on that existing row split.

Also note that in both cases we are going to have to push pretty hard on optimization to get reasonable performance when splitting on USUBJID. Right now its sitting at ~3 seconds for ex_adsl which has 400 patients, so much too slow. I think I've identified a place where we can get some significant savings, but how fast we can make a table that has 10000 child tables remains an open question whose answer may not be to the spa's liking...

Below are the two in action.

> lyt <- basic_table() %>%
+     split_rows_by("USUBJID") %>%
+     add_listing_col("AGE", "Age", mean) %>%
+     add_listing_col("BMRKR1", "Biomarker1 \\ \nBiomarker2",
+                     function(df) rcell(paste(df$BMRKR1, df$BMRKR2, sep = "\n")))
> build_table(lyt, ex_adsl)
                                Biomarker1 \   
                        Age      Biomarker2    
-----------------------------------------------
AB12345-CHN-3-id-128                           
                        32     14.424933692778 
                                   MEDIUM      
AB12345-CHN-15-id-262                          
                        35    4.05546277230382 
                                     LOW       
AB12345-RUS-3-id-378                           
                        30    2.80323956920649 
                                    HIGH       
AB12345-CHN-11-id-220                          
                        26    10.2627340069523 
                                   MEDIUM      
AB12345-CHN-7-id-267                           
                        40     6.2067627167943 
                                     LOW       
AB12345-CHN-15-id-201                          
                        49     6.9067988141075 
                                   MEDIUM      
<snip>
> lyt2 <- basic_table() %>%
+     split_rows_by("USUBJID") %>%
+     add_listing_col2("AGE", "Age", mean) %>%
+     add_listing_col2("BMRKR1", "Biomarker1 \\ \nBiomarker2",
+                     function(df) rcell(paste(df$BMRKR1, df$BMRKR2, sep = "\n")))
> build_table(lyt2, ex_adsl)
                                Biomarker1 \   
                        Age      Biomarker2    
-----------------------------------------------
AB12345-CHN-3-id-128    32     14.424933692778 
                                   MEDIUM      
AB12345-CHN-15-id-262   35    4.05546277230382 
                                     LOW       
AB12345-RUS-3-id-378    30    2.80323956920649 
                                    HIGH       
AB12345-CHN-11-id-220   26    10.2627340069523 
                                   MEDIUM      
AB12345-CHN-7-id-267    40     6.2067627167943 
                                     LOW       
AB12345-CHN-15-id-201   49     6.9067988141075 
                                   MEDIUM      
AB12345-USA-1-id-45     34    0.463560441314472
                                     LOW       
AB12345-USA-1-id-261    32    2.85516419937308 
                                    HIGH       
AB12345-NGA-11-id-173   24    4.99722573047567 
                                     LOW       
AB12345-CHN-1-id-307    24    4.57499101339464 
                                     LOW       
AB12345-CHN-7-id-28     40    11.1444469908374 
                                   MEDIUM      
AB12345-CHN-4-id-73     24    2.86312402599659 
                                   MEDIUM      
AB12345-RUS-1-id-52     40    7.20634823208459 
                                    HIGH       
AB12345-PAK-11-id-268   28    2.82014082273392 
                                   MEDIUM      
<snip>

gmbecker avatar Sep 09 '21 00:09 gmbecker

@gmbecker as follow-up to our meeting , this is what I understand now the third proposal looks like using the non-rtables approach:

# do all data manipulation steps for df first using dplyr , tidyr, etc
ANL <- ADSL %>% 
  mutate(...)

basic_listing(keys = c(“ARM”, “USUBJID”) %>%
  add_listing_col(var = "AGE", label = "Age", format = “xx.x”, align = “c”) %>%
  add_listing_col(var = "RACE", label = "Race", format = “15C”, align = “l”) %>%
  …
  build_listing(df = ANL)

The add_listing_col will need to accept a lot of arguments related to formatting and display.

Something to discuss in the future is if the first "label" column can support multiple "key" variables. I think for your examples (1) and (2) it will work but I'm not sure if it will in (3) above. To save on space, ideally users will be able to choose to have the key variables nested rather than printed as parallel columns.

anajens avatar Sep 14 '21 17:09 anajens

After further discussions we are currently looking at something much more data.frame based. I've committed an early POC for that in 907f2d8:

 dat <- ex_adae
lsting <- as_listing(dat[1:25,], key_cols = c("USUBJID", "AESOC")) %>%
     add_listing_col("AETOXGR") %>%
     add_listing_col("BMRKR1", format = "xx.x") %>%
     add_listing_col("AESER / AREL", fun = function(df) paste(df$AESER, df$AREL, sep = " / "))


 mat <- listing_matrix_form(lsting)

Which results in

> cat(toString(mat))
Unique Subject Identifier   AESOC   Analysis Toxicity Grade   Continous Level Biomarker 1   AESER / AREL
--------------------------------------------------------------------------------------------------------
  AB12345-BRA-1-id-134      cl A               2                          6.5                  Y / N    
                                               2                          6.5                  Y / N    
                            cl B               3                          6.5                  N / N    
                            cl D               3                          6.5                  N / N    
  AB12345-BRA-1-id-141      cl A               1                          7.5                  N / N    
                                               2                          7.5                  Y / N    
                                               1                          7.5                  N / N    
                            cl B               3                          7.5                  N / N    
                            cl D               1                          7.5                  N / Y    
                                               5                          7.5                  Y / N    
  AB12345-BRA-1-id-236      cl B               5                          7.7                  N / Y    
                                               5                          7.7                  N / Y    
                                               5                          7.7                  N / Y    
  AB12345-BRA-1-id-265      cl C               2                         10.3                  N / Y    
                                               4                         10.3                  N / Y    
                            cl D               3                         10.3                  N / N    
                                               5                         10.3                  Y / N    
   AB12345-BRA-1-id-42      cl A               2                          2.3                  Y / N    
                                               2                          2.3                  Y / N    
                                               2                          2.3                  Y / N    
                            cl B               1                          2.3                  Y / N    
                                               5                          2.3                  N / Y    
                            cl C               2                          2.3                  N / Y    
                                               2                          2.3                  N / Y    
                            cl D               5                          2.3                  Y / N  

@anajens @waddella

gmbecker avatar Oct 25 '21 19:10 gmbecker

Hi @gmbecker , I have a naïve question following yesterday's chat. Say I have a long df, and I would like to section it into two parts and use the list, and display. Is it possible? Or should I really just make two separate list and display.

for the listing_df object, it is still a dataframe, right? so I should be able to append the two listing_dataframe together? I can still use the listing_df object for basic data wrangling? is that correct? What about the speed?

Many thanks!

shajoezhu avatar Oct 29 '21 12:10 shajoezhu

Hi @gmbecker , I have a naïve question following yesterday's chat. Say I have a long df, and I would like to section it into two parts and use the list, and display. Is it possible? Or should I really just make two separate list and display.

It depends a little bit on what you mean here. currently if you have a list with two elements that are a listing_df it will display the list, ie [[1]], [[2]] followed by displaying the elements, which would hit the listing display machinery. So in that sense, I Think the answer is "sort of yes". Note at its core this is how the pagination machinery behaves. If you want something rendered, without the list wrapper, you essentially loop over the list of sublistings/subtables and cat or otherwise render them out separated by the page separator,w hatever that is.

for the listing_df object, it is still a dataframe, right? so I should be able to append the two listing_dataframe together? I can still use the listing_df object for basic data wrangling? is that correct? What about the speed?

Yes, a listing_df object is a data.frame (and a tbl_df, if the input data frame was). Whether you can use the listing_df for basic wrangling is a bit more complicated, because, for example, dplyr verbs are unlikley to return something of class listing_df ieven if their input was, and they may be stripping other attributes as well. This would be a bit problem in the current implementation, and a small-to-nonexistant problem in a different implementation that @waddella and I have discussed and I Will be looking into.

At the end of the day, however, it makes sense to conceptually separate basic data wrangling from "listing-ness", at least to me, so even if you can do that, i'm not convinced you should. Better to do the wrangling operating on "regular" data.frames/tibbles and then make alisting out of them and move forward with that after its done, I think.

gmbecker avatar Nov 01 '21 20:11 gmbecker

I did some initial testing of 907f2d8c460f4216300961918d5d1da2ea3bde40 and have some issues:

library(rtables)
library(dplyr)

anl <- ex_adsl %>%
  select(USUBJID, ARM, AGE) %>%
  slice(1:10)

anl$AGE[1:3] <- NA

# (1) Error with NA values in numeric column.
lsting <- as_listing(anl, key_cols = c("ARM", "USUBJID")) %>%
  add_listing_col("ARM") %>%
  add_listing_col("USUBJID") %>%
  add_listing_col("AGE") 

mat <- listing_matrix_form(lsting)
cat(toString(mat))

# (2) Adding new line split in title is wonky.
anl <- anl %>%
  var_relabel(USUBJID = "A very long \n title")

lsting <- as_listing(anl, key_cols = c("ARM", "USUBJID")) %>%
  add_listing_col("ARM") %>%
  add_listing_col("USUBJID")

mat <- listing_matrix_form(lsting)
cat(toString(mat))

# (3) How to do pagination and alignment?

anajens avatar Nov 02 '21 16:11 anajens

@anajens both of the above errors/regresssions have now been fixed (currently only on gabe_tabletree_work, version 0.4.0.0003).

I've also exported the pagination first pass. The top level API is paginate_listing which takes essentially the same arguments as paginate_table.

gmbecker avatar Nov 05 '21 00:11 gmbecker

Hi @gmbecker, I've submitted my feedback as separate issues. The only other issue I'm wondering about is whether it's necessary to have a cols argument in as_listing. This seems a little redundant and potentially confusing given interaction with add_listing_col.

anajens avatar Nov 08 '21 21:11 anajens