Listing Design Work
Some functionality needs to be added to rtables to support creation of listings with an easy to use interface.
Here is a basic prototype of a simple listing using existing rtables functionality:
library(rtables)
anl <- ex_adsl[1:10, ]
anl$cat_var <- paste(anl$COUNTRY, anl$STRATA1, anl$STRATA2, sep = "/")
cfun_list <- list(
lc_2 = function(df, labelstr = "") {
in_rows(x = df$SEX, .formats = "xx", .labels = as.character(df$USUBJID))
},
weight = function(df, labelstr = "") {
in_rows(x = df$BMRKR1, .formats = "xx.xx", .labels = as.character(df$USUBJID))
},
ethnic = function(df, labelstr = "") {
in_rows(x = df$cat_var, .formats = "xx", .labels = as.character(df$USUBJID))
}
)
basic_table() %>%
split_rows_by("ARM") %>%
split_cols_by_multivar(
vars = c("SEX", "BMRKR1", "cat_var"),
varlabels = c("SEX", "Bimoarker Status", "Country/Strata 1/Strata 2")
) %>%
split_rows_by("USUBJID") %>%
summarize_row_groups(cfun = cfun_list) %>%
build_table(anl)
Result:
SEX Bimoarker Status Country/Strata 1/Strata 2
----------------------------------------------------------------------------
A: Drug X
AB12345-CHN-3-id-128 M 14.42 CHN/C/S2
AB12345-USA-1-id-45 F 0.46 USA/C/S1
B: Placebo
AB12345-CHN-11-id-220 F 10.26 CHN/B/S2
AB12345-CHN-7-id-267 M 6.21 CHN/C/S1
AB12345-USA-1-id-261 F 2.86 USA/C/S1
AB12345-CHN-1-id-307 M 4.57 CHN/C/S1
C: Combination
AB12345-CHN-15-id-262 M 4.06 CHN/C/S1
AB12345-RUS-3-id-378 F 2.8 RUS/A/S1
AB12345-CHN-15-id-201 M 6.91 CHN/C/S2
AB12345-NGA-11-id-173 F 5 NGA/C/S2
@gmbecker , just wanted to share below some nice code developed by Liming Li for the internal EnalbleRF package to help with the design work in rtables:
Example use:
as_listing(ex_adsl[1:10, ], index_col = c("ARM", "USUBJID"), cols = c("SEX", "AGE", "RACE"))
as_listing(ex_adae[1:10, ], index_col = c("ARM", "USUBJID", "AEDECOD"), cols = c("ASTDY", "AENDY", "AREL"))
Output demographic listing:
Description of Planned Arm
Unique Subject Identifier Sex Age Race
-------------------------------------------------------------------
A: Drug X
AB12345-CHN-3-id-128
M 32 ASIAN
AB12345-USA-1-id-45
F 34 ASIAN
B: Placebo
AB12345-CHN-1-id-307
M 24 ASIAN
AB12345-CHN-11-id-220
F 26 ASIAN
AB12345-CHN-7-id-267
M 40 ASIAN
AB12345-USA-1-id-261
F 32 ASIAN
C: Combination
AB12345-CHN-15-id-201
M 49 ASIAN
AB12345-CHN-15-id-262
M 35 BLACK OR AFRICAN AMERICAN
AB12345-NGA-11-id-173
F 24 BLACK OR AFRICAN AMERICAN
AB12345-RUS-3-id-378
F 30 ASIAN
Output AE listing:
Description of Planned Arm
Unique Subject Identifier
Dictionary-Derived Term Analysis Start Relative Day Analysis End Relative Day Analysis Causality
----------------------------------------------------------------------------------------------------------
A: Drug X
AB12345-BRA-1-id-134
dcd A.1.1.1.2
497 725 N
608 674 N
dcd B.2.1.2.1
251 518 N
dcd D.1.1.4.2
304 559 N
C: Combination
AB12345-BRA-1-id-141
dcd A.1.1.1.1
493 516 N
635 678 N
dcd A.1.1.1.2
600 693 N
dcd B.2.1.2.1
259 266 N
dcd D.1.1.1.1
730 731 N
dcd D.2.1.5.3
303 723 Y
as_listing <- function(df, cols = NULL, index_col = NULL, formats = NULL, level = 0L) {
df <- df[, c(cols, index_col), drop = FALSE]
if (is.null(formats)) {
formats <- lapply(cols, function(x) NULL)
}
nc <- length(cols)
if (nc == 0) {
stop("listing should have non-empty `cols` argument")
}
if (length(index_col) == 0) {
body <- lapply(seq_len(nrow(df)), function(i){
r <- df[i, cols, drop = FALSE]
cells <- mapply(rcell, x = r, format = formats, SIMPLIFY = FALSE, MoreArgs = list(label = ""))
rrowl(NULL, cells)
})
head <- lapply(seq_len(level),
function(l) {
if (l == level) {
rrowl(NULL, rtables::var_labels(df[, cols, drop = FALSE], fill = TRUE))
} else {
rrowl(NULL, rep(c(' ', ' '), nc)[seq_len(nc)])
}
}
)
.lst <- lapply(head, function(x) unlist(row_values(x)))
fullcolinfo <- manual_cols(.lst = .lst)
fullbusiness <- names(collect_leaves(coltree(fullcolinfo)))
vals <- lapply(head, function(x) unlist(row_values(x)))
cspans <- lapply(head, rtables:::row_cspans)
repvals <- mapply(function(v, csp) rep(v, times = csp), v = vals, csp = cspans, SIMPLIFY = FALSE)
wanted <- rtables:::paste_em_n(repvals, length(repvals))
col_per_nc <- length(fullbusiness) / nc
fullbusiness_split <- split(fullbusiness, rep(seq_len(nc), rep(col_per_nc, nc)))
wantcols <- mapply(match, wanted, fullbusiness_split, USE.NAMES = FALSE) + seq(0, nc - 1) * col_per_nc
header <- rtables:::subset_cols(fullcolinfo, wantcols)
return(rtable(header = header, body))
} else {
findex <- index_col[1]
dfs <- split(df, df[[findex]], drop = TRUE)
tbls <- mapply(as_listing, df = dfs, MoreArgs = list(cols = cols, index_col = index_col[-1], formats = formats, level = level + 1L))
tbl_with_label <- lapply(names(tbls), function(nm) {
colinfo <- tbls[[nm]]@col_info
vtbl <- new(
"TableTree",
content = ElementaryTable(kids = list(), name = nm, cinfo = colinfo),
children = list(tbls[[nm]]),
name = nm,
level = level + 1L,
labelrow = LabelRow(0L, nm, cinfo = colinfo),
col_info = colinfo,
format = NULL,
indent_modifier = 0L)
})
ret <- rbindl_rtables(tbl_with_label, check_headers = TRUE)
top_left_text <- rtables::var_labels(df[, index_col, drop = FALSE], fill = TRUE)
top_left_indented <- sprintf("%s%s", stringr::str_dup(" ", seq_len(length(top_left_text)) - 1), top_left_text)
top_left(ret) <- top_left_indented
return(ret)
}
}
@anajens @waddella
I have two similar but importantly different listings apis that I've mocked up (only locally for now).
They both hinge on incrementally building up a single multi-var split in the column space and simultaneously a corresponding analyze_colvars split or group row group summary. The core difference is that add_listing_col takes the analyze_colvars appraoch, and thus is cleaner from a DOM perspective, but less perfectly reccreates the desired output.
add_listing_col2 goes the rowgroup summary route, which means everything in the table is going to be content rows. We'll need to ensure this doesn't break other things, but it "should" be allowed.
Note that in both cases, somewhat unfortunately, the listings columns must be added after splitting by USUBJID because the analyze/summarize they incrementally build up needs to e nested within/on that existing row split.
Also note that in both cases we are going to have to push pretty hard on optimization to get reasonable performance when splitting on USUBJID. Right now its sitting at ~3 seconds for ex_adsl which has 400 patients, so much too slow. I think I've identified a place where we can get some significant savings, but how fast we can make a table that has 10000 child tables remains an open question whose answer may not be to the spa's liking...
Below are the two in action.
> lyt <- basic_table() %>%
+ split_rows_by("USUBJID") %>%
+ add_listing_col("AGE", "Age", mean) %>%
+ add_listing_col("BMRKR1", "Biomarker1 \\ \nBiomarker2",
+ function(df) rcell(paste(df$BMRKR1, df$BMRKR2, sep = "\n")))
> build_table(lyt, ex_adsl)
Biomarker1 \
Age Biomarker2
-----------------------------------------------
AB12345-CHN-3-id-128
32 14.424933692778
MEDIUM
AB12345-CHN-15-id-262
35 4.05546277230382
LOW
AB12345-RUS-3-id-378
30 2.80323956920649
HIGH
AB12345-CHN-11-id-220
26 10.2627340069523
MEDIUM
AB12345-CHN-7-id-267
40 6.2067627167943
LOW
AB12345-CHN-15-id-201
49 6.9067988141075
MEDIUM
<snip>
> lyt2 <- basic_table() %>%
+ split_rows_by("USUBJID") %>%
+ add_listing_col2("AGE", "Age", mean) %>%
+ add_listing_col2("BMRKR1", "Biomarker1 \\ \nBiomarker2",
+ function(df) rcell(paste(df$BMRKR1, df$BMRKR2, sep = "\n")))
> build_table(lyt2, ex_adsl)
Biomarker1 \
Age Biomarker2
-----------------------------------------------
AB12345-CHN-3-id-128 32 14.424933692778
MEDIUM
AB12345-CHN-15-id-262 35 4.05546277230382
LOW
AB12345-RUS-3-id-378 30 2.80323956920649
HIGH
AB12345-CHN-11-id-220 26 10.2627340069523
MEDIUM
AB12345-CHN-7-id-267 40 6.2067627167943
LOW
AB12345-CHN-15-id-201 49 6.9067988141075
MEDIUM
AB12345-USA-1-id-45 34 0.463560441314472
LOW
AB12345-USA-1-id-261 32 2.85516419937308
HIGH
AB12345-NGA-11-id-173 24 4.99722573047567
LOW
AB12345-CHN-1-id-307 24 4.57499101339464
LOW
AB12345-CHN-7-id-28 40 11.1444469908374
MEDIUM
AB12345-CHN-4-id-73 24 2.86312402599659
MEDIUM
AB12345-RUS-1-id-52 40 7.20634823208459
HIGH
AB12345-PAK-11-id-268 28 2.82014082273392
MEDIUM
<snip>
@gmbecker as follow-up to our meeting , this is what I understand now the third proposal looks like using the non-rtables approach:
# do all data manipulation steps for df first using dplyr , tidyr, etc
ANL <- ADSL %>%
mutate(...)
basic_listing(keys = c(“ARM”, “USUBJID”) %>%
add_listing_col(var = "AGE", label = "Age", format = “xx.x”, align = “c”) %>%
add_listing_col(var = "RACE", label = "Race", format = “15C”, align = “l”) %>%
…
build_listing(df = ANL)
The add_listing_col will need to accept a lot of arguments related to formatting and display.
Something to discuss in the future is if the first "label" column can support multiple "key" variables. I think for your examples (1) and (2) it will work but I'm not sure if it will in (3) above. To save on space, ideally users will be able to choose to have the key variables nested rather than printed as parallel columns.
After further discussions we are currently looking at something much more data.frame based. I've committed an early POC for that in 907f2d8:
dat <- ex_adae
lsting <- as_listing(dat[1:25,], key_cols = c("USUBJID", "AESOC")) %>%
add_listing_col("AETOXGR") %>%
add_listing_col("BMRKR1", format = "xx.x") %>%
add_listing_col("AESER / AREL", fun = function(df) paste(df$AESER, df$AREL, sep = " / "))
mat <- listing_matrix_form(lsting)
Which results in
> cat(toString(mat))
Unique Subject Identifier AESOC Analysis Toxicity Grade Continous Level Biomarker 1 AESER / AREL
--------------------------------------------------------------------------------------------------------
AB12345-BRA-1-id-134 cl A 2 6.5 Y / N
2 6.5 Y / N
cl B 3 6.5 N / N
cl D 3 6.5 N / N
AB12345-BRA-1-id-141 cl A 1 7.5 N / N
2 7.5 Y / N
1 7.5 N / N
cl B 3 7.5 N / N
cl D 1 7.5 N / Y
5 7.5 Y / N
AB12345-BRA-1-id-236 cl B 5 7.7 N / Y
5 7.7 N / Y
5 7.7 N / Y
AB12345-BRA-1-id-265 cl C 2 10.3 N / Y
4 10.3 N / Y
cl D 3 10.3 N / N
5 10.3 Y / N
AB12345-BRA-1-id-42 cl A 2 2.3 Y / N
2 2.3 Y / N
2 2.3 Y / N
cl B 1 2.3 Y / N
5 2.3 N / Y
cl C 2 2.3 N / Y
2 2.3 N / Y
cl D 5 2.3 Y / N
@anajens @waddella
Hi @gmbecker , I have a naïve question following yesterday's chat. Say I have a long df, and I would like to section it into two parts and use the list, and display. Is it possible? Or should I really just make two separate list and display.
for the listing_df object, it is still a dataframe, right? so I should be able to append the two listing_dataframe together? I can still use the listing_df object for basic data wrangling? is that correct? What about the speed?
Many thanks!
Hi @gmbecker , I have a naïve question following yesterday's chat. Say I have a long df, and I would like to section it into two parts and use the list, and display. Is it possible? Or should I really just make two separate list and display.
It depends a little bit on what you mean here. currently if you have a list with two elements that are a listing_df it will display the list, ie [[1]], [[2]] followed by displaying the elements, which would hit the listing display machinery. So in that sense, I Think the answer is "sort of yes". Note at its core this is how the pagination machinery behaves. If you want something rendered, without the list wrapper, you essentially loop over the list of sublistings/subtables and cat or otherwise render them out separated by the page separator,w hatever that is.
for the listing_df object, it is still a dataframe, right? so I should be able to append the two listing_dataframe together? I can still use the listing_df object for basic data wrangling? is that correct? What about the speed?
Yes, a listing_df object is a data.frame (and a tbl_df, if the input data frame was). Whether you can use the listing_df for basic wrangling is a bit more complicated, because, for example, dplyr verbs are unlikley to return something of class listing_df ieven if their input was, and they may be stripping other attributes as well. This would be a bit problem in the current implementation, and a small-to-nonexistant problem in a different implementation that @waddella and I have discussed and I Will be looking into.
At the end of the day, however, it makes sense to conceptually separate basic data wrangling from "listing-ness", at least to me, so even if you can do that, i'm not convinced you should. Better to do the wrangling operating on "regular" data.frames/tibbles and then make alisting out of them and move forward with that after its done, I think.
I did some initial testing of 907f2d8c460f4216300961918d5d1da2ea3bde40 and have some issues:
library(rtables)
library(dplyr)
anl <- ex_adsl %>%
select(USUBJID, ARM, AGE) %>%
slice(1:10)
anl$AGE[1:3] <- NA
# (1) Error with NA values in numeric column.
lsting <- as_listing(anl, key_cols = c("ARM", "USUBJID")) %>%
add_listing_col("ARM") %>%
add_listing_col("USUBJID") %>%
add_listing_col("AGE")
mat <- listing_matrix_form(lsting)
cat(toString(mat))
# (2) Adding new line split in title is wonky.
anl <- anl %>%
var_relabel(USUBJID = "A very long \n title")
lsting <- as_listing(anl, key_cols = c("ARM", "USUBJID")) %>%
add_listing_col("ARM") %>%
add_listing_col("USUBJID")
mat <- listing_matrix_form(lsting)
cat(toString(mat))
# (3) How to do pagination and alignment?
@anajens both of the above errors/regresssions have now been fixed (currently only on gabe_tabletree_work, version 0.4.0.0003).
I've also exported the pagination first pass. The top level API is paginate_listing which takes essentially the same arguments as paginate_table.
Hi @gmbecker, I've submitted my feedback as separate issues. The only other issue I'm wondering about is whether it's necessary to have a cols argument in as_listing. This seems a little redundant and potentially confusing given interaction with add_listing_col.