`combineCols()` errors when called via `do.call()` with a named list of objects to be combined
Please note that this may properly be a bug report for S4Vectors or SumarizedExperiment, so I'm tagging @hpages, but I encountered it when applying combineCols() via do.call() to a list of SingleCellExperiment objects and perhaps something needs to be fixed here too?
suppressPackageStartupMessages(library(SingleCellExperiment))
example(SingleCellExperiment, echo = FALSE)
# Helper function to create SCEs with overlapping genes and non-overlapping samples.
f <- function(y) {
x <- sce
rd <- rowData(x)
rownames(x) <- paste0(
"Gene_",
sort(sample(2 * nrow(sce), nrow(x), replace = FALSE)))
colnames(x) <- paste0(y, "_", seq_len(ncol(x)))
x
}
sce1 <- f("A")
sce2 <- f("B")
sce3 <- f("C")
# Works when list is unnamed
list_of_sce <- list(sce1, sce2, sce3)
do.call(combineCols, list_of_sce)
#> class: SingleCellExperiment
#> dim: 349 300
#> metadata(0):
#> assays(2): counts logcounts
#> rownames(349): Gene_3 Gene_4 ... Gene_381 Gene_393
#> rowData names(0):
#> colnames(300): A_1 A_2 ... C_99 C_100
#> colData names(0):
#> reducedDimNames(2): PCA tSNE
#> mainExpName: NULL
#> altExpNames(0):
# Errors when list is named
named_list_of_sce <- list_of_sce
names(named_list_of_sce) <- c("A", "B", "C")
do.call(combineCols, named_list_of_sce)
#> Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'combineCols' for signature '"missing"'
I'm unsure exactly when this error is being triggered but I note that this error exists/persists lower down in the combineCols() methods stack, e.g., when applied to a DataFrame:
do.call(combineCols, lapply(list_of_sce, colData))
#> DataFrame with 300 rows and 0 columns
do.call(combineCols, lapply(named_list_of_sce, colData))
#> Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'combineCols' for signature '"missing"'
Created on 2022-01-12 by the reprex package (v2.0.1)
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.1.2 (2021-11-01)
#> os Ubuntu 20.04.3 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en_AU:en
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2022-01-12
#> pandoc 2.11.4 @ /usr/lib/rstudio/bin/pandoc/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> Biobase * 2.54.0 2021-10-26 [1] RSPM (R 4.1.2)
#> BiocGenerics * 0.40.0 2021-10-26 [1] RSPM (R 4.1.2)
#> bitops 1.0-7 2021-04-24 [1] RSPM (R 4.1.0)
#> cli 3.1.0 2021-10-27 [1] RSPM (R 4.1.2)
#> DelayedArray 0.20.0 2021-10-26 [1] RSPM (R 4.1.2)
#> digest 0.6.29 2021-12-01 [1] RSPM (R 4.1.2)
#> evaluate 0.14 2019-05-28 [1] RSPM (R 4.1.0)
#> fastmap 1.1.0 2021-01-25 [1] RSPM (R 4.1.0)
#> fs 1.5.2 2021-12-08 [1] RSPM (R 4.1.2)
#> GenomeInfoDb * 1.30.0 2021-10-26 [1] RSPM (R 4.1.2)
#> GenomeInfoDbData 1.2.7 2021-10-28 [1] RSPM (R 4.1.1)
#> GenomicRanges * 1.46.1 2021-11-18 [1] RSPM (R 4.1.2)
#> glue 1.6.0 2021-12-17 [1] RSPM (R 4.1.0)
#> highr 0.9 2021-04-16 [1] RSPM (R 4.1.0)
#> htmltools 0.5.2 2021-08-25 [1] RSPM (R 4.1.0)
#> IRanges * 2.28.0 2021-10-26 [1] RSPM (R 4.1.2)
#> knitr 1.37 2021-12-16 [1] RSPM (R 4.1.0)
#> lattice 0.20-45 2021-09-22 [4] CRAN (R 4.1.1)
#> magrittr 2.0.1 2020-11-17 [1] RSPM (R 4.1.0)
#> Matrix 1.4-0 2021-12-08 [4] CRAN (R 4.1.2)
#> MatrixGenerics * 1.6.0 2021-10-26 [1] RSPM (R 4.1.2)
#> matrixStats * 0.61.0 2021-09-17 [1] RSPM (R 4.1.1)
#> RCurl 1.98-1.5 2021-09-17 [1] RSPM (R 4.1.1)
#> reprex 2.0.1 2021-08-05 [1] RSPM (R 4.1.0)
#> rlang 0.4.12 2021-10-18 [1] RSPM (R 4.1.1)
#> rmarkdown 2.11 2021-09-14 [1] RSPM (R 4.1.1)
#> rstudioapi 0.13 2020-11-12 [1] RSPM (R 4.1.0)
#> S4Vectors * 0.32.3 2021-11-21 [1] RSPM (R 4.1.2)
#> sessioninfo 1.2.2 2021-12-06 [1] RSPM (R 4.1.2)
#> SingleCellExperiment * 1.16.0 2021-10-26 [1] RSPM (R 4.1.2)
#> stringi 1.7.6 2021-11-29 [1] RSPM (R 4.1.2)
#> stringr 1.4.0 2019-02-10 [1] RSPM (R 4.1.0)
#> SummarizedExperiment * 1.24.0 2021-10-26 [1] RSPM (R 4.1.2)
#> withr 2.4.3 2021-11-30 [1] RSPM (R 4.1.2)
#> xfun 0.29 2021-12-14 [1] RSPM (R 4.1.0)
#> XVector 0.34.0 2021-10-26 [1] RSPM (R 4.1.2)
#> yaml 2.2.1 2020-02-01 [1] RSPM (R 4.1.0)
#> zlibbioc 1.40.0 2021-10-26 [1] RSPM (R 4.1.2)
#>
#> [1] /home/peter/R/x86_64-pc-linux-gnu-library/4.1
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
This is because combineCols's generic has x as its first argument, so when you have a named list where no element is named x, the x is missing. Not really sure what can be done here as the error pops up in the dispatch before any method is run.
Does the generic need to have x as its first argument or could it be like cbind() and just have ...?
I have a vague idea this makes methods dispatch tricky.
Note that even in base R they couldn't make their mind: cbind() dispatches on ... and c() dispatches on its first argument x.
The thing is that there are some complications when defining a generic with dispatch on .... One of them is that if you define a method for a given class, then dispatch will select that method only if all the supplied arguments have that class:
setGeneric("foo", signature="...", function(...) standardGeneric("foo"))
setMethod("foo", "character", function(...) "I'm the foo() method for character")
foo("a", 1)
# Error in standardGeneric("foo") :
# no method or default matching the "..." arguments in foo("a", 1)
foo(1, "a")
# Error in standardGeneric("foo") :
# no method or default matching the "..." arguments in foo(1, "a")
foo("a", "b")
# [1] "I'm the foo() method for character"
This is generally too strict. For example, a reasonable expectation is that if some of the supplied objects don't have the class of the first object, they'll be coerced and things will just work. Like in:
c(Rle(1:3), 21:22)
# integer-Rle of length 5 with 5 runs
# Lengths: 1 1 1 1 1
# Values : 1 2 3 21 22
For cbind() (and rbind()), they use a complicated mechanism to make this work:
cbind(DataFrame(c=11:14), data.frame(b=21:24))
# DataFrame with 4 rows and 2 columns
# c b
# <integer> <integer>
# 1 11 21
# 2 12 22
# 3 13 23
# 4 14 24
Yes it works, but that's because additional generic cbind2() is involved and called internally by cbind() under some conditions, and also because S4Vectors defines 3 cbind2() methods in addition to the cbind() method for DataFrame objects.
All that complexity/overhead is avoided by simply dispatching on the first argument like base::c(), S4Vectors::combineRows(), and S4Vectors::combineCols() do.
Not a perfect solution because, as you've noticed, named arguments can get in the way:
c(A=Rle(1:3), B=21:22)
# $A
# integer-Rle of length 3 with 3 runs
# Lengths: 1 1 1
# Values : 1 2 3
#
# $B1
# [1] 21
#
# $B2
# [1] 22
Even though this is a BIG gotcha with c(), it's not new, and, personally I've learned to live with it by always passing the list of objects to combine thru unname() in the do.call(c, unname(objects)) construct. So it's kind of natural for me to do the same thing with combineRows() or combineCols().
If we wanted to fix this, we'd need to introduce the combineRows2() and combineCols2() generics. They would dispatch on their first 2 arguments x and y. Then make combineRows() and combineCols() ordinary functions with ... for first argument. Note that these would no longer need to be generics, just ordinary functions that would call combineRows2()/combineCols2() recursively or in a loop.
Unfortunately there's no way we can fix this for c().
Thanks for the details, @hpages.
It seems like more trouble than it's worth to change the code.
But perhaps an addition to the documentation showing the do.call(combineCols, unname(objects)) construct would be valuable?
I'd be happy to write a PR, but where would this be best documented?
One of them is that if you define a method for a given class, then dispatch will select that method only if all the supplied arguments have that class:
Indeed, this was a major issue with the first versions of the combineCols precursors, given that I expected to be able to run them on a mixture of SCEs, RSEs and base SEs and get a result that didn't throw away (too much) information from derived classes.