S4Vectors
Policy for `metadata` when combining objects
Consider:
library(S4Vectors)
X <- DataFrame(X=1)
metadata(X)$X <- "WHEE"
Y <- DataFrame(Y=1)
metadata(Y)$Y <- "FOO"
metadata(cbind(X, Y))
## $X
## [1] "WHEE"
That's fine, I guess. But then:
library(SummarizedExperiment)
xx <- SummarizedExperiment()
metadata(xx)$X <- "WHEE"
yy <- SummarizedExperiment()
metadata(yy)$Y <- "FOO"
metadata(cbind(xx, yy))
## $X
## [1] "WHEE"
##
## $Y
## [1] "FOO"
Should there be a consistent policy here? IMO it would make most sense to `c()` the metadata lists, removing duplicate names (plus a warning if their values are not identical). This has the nice properties of:
- Preserving most information, provided that the metadata elements have different names in the various objects. TBH, the lost information might not be too bad; list elements with the same name but different values aren't that helpful in downstream analyses anyway, especially if we no longer know which of the original objects they came from.
- Ensuring that, e.g., `cbind(df[,0], df)` would give back `df`. This wouldn't be the case if you just continually appended the `metadata` lists together, which would arbitrarily extend the `metadata` list in the bind'd object.
One could even imagine writing a `combineMetadata()` function that all `Annotated` subclasses can call, so as to easily combine the `metadata()` fields in a standard way for `c`, `rbind`, `cbind`, `combineRows`, `combineCols`, etc. etc.
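To make the proposal concrete, here is a rough base-R sketch of what such a helper could look like. This is not existing S4Vectors code: the name `combineMetadata()`, its `...` signature, and the choice to keep the first value on a conflict are all assumptions for illustration. It also assumes unnamed list elements should simply be kept as they are.

```r
# Hypothetical helper (not part of S4Vectors): combine any number of
# metadata lists, dropping later elements whose names were already seen.
combineMetadata <- function(...) {
    # Concatenate all metadata lists in argument order.
    combined <- do.call(c, unname(list(...)))
    if (length(combined) == 0L) {
        return(list())
    }

    nms <- names(combined)
    if (is.null(nms)) {
        nms <- character(length(combined))
    }
    nms[is.na(nms)] <- ""

    # An element is a duplicate if it is named and that name appeared earlier.
    # Assumption: unnamed elements are never treated as duplicates.
    is_dup <- duplicated(nms) & nms != ""

    # Warn when a dropped duplicate differs from the first element of that name.
    for (i in which(is_dup)) {
        first <- match(nms[i], nms)
        if (!identical(combined[[i]], combined[[first]])) {
            warning("conflicting metadata values for '", nms[i],
                    "'; keeping the first")
        }
    }

    combined[!is_dup]
}

# Different names: everything is preserved.
combineMetadata(list(X = "WHEE"), list(Y = "FOO"))

# Same name, identical value: collapses back to a single element,
# which is what makes something like cbind(df[,0], df) give back df's metadata.
combineMetadata(list(X = "WHEE"), list(X = "WHEE"))
```

Keeping the first value on a conflict is an arbitrary choice here; keeping the last, or erroring instead of warning, would fit the same interface.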