S4Vectors

Policy for `metadata` when combining objects

Open LTLA opened this issue 4 years ago • 0 comments

Consider:

```r
library(S4Vectors)
X <- DataFrame(X=1)
metadata(X)$X <- "WHEE"

Y <- DataFrame(Y=1)
metadata(Y)$Y <- "FOO"

metadata(cbind(X, Y))
## $X
## [1] "WHEE"
```

That's fine, I guess. But then:

```r
library(SummarizedExperiment)
xx <- SummarizedExperiment()
metadata(xx)$X <- "WHEE"

yy <- SummarizedExperiment()
metadata(yy)$Y <- "FOO"

metadata(cbind(xx, yy))
## $X
## [1] "WHEE"
## 
## $Y
## [1] "FOO"
```

Should there be a consistent policy here? IMO it would make the most sense to `c()` the metadata lists and then remove duplicate names (with a warning if the values under a duplicated name are not identical). This has the nice properties of:

  • Preserving most information, provided that the metadata elements have different names across the various objects. TBH, the lost information might not be too bad; list elements with the same name but different values aren't that helpful in downstream analyses anyway, especially if we no longer know which of the original objects they came from.
  • Ensuring that, e.g., `cbind(df[,0], df)` would give back `df`. This wouldn't be the case if you just kept appending the metadata lists together, which would arbitrarily extend the metadata list in the combined object.

One could even imagine writing a `combineMetadata()` function that all `Annotated` subclasses can call, so as to combine the `metadata()` fields in a standard way for `c`, `rbind`, `cbind`, `combineRows`, `combineCols`, etc.
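
A rough sketch of what such a function might look like, in base R only and operating on plain lists. The name `combineMetadata()` is hypothetical here (it is not an existing S4Vectors function), and keeping the first value when a duplicated name has conflicting values is my own assumption; the proposal above only asks for a warning in that case.

```r
# Hypothetical helper, not part of S4Vectors: combines any number of
# metadata lists by concatenating them and dropping duplicated names.
combineMetadata <- function(...) {
    combined <- do.call(c, unname(list(...)))  # concatenate all lists
    if (is.null(combined)) {
        return(list())  # no inputs at all
    }

    nms <- names(combined)
    if (is.null(nms)) {
        return(combined)  # nothing is named, so nothing to deduplicate
    }

    # Only non-empty names are candidates for deduplication;
    # unnamed elements are always kept.
    named <- !is.na(nms) & nzchar(nms)
    dup <- named & duplicated(nms)

    for (i in which(dup)) {
        first <- match(nms[i], nms)  # position of the first occurrence
        if (!identical(combined[[first]], combined[[i]])) {
            # Assumption: keep the first value and warn about the conflict.
            warning("metadata element '", nms[i],
                    "' has non-identical values; keeping the first",
                    call. = FALSE)
        }
    }

    combined[!dup]
}

x_meta <- list(X = "WHEE")
y_meta <- list(Y = "FOO")
names(combineMetadata(x_meta, y_meta))              # "X" "Y"
identical(combineMetadata(list(), x_meta), x_meta)  # TRUE
identical(combineMetadata(x_meta, x_meta), x_meta)  # TRUE
```

The last two calls correspond to the `cbind(df[,0], df)` case: combining with an empty or identical metadata list gives back the original list instead of growing it.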

LTLA, Apr 08 '21 07:04