Link to SVA/RUV methods?
Will, great package! I have a conceptual question about GLM-PCA with covariates: when I have sample (or gene) covariates, is GLM-PCA estimating a factor model on the variance that is not explained by the covariates? How is GLM-PCA with covariates similar or different from SVA or RUV methods (beyond different link functions)?
Hi, thanks for your interest in the package. It is similar to RUV/SVA in that you are fitting both a factor model and a regression model. However, in GLM-PCA the intent is for the factor model to capture the biology, whereas the covariates are more for "nuisance" variables like batch labels. In SVA and RUV, the covariates contain the biology (treatment vs control labels) and the factor model is meant to capture the nuisance variation from hidden batch effects. SVA was developed for analyzing bulk RNA-seq data, whereas GLM-PCA originated from single-cell data. Also, SVA fits the models sequentially, whereas GLM-PCA fits everything simultaneously. I am less familiar with RUV. Another sequential approach is our "fast residuals approximation to GLM-PCA", which is implemented in scry.
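To make the "sequential" idea concrete, here is a minimal toy sketch in Python/NumPy. It is not the glmpca, scry, SVA, or RUV implementation: it assumes a Gaussian model with an identity link, a samples-by-genes matrix `Y`, and a hypothetical helper named `sequential_fit`. It regresses out the covariates first and then runs PCA on the residuals, so the factors only see variation the covariates did not explain.

```python
import numpy as np

def sequential_fit(Y, X, n_factors):
    """Two-step fit: least-squares regression on X, then PCA of the residuals.

    Y: (samples x genes) data matrix (orientation is an assumption of this sketch).
    X: (samples x covariates) design matrix, e.g. intercept plus batch label.
    """
    # Step 1: regression model for the covariates.
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    # Step 2: factor model (truncated SVD) on what the covariates left unexplained.
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    factors = U[:, :n_factors] * s[:n_factors]   # samples x n_factors
    loadings = Vt[:n_factors].T                  # genes x n_factors
    return coef, factors, loadings

# Simulated toy data: a batch covariate plus two latent factors.
rng = np.random.default_rng(0)
n, g, k = 60, 40, 2
batch = np.repeat([0.0, 1.0], n // 2)
X = np.column_stack([np.ones(n), batch])
Y = (X @ rng.normal(size=(2, g))
     + rng.normal(size=(n, k)) @ rng.normal(size=(k, g))
     + 0.1 * rng.normal(size=(n, g)))

coef, factors, loadings = sequential_fit(Y, X, k)
```

In this sketch the estimated factors come out orthogonal to the covariates by construction. A simultaneous fit would instead update the regression coefficients and the factors together under a single likelihood; that is not shown here.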
We have not thoroughly investigated the properties of the covariates. It was relatively straightforward to implement, so we included it for future users to experiment with.
Thanks, Will. I think it would be interesting to compare GLM-PCA with SVA, particularly the sequential vs simultaneous fitting. I'll let you know if I find anything :)