Can I split a large gene matrix by rows and run Harmony in parallel?
Hi harmony group,
I have a large gene expression matrix of 31734 genes by 147185 cells. As you can see from the screenshot below, running HarmonyMatrix() on the entire expression matrix fails with a "not enough resource" type of error. Can I split the expression matrix by rows (i.e. into gene blocks) and run Harmony on each block separately? Would this give different results?

Thank you, Jack Kang
Hi Jack,
Thanks for the question! With such a large matrix, I would recommend two things:
(1) Subset to highly variable genes.
(2) Use a memory-efficient PCA package, then feed the PCA embeddings into HarmonyMatrix(..., do_pca=FALSE).
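A minimal sketch of those two steps, on toy data so that it runs on its own. Everything not named in this thread is an assumption made for illustration: the simulated matrix, the `batch` metadata column, the cutoffs of 200 genes and 20 PCs, the plain per-gene variance used as a stand-in for a proper highly-variable-gene method, and `irlba` as the memory-efficient PCA package. Recent harmony releases may print a deprecation notice for HarmonyMatrix().

```r
library(Matrix)
library(irlba)
library(harmony)

set.seed(1)

# Toy stand-in for a normalized genes x cells sparse matrix (hypothetical data).
n_genes <- 500
n_cells <- 300
expr <- Matrix::rsparsematrix(n_genes, n_cells, density = 0.1)
rownames(expr) <- paste0("gene", seq_len(n_genes))
colnames(expr) <- paste0("cell", seq_len(n_cells))

# Per-cell metadata; "batch" is an assumed column name.
meta <- data.frame(batch = rep(c("a", "b"), length.out = n_cells))

# (1) Keep the most variable genes. The variance is computed from sparse
#     row means, so the matrix is never converted to dense.
gene_mean <- Matrix::rowMeans(expr)
gene_var <- Matrix::rowMeans(expr^2) - gene_mean^2
hvg <- order(gene_var, decreasing = TRUE)[seq_len(200)]

# (2) Truncated PCA on the cells x genes matrix. Only the first 20
#     components are computed.
pca <- irlba::prcomp_irlba(Matrix::t(expr[hvg, ]), n = 20,
                           center = TRUE, scale. = FALSE)

# Pass the cells x PCs embedding to Harmony and skip its internal PCA.
harmony_emb <- HarmonyMatrix(pca$x, meta, "batch", do_pca = FALSE)
dim(harmony_emb)
```

With real data, `expr` would be your normalized sparse expression matrix and `meta` your per-cell annotation. Only the cells x PCs matrix is passed to Harmony, so memory use at that step no longer depends on the number of genes.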
Hope that helps!
Ilya
Newer versions of the package should be able to handle an input of this size, given that the other steps in computing the cell embeddings are memory efficient.