PyDESeq2

Handling NaN counts

Open BorisMuzellec opened this issue 3 years ago • 3 comments

Currently, PyDESeq2 throws an error when trying to initialise a DeseqDataSet with a count matrix that contains NaNs – this is to reproduce DESeq2's behaviour.

As pointed out by @arthurPignetOwkin, it seems like it would make sense to simply raise a warning instead and carry on with the analysis, and return NaNs for dispersions, LFCs, and p-values of genes that have NaN counts (as we already do for genes whose counts are all-zero).
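
A minimal sketch of the warn-and-mask behaviour proposed above, written with plain pandas rather than PyDESeq2 internals. The helper names `split_nan_genes` and `reinsert_nan_genes` and the `log2FoldChange` column used in the usage note are illustrative assumptions, not part of the library's API; counts are assumed to be a samples × genes DataFrame.

```python
import warnings

import numpy as np
import pandas as pd


def split_nan_genes(counts: pd.DataFrame):
    """Warn about genes (columns) containing NaN counts.

    Returns the NaN-free sub-matrix and the names of the masked genes.
    Hypothetical helper, not a PyDESeq2 function.
    """
    nan_genes = counts.columns[counts.isna().any(axis=0)]
    if len(nan_genes) > 0:
        warnings.warn(
            f"{len(nan_genes)} gene(s) contain NaN counts; "
            "their results will be reported as NaN.",
            UserWarning,
        )
    return counts.drop(columns=nan_genes), nan_genes


def reinsert_nan_genes(results: pd.DataFrame, all_genes: pd.Index) -> pd.DataFrame:
    """Re-expand a per-gene results table to all genes.

    Genes that were masked out have no row in `results`, so reindexing
    fills their dispersions, LFCs, p-values, etc. with NaN.
    """
    return results.reindex(all_genes)
```

In use, the analysis would run on the first return value of `split_nan_genes(counts)`, and the resulting per-gene table would be passed through `reinsert_nan_genes(results, counts.columns)` so that masked genes reappear with NaN in every column, mirroring what the issue describes for all-zero genes.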

BorisMuzellec avatar Dec 28 '22 09:12 BorisMuzellec

Just wanted to add that I encountered this error when trying to run on a sparse matrix. When I densify it, it runs fine, but I am sure that users with large data matrices would appreciate being able to run without having to densify their matrices.
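
For illustration, a NaN check can be made to work on both sparse and dense inputs without densifying. This is a sketch under the assumption that the NaN validation is what fails on sparse input; the function name `has_nan` is hypothetical and this is not how PyDESeq2 currently validates counts. It relies on the fact that implicit zeros in a SciPy sparse matrix cannot be NaN, so only the explicitly stored values need to be inspected.

```python
import numpy as np
from scipy import sparse


def has_nan(X) -> bool:
    """Return True if X contains any NaN, for sparse or dense input.

    Hypothetical helper: for sparse matrices only the stored entries
    are checked, so the matrix is never converted to a dense array.
    """
    if sparse.issparse(X):
        # Convert to CSR so that a flat `.data` array of stored values exists.
        return bool(np.isnan(X.tocsr().data).any())
    return bool(np.isnan(np.asarray(X, dtype=float)).any())
```

The memory cost of the sparse branch scales with the number of stored (non-zero) entries rather than with the full samples × genes shape.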

fairliereese avatar May 17 '23 22:05 fairliereese

Hopefully this gets implemented soon.

koh-joshua avatar Oct 31 '23 03:10 koh-joshua

Hi, I'm also noticing that the default functionality breaks when the input data isn't densified ahead of time -- the internal validation functions assume that the input counts are dense numpy / pandas objects, despite the default AnnData behavior recasting these inputs into sparse matrices.

I'm not exactly sure how the tutorials on the main website are able to run in the first place -- I have not been able to run any of these tutorials (with new data) without recasting the data to a dense array via

import numpy as np
from pydeseq2.dds import DeseqDataSet

dds = DeseqDataSet(counts=df, metadata=md)
# Convert the sparse count matrix to a dense NumPy array
dds.X = np.array(dds.X.todense())

mortonjt avatar Nov 27 '23 18:11 mortonjt