[BUG] continuous_factor errors out in _build_contrast

Open jeffhsu3 opened this issue 1 year ago • 2 comments

Describe the bug Listing a factor only in continuous_factors (and not in design_factors) when creating a DeseqDataSet causes an error in DeseqStats when building a contrast. If the same factor is also included in design_factors, the contrast works without error, but the factor is converted to a Categorical type.

To Reproduce

dds = DeseqDataSet(
    adata=adf,
    design_factors=["treatment"],
    continuous_factors=["time"],
    ref_level=["treatment", "CTRL"],
)
stat_res_time = DeseqStats(dds, contrast=["time", "", ""])

The dtype of adf.obs.time is int64. The DeseqStats call then raises the following error in _build_contrast:

The contrast variable ('time') should be one of the design factors.

Would just changing the check in _build_contrast in DeseqStats be enough?

pydeseq2 version: 0.4.11

Expected behavior The if statement in _build_contrast should also check whether the factor is in self.dds.continuous_factors.
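
A minimal sketch of the kind of check proposed here, written as a standalone function and not taken from PyDESeq2's source. The function name check_contrast_factor, its signature, and the choice of KeyError are hypothetical; only the error message text and the design_factors / continuous_factors names come from this report.

```python
from typing import Optional, Sequence


def check_contrast_factor(
    factor: str,
    design_factors: Sequence[str],
    continuous_factors: Optional[Sequence[str]] = None,
) -> None:
    """Hypothetical stand-in for the proposed check (not PyDESeq2 code).

    Accepts the contrast factor if it appears in either list.
    """
    known = set(design_factors) | set(continuous_factors or [])
    if factor not in known:
        # Exception type chosen for illustration only.
        raise KeyError(
            f"The contrast variable ('{factor}') should be one of the design factors."
        )


# With the relaxed check, a factor listed only as continuous is accepted.
check_contrast_factor("time", ["treatment"], ["time"])
```

Whether relaxing the check alone would be sufficient is exactly the open question raised above; the sketch only shows what the condition itself might look like.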

Desktop (please complete the following information): Ubuntu 22.04

jeffhsu3 avatar Sep 11 '24 18:09 jeffhsu3

Hi @jeffhsu3,

Continuous factors must indeed also be listed as design factors. (This is what the error message you get is meant to say, although it may not be very clear.)

I.e., in your case, the code should be changed to

dds = DeseqDataSet(
    adata=adf,
    design_factors=["treatment", "time"],
    continuous_factors=["time"],
    ref_level=["treatment", "CTRL"],
)
stat_res_time = DeseqStats(dds, contrast=["time", "", ""])
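
Since this requirement is easy to miss, a small pre-flight check on the argument lists can catch it before the DeseqDataSet is built. This is a sketch in plain Python; the helper name missing_from_design is hypothetical and not part of PyDESeq2.

```python
from typing import List, Sequence


def missing_from_design(
    design_factors: Sequence[str], continuous_factors: Sequence[str]
) -> List[str]:
    """Return the continuous factors that are not also listed as design factors."""
    return [f for f in continuous_factors if f not in design_factors]


# Original call from the report: "time" is missing from design_factors.
print(missing_from_design(["treatment"], ["time"]))          # ['time']
# Corrected call: nothing is missing.
print(missing_from_design(["treatment", "time"], ["time"]))  # []
```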

You mentioned that this caused time to be treated as a categorical factor. Could you provide an example of this behaviour?

Thanks!

BorisMuzellec avatar Sep 12 '24 08:09 BorisMuzellec

Thanks!

The design matrix isn't affected, but the obs df is.

print(adf.obs.time.dtype)  # Output: dtype('int64')

# After creating DeseqDataSet
dds = DeseqDataSet(
    adata=adf,
    design_factors=["treatment", "time"],
    continuous_factors=["time"],
    ref_level=["treatment", "CTRL"],
)
print(dds.obs.time.dtype)  # Output: dtype('O')

print(dds.obsm['design_matrix'].dtypes)  # Output: int64
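
If the numeric dtype of the obs column matters downstream, one possible workaround is to convert it back with pandas after construction. The sketch below runs on a stand-in DataFrame, not on a real DeseqDataSet. It assumes the object column holds string representations of the original integers, which is not confirmed in this thread, and it is not established here whether PyDESeq2 later relies on the converted values.

```python
import pandas as pd

# Stand-in for dds.obs after construction: both columns have object dtype.
# Assumption: "time" now holds strings such as "0" and "24".
obs = pd.DataFrame(
    {"treatment": ["CTRL", "CTRL", "TRT", "TRT"], "time": ["0", "24", "0", "24"]},
    dtype=object,
)

# Convert the column back to a numeric dtype.
obs["time"] = pd.to_numeric(obs["time"])
print(obs["time"].dtype)
```

pd.to_numeric also accepts object columns that hold plain Python integers, so the same call applies if the values were not converted to strings.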

jeffhsu3 avatar Sep 12 '24 17:09 jeffhsu3

Closing this because of #328

BorisMuzellec avatar Nov 19 '24 13:11 BorisMuzellec