get_pseudobulk losing obs columns with NA values
Hi,
It appears that get_pseudobulk drops .obs columns that contain NAs, even when each sample ID has a single, consistent value in that column.
Here's an example.
import numpy as np
import pertpy
import decoupler
adata = pertpy.dt.distance_example()
adata.X.data = np.round(adata.X.data) # doing this just for illustration purposes
In this dataset, adata.obs['target'] is set to NA wherever adata.obs['perturbation'] == 'control'. Even though all the control cells share the same NA value in target, I lose this column when pseudobulking:
> pdata = decoupler.get_pseudobulk(adata, sample_col = 'perturbation', groups_col=None)
> 'target' in pdata.obs
False
But the column is kept when I replace the NAs with a placeholder value:
> adata.obs['target'] = np.where(adata.obs['perturbation'] == 'control', 'no-target', adata.obs['target'])
> pdata = decoupler.get_pseudobulk(adata, sample_col = 'perturbation', groups_col=None)
> 'target' in pdata.obs
True
I would expect the function to keep the target column, NAs included, in this case.
Decoupler version: '1.8.0'
Hi @emdann,
Oops! I introduced this bug in a recent refactor. It should now be fixed by using .nunique(dropna=False) in 0dd3da67e681c74e7771b78dc53227370590f23d.
Thanks for noticing and reporting it! You can install the latest version from GitHub to try it out. Let me know if anything else breaks.
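For anyone reading along, here is a minimal sketch in plain pandas of why the dropna argument matters here. It is not decoupler's actual code: the toy obs table and the helper constant_columns are hypothetical, and the sketch only assumes that a column is kept when it has exactly one unique value within every sample.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.obs: every 'control' cell has NA in 'target'.
obs = pd.DataFrame({
    "perturbation": ["control", "control", "geneA", "geneA"],
    "target": [np.nan, np.nan, "geneA", "geneA"],
})

def constant_columns(obs, sample_col, dropna):
    """Hypothetical helper: columns with exactly one value per sample."""
    # Count unique values of each column within each sample.
    n_unique = obs.groupby(sample_col).nunique(dropna=dropna)
    return [col for col in n_unique.columns if (n_unique[col] == 1).all()]

# Default behaviour: NA is ignored, so 'control' has 0 unique targets
# and the column fails the "exactly one value" check.
print(constant_columns(obs, "perturbation", dropna=True))

# With dropna=False, NA counts as a value, so 'target' passes.
print(constant_columns(obs, "perturbation", dropna=False))
```

With the default dropna=True, the all-NA control group reports zero unique values, so any "exactly one value per sample" check rejects the column. Counting NA as a value makes the control group report one unique value, which matches the behaviour requested in the issue.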