ir.tl.chain_qc doesn't mark cells with no IR
Describe the bug 'ir.tl.chain_qc' states that it should mark cells that don't have any detected immune receptor (as stated in the docs). In the new data structures, this information should be in the "airr:receptor_type", etc. slots, annotated as "no IR". However, when you run this function, cells that are lacking an IR are just annotated with NaN values, so you can't see how many cells don't have an associated IR when plotting.
To Reproduce
import muon as mu
import scirpy as ir
mdata = ir.datasets.wu2020_3k()
adata = mdata['gex'].copy()
adata_tcr = mdata['airr'].copy()
adata_tcr = adata_tcr[0:-100, :].copy()  # artificially remove TCR info from the last 100 cells
mdata = mu.MuData({"gex": adata, "airr": adata_tcr})
ir.pp.index_chains(mdata)
ir.tl.chain_qc(mdata)
mdata.obs["airr:receptor_subtype"].tail()  # inspect the last cells: they are stored as NaN
# plot the subtypes
_ = ir.pl.group_abundance(
    mdata, groupby="airr:receptor_subtype", target_col="gex:source"
)
Expected behaviour Cells with no IR should be annotated as "no IR", according to the docs (https://scirpy.scverse.org/en/latest/generated/scirpy.tl.chain_qc.html)
System
- OS: macOSX 13.5.2
- Python version 3.11
- Versions of libraries involved: scirpy 0.13.1, scanpy 1.9.5, muon 0.1.5
Additional context
Hi,
thanks for reporting this!
I believe the reason is that chain_qc operates on the mdata["airr"] slot, which only contains cells with a receptor. When the results are written back to mdata.obs, the cells that are not in mdata["airr"] are filled with NaN.
It should be possible to fix this pretty easily.
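To illustrate the mechanism, here is a minimal pandas sketch of the alignment effect. It is not scirpy's or MuData's actual code: the barcodes, the subtype labels and the variable names are made up for the example, and it only assumes that results computed on a subset of cells get aligned to a larger index.

```python
import pandas as pd

# Hypothetical barcodes: four cells in total, only two of them have receptor data.
all_cells = pd.Index(["cell1", "cell2", "cell3", "cell4"])
airr_result = pd.Series(
    ["TRA+TRB", "TRA+TRB"],
    index=["cell1", "cell2"],
    name="receptor_subtype",
)

# Aligning to the full index leaves NaN for cells missing from the subset.
aligned = airr_result.reindex(all_cells)

# Possible user-side workaround: label the missing cells explicitly.
labelled = aligned.fillna("no IR")
counts = labelled.value_counts()
```

If the column is categorical, the new label would presumably have to be registered first (for example with `Series.cat.add_categories`) before `fillna` accepts it.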
I fixed the chain_qc function to also compute values for cells not in the AIRR modality in https://github.com/scverse/scirpy/pull/463. However, plotting the result requires changes to the group_abundance function, which I'll tackle together with a complete overhaul of how barplots are generated (this has been planned for a while, see https://github.com/scverse/scirpy/issues/232).