Failed violins
### Please make sure these conditions are met
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of scanpy.
- [ ] (optional) I have confirmed this bug exists on the main branch of scanpy.
### What happened?
This was supposed to be a violin plot of total_counts. Notice that some cell categories have no data. This is by design: some categories are defined but not assigned to any samples here; they are assigned and used elsewhere. This completely breaks the violin plots, which seem to work only if every category has at least some data. I like that empty categories are still shown, but I would like to see the non-empty violins.
### Minimal code sample
The following code adds extra, unassigned categories:
ord = ['B', 'B_mz', 'B_gro', 'B_pls', 'B_mem',
'Th', 'Th_reg', 'Th_mem', 'Tc', 'Tc_act', 'Tc_mem',
'NKT', 'NK_0', 'NK_1', 'NK_2',
'ncMo', 'cMo', 'DC_1', 'DC_2', 'MΦ_1', 'MΦ_2',
'Ne', 'RBC', 'PLT', 'HSC', 'Whatever', 'Whatnot', 'Unassigned', 'Huh?', 'What?']
adata.obs['cell_type'] = pd.Categorical(values=adata.obs.cell_type, categories=ord, ordered=True)
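For readers without the original dataset, here is a minimal sketch of the same pattern using only pandas. The toy `obs` frame and its labels are assumptions standing in for `adata.obs`; it only illustrates how categories that no cell carries end up with a count of zero.

```python
import pandas as pd

# Toy stand-in for adata.obs; the labels here are illustrative only.
obs = pd.DataFrame({"cell_type": ["B", "Th", "B", "NK_0"]})

# Same pattern as above: the category list includes labels no cell carries.
categories = ["B", "Th", "NK_0", "Unassigned", "Huh?"]
obs["cell_type"] = pd.Categorical(
    values=obs["cell_type"], categories=categories, ordered=True
)

# Categories with zero cells still appear in the counts.
counts = obs["cell_type"].value_counts(sort=False)
empty = counts[counts == 0].index.tolist()
print(empty)
```

Checking `empty` before plotting shows which groups would produce a violin with no data behind it.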
### Error output
_No response_
### Versions
scanpy==1.10.1 anndata==0.10.7 umap==0.5.5 numpy==1.26.4 scipy==1.13.0 pandas==2.2.2 scikit-learn==1.4.2 statsmodels==0.14.1 igraph==0.10.3 pynndescent==0.5.12
Hey, thanks for the request.
To be able to reproduce and help, it is a big aid for us if you can supply a code sample that we can run: that is, with some dummy data (the datasets scanpy readily supplies are great for that), and the error/unexpected behaviour you get.
I think in your case this would be something like:
import scanpy as sc
adata = sc.datasets.pbmc68k_reduced()
adata.obs["louvain"] = adata.obs["louvain"].cat.set_categories(new_categories=["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"])
sc.pl.violin(adata, keys='n_counts', groupby='louvain')
This yields:
ValueError: The palette dictionary is missing keys: {'11'}
Is that the issue you are facing?
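As a possible stopgap (not a fix for the underlying behaviour), the unused categories can be dropped from a copy of the grouping column before plotting. The sketch below uses a toy series rather than real data; applying the trimmed column to a copy of `adata` (for example via `adata.copy()`) and plotting from that copy is an assumption about how one might use it, and it does hide the empty categories from the plot.

```python
import pandas as pd

# Toy grouping column with one category ("11") that no observation uses.
labels = pd.Series(pd.Categorical(["0", "1", "0"], categories=["0", "1", "11"]))

# remove_unused_categories returns a new Series; the original keeps "11".
trimmed = labels.cat.remove_unused_categories()

print(list(labels.cat.categories))   # ['0', '1', '11']
print(list(trimmed.cat.categories))  # ['0', '1']
```

Because the original column is untouched, the full category list stays available for other plots and analyses.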
I do not know why set_categories fails to add the new ones for you. Perhaps you need to add ordered=True. Note that my example uses a different method of adding the extra categories, and that one works:
ord = ['1','2','3', 'Whatever', 'Whatnot', 'Huh?', 'What?']
adata.obs['cell_type'] = pd.Categorical(values=adata.obs.cell_type, categories=ord, ordered=True)
Then try to plot any violin plot.
> To be able to reproduce and help, it is a big aid for us if you can supply a code sample that we can run: that is, with some dummy data (the datasets scanpy readily supplies are great for that), and the error/unexpected behaviour you get.
Can you show such an example, with data? It is not immediately clear to me what specifically you are trying to add or construct. I'm not sure whether the dataframe gets corrupted by the operation you intend to perform, or whether it is the violin plot itself that is failing (if the dataframe is malformed, that is what would need fixing).
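One way to tell the two cases apart is to check whether recoding the column loses any labels. When `pd.Categorical` is given a `categories` list that omits an existing label, pandas turns those values into NaN. The sketch below uses a toy series; the label values are assumptions chosen only to illustrate the check.

```python
import pandas as pd

# Toy labels; "0" is deliberately missing from the new category list.
original = pd.Series(["0", "1", "2", "3"])
categories = ["1", "2", "3", "Whatever"]

recoded = pd.Series(
    pd.Categorical(values=original, categories=categories, ordered=True)
)

# Values whose label is not in `categories` become NaN.
n_lost = int(recoded.isna().sum())
lost_labels = sorted(set(original) - set(categories))
print(n_lost, lost_labels)
```

If `n_lost` is zero, the column itself is intact and the problem is more likely in the plotting step.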