KeyError: 'base' when running `tl.rank_genes_groups`
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of scanpy.
- [x] (optional) I have confirmed this bug exists on the master branch of scanpy.
Minimal code sample (that we can copy&paste without having any data)
import scanpy as sc

# adata: an AnnData object with an "origin" column in .obs
sc.tl.rank_genes_groups(adata, "origin", method="wilcoxon")
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 sc.tl.rank_genes_groups(adata, "origin", method="wilcoxon")
2 sc.pl.rank_genes_groups(adata, n_genes=25, sharey=False)
File ~/app/miniconda3/envs/bio/lib/python3.9/site-packages/scanpy/tools/_rank_genes_groups.py:590, in rank_genes_groups(adata, groupby, use_raw, groups, reference, n_genes, rankby_abs, pts, key_added, copy, method, corr_method, tie_correct, layer, **kwds)
580 adata.uns[key_added] = {}
581 adata.uns[key_added]['params'] = dict(
582 groupby=groupby,
583 reference=reference,
(...)
587 corr_method=corr_method,
588 )
--> 590 test_obj = _RankGenes(adata, groups_order, groupby, reference, use_raw, layer, pts)
592 if check_nonnegative_integers(test_obj.X) and method != 'logreg':
593 logg.warning(
594 "It seems you use rank_genes_groups on the raw count data. "
595 "Please logarithmize your data before calling rank_genes_groups."
596 )
File ~/app/miniconda3/envs/bio/lib/python3.9/site-packages/scanpy/tools/_rank_genes_groups.py:93, in _RankGenes.__init__(self, adata, groups, groupby, reference, use_raw, layer, comp_pts)
82 def __init__(
83 self,
84 adata,
(...)
90 comp_pts=False,
91 ):
---> 93 if 'log1p' in adata.uns_keys() and adata.uns['log1p']['base'] is not None:
94 self.expm1_func = lambda x: np.expm1(x * np.log(adata.uns['log1p']['base']))
95 else:
KeyError: 'base'
Versions
anndata 0.8.0
scanpy 1.9.1
-----
Levenshtein NA
OpenSSL 21.0.0
PIL 9.1.0
adjustText NA
airr 1.3.1
appdirs 1.4.4
asttokens NA
attr 21.4.0
backcall 0.2.0
beta_ufunc NA
binom_ufunc NA
bioservices 1.8.4
boto3 1.21.42
botocore 1.24.42
brotli NA
bs4 4.11.1
cattr NA
certifi 2021.10.08
cffi 1.15.0
charset_normalizer 2.0.4
colorama 0.4.4
colorlog NA
cryptography 36.0.0
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
debugpy 1.6.0
decorator 5.1.1
defusedxml 0.7.1
easydev 0.12.0
entrypoints 0.4
executing 0.8.3
gseapy 0.10.8
h5py 3.6.0
hypergeom_ufunc NA
idna 3.3
igraph 0.9.10
ipykernel 6.13.0
ipython_genutils 0.2.0
ipywidgets 7.7.0
jedi 0.18.1
jmespath 1.0.0
joblib 1.1.0
jupyter_server 1.16.0
kiwisolver 1.4.2
leidenalg 0.8.9
llvmlite 0.38.0
lxml 4.8.0
matplotlib 3.5.1
matplotlib_inline NA
mpl_toolkits NA
natsort 8.1.0
nbinom_ufunc NA
networkx 2.8
numba 0.55.1
numpy 1.21.6
packaging 21.3
pandas 1.4.2
parasail 1.2.4
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
prompt_toolkit 3.0.29
psutil 5.9.0
ptyprocess 0.7.0
pure_eval 0.2.2
pycparser 2.21
pydev_ipython NA
pydevconsole NA
pydevd 2.8.0
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pyexpat NA
pygments 2.11.2
pylab NA
pynndescent 0.5.6
pyparsing 3.0.8
pytoml NA
pytz 2022.1
requests 2.27.1
requests_cache 0.9.2
scipy 1.8.0
scirpy 0.10.1
seaborn 0.11.2
session_info 1.0.0
setuptools_scm NA
six 1.16.0
sklearn 1.0.2
socks 1.7.1
soupsieve 2.3.2.post1
stack_data 0.2.0
statsmodels 0.13.2
texttable 1.6.4
threadpoolctl 3.1.0
tornado 6.1
tqdm 4.62.3
tracerlib NA
traitlets 5.1.1
typing_extensions NA
umap 0.5.3
url_normalize 1.4.3
urllib3 1.26.7
wcwidt
Same error here...any ideas?
-----
anndata 0.8.0
scanpy 1.8.2
sinfo 0.3.1
-----
PIL 9.0.1
PyQt5 NA
anndata 0.8.0
anndata2ri 0.0.0
atomicwrites 1.4.0
autoreload NA
backcall 0.2.0
backports NA
beta_ufunc NA
binom_ufunc NA
bs4 4.10.0
cached_property 1.5.2
cffi 1.15.0
chardet 4.0.0
cloudpickle 2.0.0
colorama 0.4.4
cycler 0.10.0
cython_runtime NA
cytoolz 0.11.2
dask 2022.02.0
dateutil 2.8.2
debugpy 1.5.1
decorator 5.1.1
defusedxml 0.7.1
dunamai 1.10.0
entrypoints 0.4
fsspec 2022.02.0
get_version 3.5.4
h5py 3.6.0
igraph 0.9.9
ipykernel 6.9.1
jedi 0.18.1
jinja2 3.0.3
joblib 1.1.0
kiwisolver 1.3.2
leidenalg 0.8.9
llvmlite 0.38.0
louvain 0.7.1
markupsafe 2.1.0
matplotlib 3.5.1
matplotlib_inline NA
mpl_toolkits NA
natsort 8.1.0
nbinom_ufunc NA
numba 0.55.1
numexpr 2.8.0
numpy 1.21.5
packaging 21.3
pandas 1.3.5
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
prompt_toolkit 3.0.27
psutil 5.9.0
ptyprocess 0.7.0
pydev_ipython NA
pydevconsole NA
pydevd 2.6.0
pydevd_concurrency_analyser NA
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.11.2
pyparsing 3.0.7
pytz 2021.3
pytz_deprecation_shim NA
rpy2 3.4.2
scanpy 1.8.2
scipy 1.7.3
seaborn 0.11.2
setuptools 59.8.0
sinfo 0.3.1
sip NA
six 1.16.0
sklearn 1.0.2
soupsieve 2.3.1
sphinxcontrib NA
spyder 5.2.2
spyder_kernels 2.2.1
spydercustomize NA
statsmodels 0.13.2
storemagic NA
tables 3.7.0
texttable 1.6.4
threadpoolctl 3.1.0
tlz 0.11.2
toolz 0.11.2
tornado 6.1
traitlets 5.1.1
typing_extensions NA
tzlocal NA
wcwidth 0.2.5
wurlitzer 3.0.2
yaml 6.0
zipp NA
zmq 22.3.0
-----
IPython 7.32.0
jupyter_client 7.1.2
jupyter_core 4.9.2
-----
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) [GCC 9.4.0]
Linux-5.4.0-109-generic-x86_64-with-debian-bullseye-sid
16 logical CPU cores, x86_64
-----
Session information updated at 2022-04-20 18:16
And what is in adata.uns['log1p']?
It's empty; there is no base key.
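To illustrate why an empty log1p entry is enough to trigger the error, here is a minimal sketch that uses plain dicts in place of adata.uns. The strict check has the same shape as the condition on line 93 of the traceback; the tolerant variant and both function names are hypothetical, not scanpy code.

```python
# Plain dicts stand in for adata.uns here; no scanpy import is needed.
uns_before_write = {"log1p": {"base": None}}
uns_after_read = {"log1p": {}}  # the empty entry described in this thread


def has_explicit_base_strict(uns):
    # Same shape as the condition shown in the traceback (line 93).
    return "log1p" in uns and uns["log1p"]["base"] is not None


def has_explicit_base_tolerant(uns):
    # Hypothetical variant: a missing "base" key is treated like base=None.
    return uns.get("log1p", {}).get("base") is not None


print(has_explicit_base_strict(uns_before_write))  # False
try:
    has_explicit_base_strict(uns_after_read)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'base'
print(has_explicit_base_tolerant(uns_after_read))  # False
```

The strict check only works while the "base" key is present, even when its value is None, which is why an entry that loses the key fails.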
I just found issue #2181 which mentions the same issue and a workaround. In case @naity2 or someone else comes looking.
Thank you @auesro!
For now, I run the following line every time after reading an h5ad file:
adata.uns['log1p']["base"] = None
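A slightly more defensive version of this workaround could look like the sketch below. The helper name ensure_log1p_base is hypothetical; it only fills in the key when a log1p entry exists and has no base, so a base that was stored correctly is never overwritten. It assumes the data was logarithmized with the default natural log; if another base was used, None would be the wrong value to restore.

```python
def ensure_log1p_base(uns):
    """Set uns['log1p']['base'] = None if 'log1p' exists but has no 'base'.

    `uns` is any dict-like mapping such as adata.uns.
    Returns True if the key was added, False otherwise.
    """
    log1p = uns.get("log1p")
    if isinstance(log1p, dict) and "base" not in log1p:
        log1p["base"] = None
        return True
    return False


# Demonstration on a plain dict shaped like the reloaded adata.uns.
uns = {"log1p": {}, "hvg": {"flavor": "seurat"}}
print(ensure_log1p_base(uns))  # True
print(uns["log1p"])            # {'base': None}
```

With a real object, the call would be ensure_log1p_base(adata.uns) right after reading the file.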
Thank you. I also got this error when calculating highly variable genes with sc.pp.highly_variable_genes(Adult, batch_key='batch').
This is still an issue for me as well. Any news?
Trying out the tutorials these days and it seems this issue still persists.
Here is what I got from running the tutorial pbmc3k.ipynb:
Before writing the AnnData object to a .h5ad file (after the PCA step; before computing the neighborhood graph)
- Inside adata.uns:
OverloadedDict, wrapping:
OrderedDict([('log1p', {'base': None}), ('hvg', {'flavor': 'seurat'}), ('pca', {'params': {'zero_center': True, 'use_highly_variable': True}, 'variance': array([ (not showing the numbers for simplicity here) ],
dtype=float32), 'variance_ratio': array([ (not showing the numbers for simplicity here) ],
dtype=float32)})])
With overloaded keys:
['neighbors'].
After loading the matrix from the .h5ad file:
- Inside adata.uns, the log1p key became an empty dictionary:
OverloadedDict, wrapping:
{'hvg': {'flavor': 'seurat'}, 'log1p': {}, 'pca': {'params': {'use_highly_variable': True, 'zero_center': True}, 'variance': array([ (not showing the numbers for simplicity here) ],
dtype=float32), 'variance_ratio': array([ (not showing the numbers for simplicity here) ],
dtype=float32)}}
With overloaded keys:
['neighbors'].
Although adata.uns['log1p']["base"] = None seems to work for tl.rank_genes_groups, the results look weird in my analysis. When I checked the logfoldchanges values, they didn't make any sense: some of them are close to 100. Has anyone else seen this, or am I doing something wrong?
Just to let you know that the same issue happened here when running the tutorial with my data.
Same here. adata.uns['log1p']["base"] = None eliminated the error, but the FC seems weird. I compared the FC results with Seurat FindMarker results, which used the same FC calculation. For most genes, Scanpy gave a much higher FC (some reach 30 or more), which I have never seen before.
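One possible explanation for values around 30, offered as a sketch and not a confirmed diagnosis: the traceback shows that the log1p means are back-transformed with np.expm1 before the ratio is taken. If a tiny pseudocount is then added to both means (the 1e-9 below is my assumption and is not shown in this thread), a gene with a mean of zero in the reference group ends up with a log2 fold change close to log2(1e9), which is about 30. The function name and the example means are made up for illustration.

```python
import math


def log2_fold_change(mean_group, mean_rest, base=None, eps=1e-9):
    """log2 ratio of back-transformed means; eps is an assumed pseudocount."""

    def expm1_func(x):
        # Mirrors the lambda in the traceback: natural log when base is None.
        return math.expm1(x if base is None else x * math.log(base))

    ratio = (expm1_func(mean_group) + eps) / (expm1_func(mean_rest) + eps)
    return math.log2(ratio)


# A gene expressed in both groups gives a moderate value (about 1.4).
print(round(log2_fold_change(1.0, 0.5), 2))
# A gene absent from the reference group is dominated by eps (about 30).
print(round(log2_fold_change(1.0, 0.0), 2))
```

If this is what is happening, the large values would come from genes that are not detected in the comparison group, not from the base workaround itself.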
@LuckyMD, several of the comments above need attention from the Scanpy team. Thanks!
Same error here.
Duplicate of scverse/anndata#673
I have a fix waiting in scverse/anndata#999