scanpy KeyError: 'base' when running `tl.rank_genes_groups`

[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of scanpy.
[x] (optional) I have confirmed this bug exists on the master branch of scanpy.

Minimal code sample (that we can copy&paste without having any data)

import scanpy as sc

# `adata` is an AnnData object that is already loaded; "origin" is a column in adata.obs
sc.tl.rank_genes_groups(adata, "origin", method="wilcoxon")

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 sc.tl.rank_genes_groups(adata, "origin", method="wilcoxon")
      2 sc.pl.rank_genes_groups(adata, n_genes=25, sharey=False)

File ~/app/miniconda3/envs/bio/lib/python3.9/site-packages/scanpy/tools/_rank_genes_groups.py:590, in rank_genes_groups(adata, groupby, use_raw, groups, reference, n_genes, rankby_abs, pts, key_added, copy, method, corr_method, tie_correct, layer, **kwds)
    580 adata.uns[key_added] = {}
    581 adata.uns[key_added]['params'] = dict(
    582     groupby=groupby,
    583     reference=reference,
   (...)
    587     corr_method=corr_method,
    588 )
--> 590 test_obj = _RankGenes(adata, groups_order, groupby, reference, use_raw, layer, pts)
    592 if check_nonnegative_integers(test_obj.X) and method != 'logreg':
    593     logg.warning(
    594         "It seems you use rank_genes_groups on the raw count data. "
    595         "Please logarithmize your data before calling rank_genes_groups."
    596     )

File ~/app/miniconda3/envs/bio/lib/python3.9/site-packages/scanpy/tools/_rank_genes_groups.py:93, in _RankGenes.__init__(self, adata, groups, groupby, reference, use_raw, layer, comp_pts)
     82 def __init__(
     83     self,
     84     adata,
   (...)
     90     comp_pts=False,
     91 ):
---> 93     if 'log1p' in adata.uns_keys() and adata.uns['log1p']['base'] is not None:
     94         self.expm1_func = lambda x: np.expm1(x * np.log(adata.uns['log1p']['base']))
     95     else:

KeyError: 'base'

Versions

-----
anndata     0.8.0
scanpy      1.9.1
-----
Levenshtein         NA
OpenSSL             21.0.0
PIL                 9.1.0
adjustText          NA
airr                1.3.1
appdirs             1.4.4
asttokens           NA
attr                21.4.0
backcall            0.2.0
beta_ufunc          NA
binom_ufunc         NA
bioservices         1.8.4
boto3               1.21.42
botocore            1.24.42
brotli              NA
bs4                 4.11.1
cattr               NA
certifi             2021.10.08
cffi                1.15.0
charset_normalizer  2.0.4
colorama            0.4.4
colorlog            NA
cryptography        36.0.0
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
debugpy             1.6.0
decorator           5.1.1
defusedxml          0.7.1
easydev             0.12.0
entrypoints         0.4
executing           0.8.3
gseapy              0.10.8
h5py                3.6.0
hypergeom_ufunc     NA
idna                3.3
igraph              0.9.10
ipykernel           6.13.0
ipython_genutils    0.2.0
ipywidgets          7.7.0
jedi                0.18.1
jmespath            1.0.0
joblib              1.1.0
jupyter_server      1.16.0
kiwisolver          1.4.2
leidenalg           0.8.9
llvmlite            0.38.0
lxml                4.8.0
matplotlib          3.5.1
matplotlib_inline   NA
mpl_toolkits        NA
natsort             8.1.0
nbinom_ufunc        NA
networkx            2.8
numba               0.55.1
numpy               1.21.6
packaging           21.3
pandas              1.4.2
parasail            1.2.4
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
prompt_toolkit      3.0.29
psutil              5.9.0
ptyprocess          0.7.0
pure_eval           0.2.2
pycparser           2.21
pydev_ipython       NA
pydevconsole        NA
pydevd              2.8.0
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pyexpat             NA
pygments            2.11.2
pylab               NA
pynndescent         0.5.6
pyparsing           3.0.8
pytoml              NA
pytz                2022.1
requests            2.27.1
requests_cache      0.9.2
scipy               1.8.0
scirpy              0.10.1
seaborn             0.11.2
session_info        1.0.0
setuptools_scm      NA
six                 1.16.0
sklearn             1.0.2
socks               1.7.1
soupsieve           2.3.2.post1
stack_data          0.2.0
statsmodels         0.13.2
texttable           1.6.4
threadpoolctl       3.1.0
tornado             6.1
tqdm                4.62.3
tracerlib           NA
traitlets           5.1.1
typing_extensions   NA
umap                0.5.3
url_normalize       1.4.3
urllib3             1.26.7
wcwidt

Apr 18 '22 17:04 naity2

Same error here...any ideas?

-----
anndata     0.8.0
scanpy      1.8.2
sinfo       0.3.1
-----
PIL                         9.0.1
PyQt5                       NA
anndata                     0.8.0
anndata2ri                  0.0.0
atomicwrites                1.4.0
autoreload                  NA
backcall                    0.2.0
backports                   NA
beta_ufunc                  NA
binom_ufunc                 NA
bs4                         4.10.0
cached_property             1.5.2
cffi                        1.15.0
chardet                     4.0.0
cloudpickle                 2.0.0
colorama                    0.4.4
cycler                      0.10.0
cython_runtime              NA
cytoolz                     0.11.2
dask                        2022.02.0
dateutil                    2.8.2
debugpy                     1.5.1
decorator                   5.1.1
defusedxml                  0.7.1
dunamai                     1.10.0
entrypoints                 0.4
fsspec                      2022.02.0
get_version                 3.5.4
h5py                        3.6.0
igraph                      0.9.9
ipykernel                   6.9.1
jedi                        0.18.1
jinja2                      3.0.3
joblib                      1.1.0
kiwisolver                  1.3.2
leidenalg                   0.8.9
llvmlite                    0.38.0
louvain                     0.7.1
markupsafe                  2.1.0
matplotlib                  3.5.1
matplotlib_inline           NA
mpl_toolkits                NA
natsort                     8.1.0
nbinom_ufunc                NA
numba                       0.55.1
numexpr                     2.8.0
numpy                       1.21.5
packaging                   21.3
pandas                      1.3.5
parso                       0.8.3
pexpect                     4.8.0
pickleshare                 0.7.5
pkg_resources               NA
prompt_toolkit              3.0.27
psutil                      5.9.0
ptyprocess                  0.7.0
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.6.0
pydevd_concurrency_analyser NA
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.11.2
pyparsing                   3.0.7
pytz                        2021.3
pytz_deprecation_shim       NA
rpy2                        3.4.2
scanpy                      1.8.2
scipy                       1.7.3
seaborn                     0.11.2
setuptools                  59.8.0
sinfo                       0.3.1
sip                         NA
six                         1.16.0
sklearn                     1.0.2
soupsieve                   2.3.1
sphinxcontrib               NA
spyder                      5.2.2
spyder_kernels              2.2.1
spydercustomize             NA
statsmodels                 0.13.2
storemagic                  NA
tables                      3.7.0
texttable                   1.6.4
threadpoolctl               3.1.0
tlz                         0.11.2
toolz                       0.11.2
tornado                     6.1
traitlets                   5.1.1
typing_extensions           NA
tzlocal                     NA
wcwidth                     0.2.5
wurlitzer                   3.0.2
yaml                        6.0
zipp                        NA
zmq                         22.3.0
-----
IPython             7.32.0
jupyter_client      7.1.2
jupyter_core        4.9.2
-----
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) [GCC 9.4.0]
Linux-5.4.0-109-generic-x86_64-with-debian-bullseye-sid
16 logical CPU cores, x86_64
-----
Session information updated at 2022-04-20 18:16

Apr 20 '22 16:04 auesro

And what is in adata.uns['log1p']?

Apr 20 '22 16:04 Koncopd

It's empty; there is no base key.
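
A minimal sketch of why an empty `log1p` entry triggers the error, using a plain dict as a stand-in for `adata.uns`. `base_is_set` copies the condition from line 93 of the traceback; `base_is_set_safe` is a hypothetical defensive variant for illustration only, not scanpy code.

```python
# Plain dict standing in for adata.uns with an empty 'log1p' entry
uns = {"log1p": {}}


def base_is_set(uns):
    # Same condition as line 93 of the traceback, applied to a plain dict
    return "log1p" in uns and uns["log1p"]["base"] is not None


def base_is_set_safe(uns):
    # Hypothetical defensive variant: .get returns None for a missing key
    return uns.get("log1p", {}).get("base") is not None


try:
    base_is_set(uns)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'base'

print(base_is_set_safe(uns))  # False
```

The first check passes because the `log1p` key exists, so the lookup of `'base'` on the empty inner dict is what raises.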

I just found issue #2181 which mentions the same issue and a workaround. In case @naity2 or someone else comes looking.

Apr 20 '22 16:04 auesro

Thank you @auesro!

For now, I run the line adata.uns['log1p']["base"] = None every time after reading an .h5ad file.
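
The same workaround can be wrapped in a small helper so that an existing base is not overwritten. This is only a sketch: `ensure_log1p_base` is a hypothetical name, and it is shown on plain dicts rather than a real AnnData object.

```python
def ensure_log1p_base(uns):
    """Add a 'base' entry (None) to uns['log1p'] if that key is missing."""
    if "log1p" in uns:
        # setdefault keeps a base that is already stored
        uns["log1p"].setdefault("base", None)
    return uns


print(ensure_log1p_base({"log1p": {}}))           # {'log1p': {'base': None}}
print(ensure_log1p_base({"log1p": {"base": 2}}))  # {'log1p': {'base': 2}}
```

With a real object this would be called on `adata.uns` right after reading the file. Going by the traceback, a base of None means no base-specific rescaling is applied, so if the data was log-transformed with another base, that base should be stored instead.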

Apr 20 '22 16:04 naity2

adata.uns['log1p']["base"] = None

Thank you. I also had this error when calculating highly variable genes with sc.pp.highly_variable_genes(Adult,batch_key='batch').

Jul 14 '22 15:07 brianpenghe

Still an issue for me as well. Any news?

Oct 15 '22 08:10 giorgiatosoni

I have been trying out the tutorials recently, and this issue still persists.

Here is what I got from running the tutorial pbmc3k.ipynb. Before writing the AnnData object to a .h5ad file (after the PCA step, before computing the neighborhood graph):

Inside adata.uns:

OverloadedDict, wrapping:
	OrderedDict([('log1p', {'base': None}), ('hvg', {'flavor': 'seurat'}), ('pca', {'params': {'zero_center': True, 'use_highly_variable': True}, 'variance': array([ (not showing the numbers for simplicity here) ],
      dtype=float32), 'variance_ratio': array([ (not showing the numbers for simplicity here) ],
      dtype=float32)})])
With overloaded keys:
	['neighbors'].

After loading the matrix from the .h5ad file:

Inside adata.uns, the log1p key became an empty dictionary:

OverloadedDict, wrapping:
	{'hvg': {'flavor': 'seurat'}, 'log1p': {}, 'pca': {'params': {'use_highly_variable': True, 'zero_center': True}, 'variance': array([ (not showing the numbers for simplicity here) ],
      dtype=float32), 'variance_ratio': array([ (not showing the numbers for simplicity here) ],
      dtype=float32)}}
With overloaded keys:
	['neighbors'].
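
The two dumps above are consistent with None-valued entries being skipped when the object is written. Below is a toy model of that observed effect on plain dicts. It is not anndata's actual I/O code, and `drop_none` is a hypothetical helper.

```python
def drop_none(value):
    """Toy stand-in for the write/read round trip: nested None entries vanish."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    return value


uns_before = {"log1p": {"base": None}, "hvg": {"flavor": "seurat"}}
uns_after = drop_none(uns_before)
print(uns_after)  # {'log1p': {}, 'hvg': {'flavor': 'seurat'}}
```

The `log1p` key itself survives, which is why the `'log1p' in adata.uns_keys()` check in the traceback passes and the `'base'` lookup then fails.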

Nov 18 '22 10:11 jasonleongbio

Although adata.uns['log1p']["base"] = None seems to work for tl.rank_genes_groups, the results look weird in my analysis. When I checked the log fold change values, they didn't make any sense; some of them are close to 100. Has anyone else seen this, or am I doing something wrong?

Dec 05 '22 22:12 m21camby

Just to let you know that the same issue happened here when running the tutorial with my data.

Jan 23 '23 17:01 lubianat

Same here. adata.uns['log1p']["base"] = None eliminated the error, but the FC seems weird. I compared the FC results with Seurat FindMarker results, which used the same FC calculation. For most genes, Scanpy gave a much higher FC (some reach 30 or more), which I have never seen.
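
For context on how the stored base can affect fold changes: the traceback shows that, when a base is stored, the log transform is undone with `np.expm1(x * np.log(base))`. The sketch below assumes that without a base the values are treated as natural-log values (the `else` branch is cut off in the traceback), uses made-up means, and computes a simple log2 ratio for illustration; `undo_log1p` and `log2_fold_change` are hypothetical helpers. It only matters if the data was log-transformed with a non-default base, and it is not a confirmed explanation for the values reported above.

```python
import math


def undo_log1p(x, base=None):
    """Invert log1p for a value logged with the given base (None = natural log)."""
    if base is not None:
        # Same expression as in the traceback: expm1(x * log(base)) == base**x - 1
        return math.expm1(x * math.log(base))
    # Assumed fallback when no base is stored: treat x as a natural-log value
    return math.expm1(x)


# Made-up example: group mean 15 and rest mean 3, stored as log2(1 + x)
group_logged = math.log2(1 + 15)  # 4.0
rest_logged = math.log2(1 + 3)    # 2.0


def log2_fold_change(base):
    # Illustrative log2 ratio of the back-transformed means
    return math.log2(undo_log1p(group_logged, base) / undo_log1p(rest_logged, base))


lfc_with_base = log2_fold_change(2)        # base known
lfc_without_base = log2_fold_change(None)  # base lost, natural log assumed

print(round(lfc_with_base, 2), round(lfc_without_base, 2))  # 2.32 3.07
```

In this toy case, losing the base inflates the fold change, and the gap grows with larger logged values.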

Feb 03 '23 04:02 KoichiHashikawa

@LuckyMD, several of the comments above need attention from the Scanpy team. Thanks!

Feb 16 '23 18:02 KoichiHashikawa

Same error here.

May 08 '23 22:05 zhenxingjian

Duplicate of scverse/anndata#673

I have a fix waiting in scverse/anndata#999

Jun 07 '23 14:06 flying-sheep
