dotplot cannot set gene_symbols
I would like to change the displayed var names in a dotplot by setting gene_symbols to the desired name, but this raises a NameError. Below is an example using a scanpy built-in dataset.
import scanpy as sc
adata = sc.datasets.pbmc68k_reduced()
ax = sc.pl.dotplot(adata, 'C1QA', groupby=['bulk_labels'], swap_axes=False, gene_symbols='TEST')
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-45-206578ef4fd5> in <module>
1 adata = sc.datasets.pbmc68k_reduced()
----> 2 ax = sc.pl.dotplot(adata, 'C1QA', groupby=['bulk_labels'], swap_axes=False, gene_symbols='TEST')
3
~/anaconda3/envs/scRNA/lib/python3.6/site-packages/scanpy/plotting/_dotplot.py in dotplot(adata, var_names, groupby, use_raw, log, num_categories, expression_cutoff, mean_only_expressed, cmap, dot_max, dot_min, standard_scale, smallest_dot, title, colorbar_title, size_title, figsize, dendrogram, gene_symbols, var_group_positions, var_group_labels, var_group_rotation, layer, swap_axes, dot_color_df, show, save, ax, return_fig, **kwds)
930 dot_color_df=dot_color_df,
931 ax=ax,
--> 932 **kwds,
933 )
934
~/anaconda3/envs/scRNA/lib/python3.6/site-packages/scanpy/plotting/_dotplot.py in __init__(self, adata, var_names, groupby, use_raw, log, num_categories, categories_order, title, figsize, gene_symbols, var_group_positions, var_group_labels, var_group_rotation, layer, expression_cutoff, mean_only_expressed, standard_scale, dot_color_df, dot_size_df, ax, **kwds)
142 layer=layer,
143 ax=ax,
--> 144 **kwds,
145 )
146
~/anaconda3/envs/scRNA/lib/python3.6/site-packages/scanpy/plotting/_baseplot_class.py in __init__(self, adata, var_names, groupby, use_raw, log, num_categories, categories_order, title, figsize, gene_symbols, var_group_positions, var_group_labels, var_group_rotation, layer, ax, **kwds)
111 num_categories,
112 layer=layer,
--> 113 gene_symbols=gene_symbols,
114 )
115
~/anaconda3/envs/scRNA/lib/python3.6/site-packages/scanpy/plotting/_anndata.py in _prepare_dataframe(adata, var_names, groupby, use_raw, log, num_categories, layer, gene_symbols)
1837 # translate the column names to the symbol names
1838 obs_tidy.rename(
-> 1839 columns={var_names[x]: symbols[x] for x in range(len(var_names))},
1840 inplace=True,
1841 )
~/anaconda3/envs/scRNA/lib/python3.6/site-packages/scanpy/plotting/_anndata.py in <dictcomp>(.0)
1837 # translate the column names to the symbol names
1838 obs_tidy.rename(
-> 1839 columns={var_names[x]: symbols[x] for x in range(len(var_names))},
1840 inplace=True,
1841 )
NameError: free variable 'symbols' referenced before assignment in enclosing scope
Versions
scanpy==1.6.0 anndata==0.7.4 umap==0.3.10 numpy==1.18.4 scipy==1.4.1 pandas==1.0.3 scikit-learn==0.21.3 statsmodels==0.10.1 python-igraph==0.8.2 louvain==0.6.1 leidenalg==0.8.0
@galamm, thanks for the report! Also thanks for the example, very useful!
I think I see what the issue is here, though your error is unexpected. Are you using the most recent version of scanpy (1.7.0)?
The gene_symbols argument is supposed to refer to a column in var that has more human-readable gene names. The idea here is that you might have some unique identifier as var_names (like Ensembl IDs), but would have something more interpretable stored in adata.var[gene_symbols].
On my machine, I get a KeyError when I run your example since there is no column "TEST" in adata.var. This is expected. It's strange to me that you get a NameError.
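To make the intended behaviour concrete, here is a toy model of the lookup described above. It is a simplified sketch and not scanpy's implementation: the `resolve` helper, the `var` dictionary layout, and the identifiers `gene-0001` and `gene-0002` are all made up for illustration.

```python
# Toy model of the gene_symbols lookup; this is NOT scanpy's code.
# `var` stands in for adata.var: column name -> {var_name: value}.
# The identifiers "gene-0001"/"gene-0002" are invented for this sketch.
var = {
    "symbol": {"gene-0001": "C1QA", "gene-0002": "NKG7"},
}


def resolve(var, requested, gene_symbols=None):
    """Return the var_names that the requested names refer to."""
    if gene_symbols is None:
        # Without gene_symbols, the requested names are the var_names.
        return list(requested)
    column = var[gene_symbols]  # KeyError if the column does not exist
    by_symbol = {symbol: var_name for var_name, symbol in column.items()}
    return [by_symbol[name] for name in requested]


print(resolve(var, ["C1QA"], gene_symbols="symbol"))  # ['gene-0001']

try:
    resolve(var, ["C1QA"], gene_symbols="TEST")
except KeyError as err:
    print("KeyError:", err)  # no column named 'TEST'
```

In this model, passing a column name that does not exist fails with a KeyError, which matches what I would expect from the example in the report.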
Hello! I have also been running into issues when trying to use the gene_symbols parameter with the sc.pl.dotplot() function, even though the column with the proper gene symbols is in my adata.var DataFrame.
$ adata.var.columns
$ sc.pl.dotplot(adata, marker_genes, 'clusters', dendrogram=True, gene_symbols='alternate_gene_symbols')
==============================================================================
Index(['gene_symbols', 'feature_types', 'n_cells', 'highly_variable', 'means',
'dispersions', 'dispersions_norm', 'mean', 'std',
'alternate_gene_symbols'],
dtype='object')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/core/indexes/base.py:3621, in Index.get_loc(self, key, method, tolerance)
3620 try:
-> 3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
File ~/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/_libs/index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()
File ~/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/_libs/index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:5198, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:5206, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'alternate_gene_symbols'
...
When I set adata.var['gene_symbols'] = adata.var['alternate_gene_symbols'] and then tried to generate a dotplot with a random gene present in alternate_gene_symbols, I ran into the following error:
...
KeyError: "Could not find keys '['KH.C1.159.']' in columns of `adata.obs` or in adata.raw.var['gene_symbols']."
It seems that sc.pl.dotplot() is expecting gene symbols that are present in the adata.raw.var DataFrame rather than the adata.var DataFrame. Is this the expected behavior for this parameter?
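Since the error message above names adata.raw.var, one quick check is whether the column exists in adata.var, in adata.raw.var, or in both. The helper below is a hypothetical sketch (the name `where_is_column` is not part of scanpy) and it uses stand-in objects so that it runs without scanpy installed. It only relies on the `.var.columns` and `.raw` attributes.

```python
from types import SimpleNamespace


def where_is_column(adata, key):
    """Report whether `key` is a column of adata.var and of adata.raw.var."""
    raw = getattr(adata, "raw", None)
    return {
        "var": key in adata.var.columns,
        "raw.var": raw is not None and key in raw.var.columns,
    }


# Stand-in object so the sketch runs without scanpy; with a real AnnData,
# call where_is_column(adata, "alternate_gene_symbols") instead.
fake = SimpleNamespace(
    var=SimpleNamespace(columns=["gene_symbols", "alternate_gene_symbols"]),
    raw=SimpleNamespace(var=SimpleNamespace(columns=["gene_symbols"])),
)
print(where_is_column(fake, "alternate_gene_symbols"))
# {'var': True, 'raw.var': False}
```

If the column turns out to be present only in adata.var, passing use_raw=False to sc.pl.dotplot() might be worth trying. That is an assumption on my part and I have not verified it against any particular scanpy version.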
Hello! I'm running scanpy version 1.9.3 now, and it seems that this bug, first reported for scanpy 1.6.0, is still not fixed. The situation is the same as in the previous comments. I created a new column in adata.var with some names changed to GenBank IDs (I'm working with a non-model species and the majority of gene names are non-informative, like "nbis-gene-11111", but I am interested in some actin genes that I deposited in GenBank. I would like to put the GB accessions into the plot.) I created the column with the following code ("bob" is the dataset name):
bob.var['GB_IDs'] = bob.var_names.copy()
ID_dict = {
    "nbis-gene-777": "MT451954",
    "nbis-gene-775": "MT451955",
    "nbis-gene-3785": "MT451956",
    "nbis-gene-3784": "MT451957",
    "nbis-gene-23114": "MT451958",
    "nbis-gene-25113": "MT451959",
    "nbis-gene-3783": "MT518195"
}
bob.var['GB_IDs'].replace(ID_dict, inplace=True)
After that, the GB_IDs column was present in the DataFrame. Then I tried to draw the dotplot:
dict = {
    "Actin 1": ["nbis-gene-777"],
    "Actin 2": ["nbis-gene-775"],
    "Actin 3": ["nbis-gene-3785"],
    "Actin 4": ["nbis-gene-3784"],
    "Actin 5": ["nbis-gene-23114"],
    "Actin 6": ["nbis-gene-25113"],
    "Actin 7": ["nbis-gene-3783"]
}
dp = sc.pl.dotplot(bob, dict, "scGate_multi", dendrogram=False, return_fig=True, cmap='YlGnBu', gene_symbols='GB_IDs')
This results in an error:
KeyError Traceback (most recent call last)
File ~/software/SAMap/lib/python3.9/site-packages/pandas/core/indexes/base.py:3791, in Index.get_loc(self, key)
3790 try:
-> 3791 return self._engine.get_loc(casted_key)
3792 except KeyError as err:
File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()
File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'GB_IDs'
If I understand the docs correctly (https://scanpy.readthedocs.io/en/latest/generated/scanpy.pl.dotplot.html), this code should work. I also tried creating the same additional column in adata.raw.var, but that did not help either.