dotplot cannot set gene_symbols

Open galamm opened this issue 5 years ago • 3 comments

I would like to change the displayed var names in a dotplot by setting gene_symbols to the desired name, but this raises a NameError. Below is an example using a scanpy built-in dataset.

adata = sc.datasets.pbmc68k_reduced()
ax = sc.pl.dotplot(adata, 'C1QA', groupby=['bulk_labels'], swap_axes=False, gene_symbols='TEST')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-45-206578ef4fd5> in <module>
      1 adata = sc.datasets.pbmc68k_reduced()
----> 2 ax = sc.pl.dotplot(adata, 'C1QA', groupby=['bulk_labels'], swap_axes=False, gene_symbols='TEST')
      3 

~/anaconda3/envs/scRNA/lib/python3.6/site-packages/scanpy/plotting/_dotplot.py in dotplot(adata, var_names, groupby, use_raw, log, num_categories, expression_cutoff, mean_only_expressed, cmap, dot_max, dot_min, standard_scale, smallest_dot, title, colorbar_title, size_title, figsize, dendrogram, gene_symbols, var_group_positions, var_group_labels, var_group_rotation, layer, swap_axes, dot_color_df, show, save, ax, return_fig, **kwds)
    930         dot_color_df=dot_color_df,
    931         ax=ax,
--> 932         **kwds,
    933     )
    934 

~/anaconda3/envs/scRNA/lib/python3.6/site-packages/scanpy/plotting/_dotplot.py in __init__(self, adata, var_names, groupby, use_raw, log, num_categories, categories_order, title, figsize, gene_symbols, var_group_positions, var_group_labels, var_group_rotation, layer, expression_cutoff, mean_only_expressed, standard_scale, dot_color_df, dot_size_df, ax, **kwds)
    142             layer=layer,
    143             ax=ax,
--> 144             **kwds,
    145         )
    146 

~/anaconda3/envs/scRNA/lib/python3.6/site-packages/scanpy/plotting/_baseplot_class.py in __init__(self, adata, var_names, groupby, use_raw, log, num_categories, categories_order, title, figsize, gene_symbols, var_group_positions, var_group_labels, var_group_rotation, layer, ax, **kwds)
    111             num_categories,
    112             layer=layer,
--> 113             gene_symbols=gene_symbols,
    114         )
    115 

~/anaconda3/envs/scRNA/lib/python3.6/site-packages/scanpy/plotting/_anndata.py in _prepare_dataframe(adata, var_names, groupby, use_raw, log, num_categories, layer, gene_symbols)
   1837         # translate the column names to the symbol names
   1838         obs_tidy.rename(
-> 1839             columns={var_names[x]: symbols[x] for x in range(len(var_names))},
   1840             inplace=True,
   1841         )

~/anaconda3/envs/scRNA/lib/python3.6/site-packages/scanpy/plotting/_anndata.py in <dictcomp>(.0)
   1837         # translate the column names to the symbol names
   1838         obs_tidy.rename(
-> 1839             columns={var_names[x]: symbols[x] for x in range(len(var_names))},
   1840             inplace=True,
   1841         )

NameError: free variable 'symbols' referenced before assignment in enclosing scope

Versions

scanpy==1.6.0 anndata==0.7.4 umap==0.3.10 numpy==1.18.4 scipy==1.4.1 pandas==1.0.3 scikit-learn==0.21.3 statsmodels==0.10.1 python-igraph==0.8.2 louvain==0.6.1 leidenalg==0.8.0

galamm, Feb 10 '21 07:02

@galamm, thanks for the report! Also thanks for the example, very useful!

I think I see what the issue is here, though your error is unexpected. Are you using the most recent version of scanpy (1.7.0)?

The gene_symbols argument is supposed to refer to a column in var that holds more human-readable gene names. The idea is that you might have some unique identifier as var_names (like Ensembl IDs), but have something more interpretable stored in adata.var[gene_symbols].

On my machine, I get a KeyError when I run your example since there is no column "TEST" in adata.var. This is expected. It's strange to me that you get a NameError.
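To make the intended usage concrete, here is a minimal sketch in plain Python that does not need scanpy. The var table is modeled as a dict of columns, the identifiers are made up, and symbols_to_var_names is a hypothetical helper rather than a scanpy function. It only illustrates the kind of lookup gene_symbols is meant to drive, and why a missing column surfaces as a KeyError.

```python
def symbols_to_var_names(var_names, var, gene_symbols, wanted):
    """Map human-readable symbols back to unique var_names.

    var_names    : list of unique identifiers (made-up IDs in this sketch)
    var          : dict of column name -> list of values aligned with var_names
                   (a stand-in for adata.var; an assumption of this sketch)
    gene_symbols : name of the column holding the readable names
    wanted       : symbols the caller would like to plot
    """
    if gene_symbols not in var:
        # Same failure mode as gene_symbols='TEST' on a table without that column
        raise KeyError(gene_symbols)

    # Build symbol -> var_name; if a symbol repeats, the last entry wins
    lookup = dict(zip(var[gene_symbols], var_names))

    missing = [s for s in wanted if s not in lookup]
    if missing:
        raise KeyError(f"symbols not found in var[{gene_symbols!r}]: {missing}")
    return [lookup[s] for s in wanted]


# Hypothetical data: two made-up IDs with readable names in a "symbol" column
var_names = ["ID_0001", "ID_0002"]
var = {"symbol": ["C1QA", "C1QC"]}

print(symbols_to_var_names(var_names, var, "symbol", ["C1QA"]))  # ['ID_0001']
```

In this model the genes are requested by the names stored in the chosen column, not by the index values, and the column has to exist before anything can be translated.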

ivirshup, Feb 10 '21 08:02

Hello! I have also been running into issues when trying to use the gene_symbols parameter with sc.pl.dotplot(), even though the column with the proper gene symbols is present in my adata.var DataFrame.

$ adata.var.columns
$ sc.pl.dotplot(adata, marker_genes, 'clusters', dendrogram=True, gene_symbols='alternate_gene_symbols')

==============================================================================

Index(['gene_symbols', 'feature_types', 'n_cells', 'highly_variable', 'means',
       'dispersions', 'dispersions_norm', 'mean', 'std',
       'alternate_gene_symbols'],
      dtype='object')

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/core/indexes/base.py:3621, in Index.get_loc(self, key, method, tolerance)
   3620 try:
-> 3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:

File ~/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/_libs/index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()

File ~/miniconda3/envs/scanpy/lib/python3.9/site-packages/pandas/_libs/index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:5198, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:5206, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'alternate_gene_symbols'
...

When I set adata.var['gene_symbols'] = adata.var['alternate_gene_symbols'] and then tried to generate a dotplot with a random gene present in alternate_gene_symbols, I ran into the following error:

...
KeyError: "Could not find keys '['KH.C1.159.']' in columns of `adata.obs` or in adata.raw.var['gene_symbols']."

It seems that sc.pl.dotplot() expects the gene_symbols column to be present in the adata.raw.var DataFrame rather than in the adata.var DataFrame. Is this the expected behavior for this parameter?
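A small sketch of the behavior the error message hints at, again in plain Python. It rests on an assumption that this thread does not confirm: that an unset use_raw falls back to "use adata.raw if it exists", so the column is searched in adata.raw.var. Both helpers are hypothetical, and the raw column list is invented for illustration.

```python
def var_table_consulted(has_raw, use_raw=None):
    """Name the table a gene_symbols lookup would hit under the stated assumption."""
    # Assumption of this sketch: use_raw=None means "use raw when it is present"
    if use_raw is None:
        use_raw = has_raw
    if use_raw and not has_raw:
        raise ValueError("use_raw=True requires adata.raw to be set")
    return "adata.raw.var" if use_raw else "adata.var"


def column_visible(column, var_columns, raw_var_columns=None, use_raw=None):
    """Return (table name, whether `column` exists in that table)."""
    has_raw = raw_var_columns is not None
    table = var_table_consulted(has_raw, use_raw)
    columns = raw_var_columns if table == "adata.raw.var" else var_columns
    return table, column in columns


# adata.var columns as printed in the comment above
var_cols = ["gene_symbols", "feature_types", "n_cells", "highly_variable", "means",
            "dispersions", "dispersions_norm", "mean", "std", "alternate_gene_symbols"]
# Hypothetical adata.raw.var columns, frozen before the new column was added
raw_cols = ["gene_symbols", "feature_types"]

print(column_visible("alternate_gene_symbols", var_cols, raw_cols))
# ('adata.raw.var', False)
print(column_visible("alternate_gene_symbols", var_cols, raw_cols, use_raw=False))
# ('adata.var', True)
```

If that assumption holds, passing use_raw=False to sc.pl.dotplot() may be worth trying, so that the lookup goes to adata.var, where the column actually lives.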

mragsac, Oct 19 '22 18:10

Hello! I'm running scanpy version 1.9.3 now, and it seems that this bug, first reported against scanpy 1.6.0, is still not fixed. The situation is the same as in the previous comments. I created a new column in adata.var with some names changed to GenBank IDs. (I'm working with a non-model species and most gene names are non-informative, like "nbis-gene-11111", but I am interested in some actin genes that I deposited in GenBank, and I would like to show their GenBank accessions in the plot.) I created the column with the following code ("bob" is the dataset name):

bob.var['GB_IDs'] = bob.var_names.copy()
ID_dict = {
    "nbis-gene-777": "MT451954",
    "nbis-gene-775": "MT451955",
    "nbis-gene-3785": "MT451956",
    "nbis-gene-3784": "MT451957",
    "nbis-gene-23114": "MT451958",
    "nbis-gene-25113": "MT451959",
    "nbis-gene-3783": "MT518195"
}
bob.var['GB_IDs'].replace(ID_dict, inplace=True)

After that, the GB_IDs column was present in the dataframe. I then tried to draw the dotplot:

dict = {
    "Actin 1": ["nbis-gene-777"],
    "Actin 2": ["nbis-gene-775"],
    "Actin 3": ["nbis-gene-3785"],
    "Actin 4": ["nbis-gene-3784"],
    "Actin 5": ["nbis-gene-23114"],
    "Actin 6": ["nbis-gene-25113"],
    "Actin 7": ["nbis-gene-3783"]
}
dp = sc.pl.dotplot(bob, dict, "scGate_multi", dendrogram=False, return_fig=True, cmap='YlGnBu', gene_symbols='GB_IDs')

This results in an error:


KeyError                                  Traceback (most recent call last)
File ~/software/SAMap/lib/python3.9/site-packages/pandas/core/indexes/base.py:3791, in Index.get_loc(self, key)
   3790 try:
-> 3791     return self._engine.get_loc(casted_key)
   3792 except KeyError as err:

File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'GB_IDs'

If I understand the docs correctly (https://scanpy.readthedocs.io/en/latest/generated/scanpy.pl.dotplot.html), this code should work. I also tried creating the same additional column in adata.raw.var, but that did not help either.

VasiliyZubarev, Apr 10 '24 18:04