
rank_genes_groups method='logreg' causes error with `sc.get.rank_genes_groups_df`

Open fidelram opened this issue 5 years ago • 3 comments

  • [x] I have checked that this issue has not already been reported.
  • [x] I have confirmed this bug exists on the latest version of scanpy.
  • [x] (optional) I have confirmed this bug exists on the master branch of scanpy.

When running sc.tl.rank_genes_groups with method='logreg', no logfoldchanges are generated. However, sc.get.rank_genes_groups_df requires this field.

import scanpy as sc

adata = sc.datasets.pbmc68k_reduced()
sc.tl.rank_genes_groups(adata, 'bulk_labels', method='logreg')
sc.get.rank_genes_groups_df(adata, 'Dendritic')  # raises KeyError: 'logfoldchanges'
/scanpy/scanpy/get.py in rank_genes_groups_df(adata, group, key, pval_cutoff, log2fc_min, log2fc_max, gene_symbols)
     56     d = pd.DataFrame()
     57     for k in ['scores', 'names', 'logfoldchanges', 'pvals', 'pvals_adj']:
---> 58         d[k] = adata.uns[key][k][group]
     59     if pval_cutoff is not None:
     60         d = d[d["pvals_adj"] < pval_cutoff]
KeyError: 'logfoldchanges'


fidelram avatar Dec 03 '20 21:12 fidelram

Has this problem been solved?

luluZuo avatar Jan 26 '22 06:01 luluZuo

The same issue.

xyang2uchicago avatar Jul 14 '22 16:07 xyang2uchicago

I am currently having the same issue as well. I am mostly just a user, but I dug into the code and found a workaround to get a DataFrame with the logreg scores (so please forgive any inaccuracy or naivety on my part).

After sc.tl.rank_genes_groups with method='logreg':

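import pandas as pd

# adata.uns["logreg"] assumes rank_genes_groups was run with key_added="logreg";
# with the default key, use adata.uns["rank_genes_groups"] instead.
colnames = ['names', 'scores']
# `group`: list of group names to include (assumed here: all groups)
group = list(adata.uns["logreg"]["names"].dtype.names)

test = [pd.DataFrame(adata.uns["logreg"][c])[group] for c in colnames]
test = pd.concat(test, axis=1, names=[None, 'group'], keys=colnames)
test = test.stack(level=1).reset_index()
# This cast works only if the group labels are integer-like (e.g. cluster numbers)
test["group"] = test["group"].astype("int")
test.sort_values('group', inplace=True)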

test

I guess the code could be adapted to treat logistic regression as a special case, i.e. one that produces no log fold changes or p-values, and still return a DataFrame with the scores.

ftamiro avatar Sep 05 '22 02:09 ftamiro
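The adaptation suggested in the comment above could look roughly like the following. This is only a sketch, not scanpy's actual fix: `rank_genes_groups_to_df` is a hypothetical helper name, and the `mock` dictionary (with made-up gene names and scores) stands in for `adata.uns["rank_genes_groups"]`, assuming each field is a structured array with one column per group, as the indexing in the traceback suggests.

```python
import numpy as np
import pandas as pd

ALL_FIELDS = ("names", "scores", "logfoldchanges", "pvals", "pvals_adj")


def rank_genes_groups_to_df(results, group, fields=ALL_FIELDS):
    """Hypothetical helper: build a DataFrame from whichever fields exist.

    `results` is assumed to map field names to structured arrays that
    have one column per group (the layout implied by the traceback).
    """
    available = [f for f in fields if f in results]
    if "names" not in available:
        raise KeyError("'names' is missing from the results")
    return pd.DataFrame({f: results[f][group] for f in available})


# Mock of a logreg result: only 'names' and 'scores' are present.
# Gene names and scores are invented for illustration.
groups = ["Dendritic", "Other"]
mock = {
    "names": np.array(
        [("geneA", "geneC"), ("geneB", "geneD")],
        dtype=[(g, "U10") for g in groups],
    ),
    "scores": np.array(
        [(0.9, 0.8), (0.4, 0.3)],
        dtype=[(g, "f8") for g in groups],
    ),
}

df = rank_genes_groups_to_df(mock, "Dendritic")
print(df)
```

With a real AnnData object, the call would presumably be `rank_genes_groups_to_df(adata.uns["rank_genes_groups"], "Dendritic")`. Columns that a method does not produce are simply left out rather than raising a KeyError.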

The same issue

UboCA avatar Oct 25 '22 01:10 UboCA

Additionally, I tried sc.get.rank_genes_groups_df in three different versions of scanpy. The issue occurs in 1.8.2 and 1.9.1, but not in 1.6.1.

UboCA avatar Oct 25 '22 03:10 UboCA

same prob

yh154 avatar Sep 01 '23 18:09 yh154

Should be solved now (scanpy 1.9.5, some earlier versions already), see comment here #2363.

eroell avatar Sep 14 '23 15:09 eroell

Thanks, everyone, for posting here and following up on this issue! As the initial reproducible example now works with the latest scanpy versions, we might close the issue soon. If you keep experiencing this problem, kindly let us know in a new issue, along with the scanpy version you are using!

eroell avatar Oct 02 '23 07:10 eroell
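For anyone checking their environment before reporting back, the installed version can be compared against 1.9.5, the release named above as containing the fix. The following is a minimal sketch using only the standard library. `release_tuple` and `is_at_least` are hypothetical helper names, and the parsing deliberately ignores pre-release and development suffixes.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Return the leading numeric release, e.g. '1.9.5' -> (1, 9, 5).

    Short versions are padded with zeros to three components,
    so '1.9' compares equal to '1.9.0'.
    """
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"no numeric release in {version_string!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * max(0, 3 - len(parts))


def is_at_least(version_string, minimum):
    """True if version_string is the same as or newer than minimum."""
    return release_tuple(version_string) >= release_tuple(minimum)


try:
    installed = version("scanpy")
    print(f"scanpy {installed}, at least 1.9.5: {is_at_least(installed, '1.9.5')}")
except PackageNotFoundError:
    print("scanpy is not installed in this environment")
```

Applied to the versions reported in this thread, `is_at_least("1.8.2", "1.9.5")` and `is_at_least("1.9.1", "1.9.5")` are both False.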