scanpy icon indicating copy to clipboard operation
scanpy copied to clipboard

ValueError: b'Extrapolation not allowed with blending' when using `"sc.pp.highly_variable_genes"` function

Open JuHey opened this issue 2 years ago • 3 comments

Please make sure these conditions are met

  • [X] I have checked that this issue has not already been reported.
  • [X] I have confirmed this bug exists on the latest version of scanpy.
  • [ ] (optional) I have confirmed this bug exists on the master branch of scanpy.

What happened?

During preprocessing of concatenated adata file for scvi-based label transfer, processing fails when applying "sc.pp.highly_variable_genes" function with "ValueError: b'Extrapolation not allowed with blending'"

Minimal code sample

aadata = aadata.concatenate(ref_data_WT)

aadata.X
<15445x13343 sparse matrix of type '<class 'numpy.float64'>'
	with 107849393 stored elements in Compressed Sparse Row format>

# pre-processing:
aadata.layers["counts"] = aadata.X.copy()
sc.pp.normalize_total(aadata, target_sum=1e4)
sc.pp.log1p(aadata)
aadata.raw = aadata

sc.pp.highly_variable_genes(aadata, flavor = 'seurat_v3', n_top_genes=2000,
                            layer = "counts", batch_key="batch", subset = True)#, span =0.5

Error output

ValueError                                Traceback (most recent call last)
Cell In[37], line 7
      4 sc.pp.log1p(aadata)
      5 aadata.raw = aadata
----> 7 sc.pp.highly_variable_genes(aadata, flavor = 'seurat_v3', n_top_genes=2000,
      8                             layer = "counts", batch_key="batch", subset = True)#, span =0.5

File ~/mambaforge/envs/soupxEnv/lib/python3.10/site-packages/scanpy/preprocessing/_highly_variable_genes.py:441, in highly_variable_genes(adata, layer, n_top_genes, min_disp, max_disp, min_mean, max_mean, span, n_bins, flavor, subset, inplace, batch_key, check_values)
    439         sig = signature(_highly_variable_genes_seurat_v3)
    440         n_top_genes = cast(int, sig.parameters["n_top_genes"].default)
--> 441     return _highly_variable_genes_seurat_v3(
    442         adata,
    443         layer=layer,
    444         n_top_genes=n_top_genes,
    445         batch_key=batch_key,
    446         check_values=check_values,
    447         span=span,
    448         subset=subset,
    449         inplace=inplace,
    450     )
    452 if batch_key is None:
    453     df = _highly_variable_genes_single_batch(
    454         adata,
    455         layer=layer,
   (...)
    462         flavor=flavor,
    463     )

File ~/mambaforge/envs/soupxEnv/lib/python3.10/site-packages/scanpy/preprocessing/_highly_variable_genes.py:87, in _highly_variable_genes_seurat_v3(adata, layer, n_top_genes, batch_key, check_values, span, subset, inplace)
     85 x = np.log10(mean[not_const])
     86 model = loess(x, y, span=span, degree=2)
---> 87 model.fit()
     88 estimat_var[not_const] = model.outputs.fitted_values
     89 reg_std = np.sqrt(10**estimat_var)

File _loess.pyx:899, in _loess.loess.fit()

ValueError: b'Extrapolation not allowed with blending'

Versions

scanpy==1.9.8 anndata==0.10.2 umap==0.5.4 numpy==1.24.4 scipy==1.11.3 pandas==2.1.1 scikit-learn==1.3.1 statsmodels==0.14.0 igraph==0.10.8 pynndescent==0.5.10

JuHey avatar Feb 13 '24 21:02 JuHey

Hey - it would be most helpful to post user questions in the scverse forum - there, other users encountering the same question will be able to find a response easier :)

Here, to take care of bugs in scanpy, it is most helpful for us if you are able to share public data/a small part of it/a synthetic data example so that we can check whats going on. Would it possible to you to supply something like that, such that I can reproduce your example myself?

From a first glance, with seurat_v3 requiring count data, it is important that your .X (becoming the layer you refer to as counts) indeed contains counts, otherwise loess quickly runs into stability issues.

eroell avatar Feb 14 '24 20:02 eroell

From a first glance, with seurat_v3 requiring count data, it is important that your .X (becoming the layer you refer to as counts) indeed contains counts, otherwise loess quickly runs into stability issues.

I would expect there would be a warning here if this were the case, since check_values defaults to True.

But at least this person had the same error caused by passing in normalized values:

  • https://www.biostars.org/p/9535944/

The author of scikit-misc says:

pass surface="direct"

to the loess solver based only off the error message. So maybe we can enable that.

I don't know enough about loess to be able to say why that would fix this. It would be interesting to see the data that caused this error. I would definitely want to have a reproducible case before attempting a fix.

ivirshup avatar Mar 14 '24 17:03 ivirshup

My code also has the same bug problem, may I ask, how to solve it?

adata.raw = adata # keep full dimension safe sc.pp.highly_variable_genes( adata, flavor="seurat_v3",# n_top_genes=2000, layer="counts", batch_key="orig.ident", subset=True, span=1 )

Error output

`ValueError Traceback (most recent call last) Cell In[13], line 3 1 #高变基因选取 2 adata.raw = adata # keep full dimension safe ----> 3 sc.pp.highly_variable_genes( 4 adata, 5 flavor="seurat_v3",# 6 n_top_genes=2000, 7 layer="counts", 8 batch_key="orig.ident", 9 subset=True, 10 span=1 11 ) 13 filename = 'melanoma_sw_high_var.h5ad' 14 adata.write(filename)

File ~/miniconda3/envs/scanpy/lib/python3.12/site-packages/scanpy/preprocessing/_highly_variable_genes.py:441, in highly_variable_genes(adata, layer, n_top_genes, min_disp, max_disp, min_mean, max_mean, span, n_bins, flavor, subset, inplace, batch_key, check_values) 439 sig = signature(_highly_variable_genes_seurat_v3) 440 n_top_genes = cast(int, sig.parameters["n_top_genes"].default) --> 441 return _highly_variable_genes_seurat_v3( 442 adata, 443 layer=layer, 444 n_top_genes=n_top_genes, 445 batch_key=batch_key, 446 check_values=check_values, 447 span=span, 448 subset=subset, 449 inplace=inplace, 450 ) 452 if batch_key is None: 453 df = _highly_variable_genes_single_batch( 454 adata, 455 layer=layer, (...) 462 flavor=flavor, 463 )

File ~/miniconda3/envs/scanpy/lib/python3.12/site-packages/scanpy/preprocessing/_highly_variable_genes.py:87, in _highly_variable_genes_seurat_v3(adata, layer, n_top_genes, batch_key, check_values, span, subset, inplace) 85 x = np.log10(mean[not_const]) 86 model = loess(x, y, span=span, degree=2) ---> 87 model.fit() 88 estimat_var[not_const] = model.outputs.fitted_values 89 reg_std = np.sqrt(10**estimat_var)

File _loess.pyx:927, in _loess.loess.fit()

ValueError: b'Extrapolation not allowed with blending'`

FlowerPurr avatar Mar 26 '24 04:03 FlowerPurr