question about batch
Hello, I have 3 studies that I want to annotate using a built reference, and I wonder whether what I am doing is correct. I transferred labels from the built reference to each dataset separately. I had integrated the 3 studies with Seurat and Harmony in R (Seurat v5), but here in symphonypy I started from counts and followed the tutorial. Should I transfer labels to the whole object rather than one dataset at a time? Would the batch-corrected object help at all?
Hi, @Flu09!
First of all, if you're more familiar with R, it's better to use the original Symphony: https://github.com/immunogenomics/symphony
Secondly, you can explicitly pass batch information during label transfer using the key argument (it's better to do it this way, and the results should be similar to label transfer on individual batches):
sp.tl.map_embedding(adata_query=adata_query, adata_ref=adata_ref, key=batch_key)
Overall, Symphony's performance on Seurat-corrected expression values hasn't been benchmarked, so we can't say whether it will give meaningful results.
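A minimal sketch of the joint mapping suggested above, assuming the three studies are already loaded as AnnData objects and preprocessed the same way as the reference (as in the symphonypy tutorial). The function name map_studies_together, the "study" column name, and the inner join on genes are illustrative choices, not part of symphonypy; only the sp.tl.map_embedding call with key comes from this thread.

```python
def map_studies_together(studies, adata_ref, batch_key="study"):
    """Concatenate query studies and map them onto the reference in one call.

    studies: dict such as {"study1": adata1, "study2": adata2, "study3": adata3}
             (hypothetical names), each preprocessed like the reference.
    adata_ref: the reference AnnData built beforehand.
    """
    # Imported lazily so the helper can be defined without the packages loaded.
    import anndata as ad
    import symphonypy as sp

    # Stack the cells of all studies; the dict keys are written to
    # adata_query.obs[batch_key]. join="inner" keeps only genes shared by
    # every study (an assumption; adjust if that drops too many genes).
    adata_query = ad.concat(
        studies,
        label=batch_key,
        join="inner",
        index_unique="-",  # keep cell names unique across studies
    )

    # Same call as in the reply above, with the batch column passed as key.
    sp.tl.map_embedding(adata_query=adata_query, adata_ref=adata_ref, key=batch_key)
    return adata_query
```

The returned object can then go through label transfer as a whole instead of one study at a time.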
I see, thank you so much. I get this error. Do you have any suggestions?
sp.tl.map_embedding(adata_query=sample, adata_ref=adata)
538 out of 3000 genes from the reference are missing in the query dataset or have zero std in the reference, their expressions in the query will be set to zero
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/symphonypy/tools.py", line 336, in map_embedding
_map_query_to_ref(
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/symphonypy/_utils.py", line 278, in _map_query_to_ref
t = _adjust_for_missing_genes(
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/symphonypy/_utils.py", line 240, in _adjust_for_missing_genes
X = adata[:, use_genes_list[use_genes_list_present]].X
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/anndata/_core/anndata.py", line 591, in X
_subset(self._adata_ref.X, (self._oidx, self._vidx)),
File "/usr/lib64/python3.9/functools.py", line 888, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/anndata/_core/index.py", line 165, in _subset_spmatrix
return a[subset_idx]
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/scipy/sparse/_index.py", line 68, in __getitem__
return self._get_sliceXarray(row, col)
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/scipy/sparse/_csr.py", line 326, in _get_sliceXarray
return self._major_slice(row)._minor_index_fancy(col)
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/scipy/sparse/_compressed.py", line 768, in _minor_index_fancy
csr_column_index1(k, idx, M, N, self.indptr, self.indices,
ValueError: Output dtype not compatible with inputs.
Hi @Flu09! I'm so sorry that you are encountering this bug! What's the datatype of your sparse matrix adata_query.X in the example above?
float64 for both the reference and the samples. I think they need to be converted to float32, and the cell type column to category?
print(adata.obs['cell_type_high_resolution'].dtype)
object
adata.X
<1353075x33538 sparse matrix of type '<class 'numpy.float64'>' with 4457926739 stored elements in Compressed Sparse Row format>
sample.X
<3057x38152 sparse matrix of type '<class 'numpy.float64'>' with 4187950 stored elements in Compressed Sparse Row format>
Eh, float64 seems to be OK. I was just hoping that it was connected to this bug with np.float16: https://stackoverflow.com/questions/40046118/why-cant-i-assign-data-to-part-of-sparse-matrix-in-the-first-try
@Flu09 Would you mind sharing a minimal subsample of the data that reproduces the error? A couple of cells per dataset would probably be enough.
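One way to build such a subsample is to pick a few cell positions per dataset and slice both objects with them. This is only a sketch: pick_cells_per_group is a hypothetical helper, and the obs column holding the dataset label (called "study" in the comments) is an assumption about how the data is annotated.

```python
import numpy as np


def pick_cells_per_group(groups, n_per_group=2, seed=0):
    """Return sorted positions of up to n_per_group random cells per group."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    picked = []
    for group in np.unique(groups):
        positions = np.flatnonzero(groups == group)
        take = min(n_per_group, positions.size)
        picked.append(rng.choice(positions, size=take, replace=False))
    return np.sort(np.concatenate(picked))


# Toy labels standing in for something like sample.obs["study"].
labels = ["study1", "study1", "study1", "study2", "study2", "study3"]
idx = pick_cells_per_group(labels, n_per_group=2)
print(idx)

# With real data (assumed column name), the subset could be saved with:
#   small = sample[pick_cells_per_group(sample.obs["study"])].copy()
#   small.write_h5ad("sample_small.h5ad")
```

It is worth checking that the error still occurs on the small object before sharing it, since a slice this small may not trigger the same code path.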
Probably related to https://github.com/scverse/anndata/issues/1349?
I can try preparing some data to share. Changing both the reference and the sample to float32 solved the previous issue.
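For anyone hitting the same ValueError, the cast described above can be sketched as follows. The helper cast_to_float32 is a hypothetical name and the toy matrix stands in for adata.X; why the cast avoids the error was not established in this thread.

```python
import numpy as np
from scipy import sparse


def cast_to_float32(matrix):
    """Return matrix as float32, keeping it sparse if it was sparse."""
    if sparse.issparse(matrix):
        return matrix.astype(np.float32)
    return np.asarray(matrix, dtype=np.float32)


# Toy float64 CSR matrix standing in for adata.X or sample.X.
X = sparse.csr_matrix(np.array([[1.0, 0.0, 2.5], [0.0, 0.0, 3.0]]))
X32 = cast_to_float32(X)
print(X.dtype, "->", X32.dtype)

# With real objects the same cast would be applied to both:
#   adata.X = cast_to_float32(adata.X)
#   sample.X = cast_to_float32(sample.X)
```

The cast returns a new matrix, so for a reference with billions of stored elements both copies briefly coexist in memory.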
New error message below
sp.tl.map_embedding(adata_query=sample, adata_ref=adata)
538 out of 3000 genes from the reference are missing in the query dataset or have zero std in the reference, their expressions in the query will be set to zero
>>>
>>> # Mapping UMAP coordinates
>>> sp.tl.ingest(adata_query=sample, adata_ref=adata)
/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/umap/umap_.py:1943: UserWarning: n_jobs value -1 overridden to 1 by setting random_state. Use no seed for parallelism.
warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. Use no seed for parallelism.")
TypeError: float() argument must be a string or a number, not 'csr_matrix'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/symphonypy/tools.py", line 238, in ingest
ing.map_embedding(method)
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/scanpy/tools/_ingest.py", line 499, in map_embedding
self._obsm['X_umap'] = self._umap_transform()
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/scanpy/tools/_ingest.py", line 488, in _umap_transform
return self._umap.transform(self._obsm['rep'])
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/umap/umap_.py", line 3028, in transform
indices, dists = self._knn_search_index.query(
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/pynndescent/pynndescent_.py", line 1696, in query
query_data = np.asarray(query_data).astype(np.float32, order="C")
ValueError: setting an array element with a sequence.
>>>
>>> # Labels prediction
>>> sp.tl.transfer_labels_kNN(
... adata_query=sample,
... adata_ref=adata,
... ref_labels=["leiden", "cell_type_high_resolution"],
... )
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/symphonypy/tools.py", line 411, in transfer_labels_kNN
knn.fit(adata_ref.obsm[ref_basis], adata_ref.obs[ref_labels])
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/anndata/_core/aligned_mapping.py", line 196, in __getitem__
return self._data[key]
KeyError: 'X_pca_harmony'
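The KeyError above says that the reference has no 'X_pca_harmony' entry in obsm, and the earlier TypeError shows that the representation handed to UMAP was a csr_matrix. A quick check of which embeddings exist and what type they have can narrow this down. The sketch below uses a hypothetical helper, describe_embeddings; 'X_pca_harmony' is taken from the traceback, while 'X_pca' is an assumed second key to look at.

```python
def describe_embeddings(obsm, required=("X_pca", "X_pca_harmony")):
    """Map each required key to the type name of its embedding, or 'MISSING'."""
    report = {}
    for key in required:
        if key in obsm:
            report[key] = type(obsm[key]).__name__
        else:
            report[key] = "MISSING"
    return report


# A plain dict stands in for adata.obsm here; AnnData's obsm supports the
# same membership test and key lookup.
toy_obsm = {"X_pca": [[0.1, 0.2], [0.3, 0.4]]}
print(describe_embeddings(toy_obsm))
# {'X_pca': 'list', 'X_pca_harmony': 'MISSING'}

# With real objects:
#   print(describe_embeddings(adata.obsm))   # reference
#   print(describe_embeddings(sample.obsm))  # query
```

A 'MISSING' entry in the reference would match the KeyError, and a sparse matrix type in the query would match the failure inside ingest.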
>>>
@Flu09 I'm so sorry, could you please share a small subset of your data :(
And the versions of the anndata and scanpy packages that you are using.
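The installed versions can be collected in one go with the standard library. This is a small sketch: package_versions is a hypothetical helper, and the list of packages beyond anndata and scanpy is only a guess at what else might be relevant, based on the tracebacks above.

```python
from importlib import metadata


def package_versions(names=("anndata", "scanpy", "symphonypy", "scipy",
                            "numpy", "umap-learn", "pynndescent")):
    """Return {distribution name: installed version or 'not installed'}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


for name, version in package_versions().items():
    print(f"{name}: {version}")
```

It should be run in the same environment that produced the errors (here the r-reticulate virtualenv), so the reported versions match the ones in the tracebacks.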