`read_10x_mtx()` cannot handle numerical barcodes properly (BD Rhapsody specifically)
Please make sure these conditions are met
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of scanpy.
- [ ] (optional) I have confirmed this bug exists on the main branch of scanpy.
What happened?
Hello, I am trying to load BD Rhapsody data using read_10x_mtx(), but the function does not handle numerical barcodes correctly. The barcodes.tsv.gz file contains integer cell IDs instead of the expected ACGT sequences, and loading the data into an AnnData object fails with the AssertionError shown under "Error output".
According to the BD Rhapsody documentation, they use numerical cell IDs to distinguish between cells, which is different from the standard 10X Genomics format that uses string barcodes.
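To confirm that a given export has this property, the barcode file can be inspected directly. The following is a minimal sketch using only the standard library; the helper names `peek_barcodes` and `looks_numeric` are hypothetical, and it assumes the file is gzip-compressed with one cell ID per line.

```python
import gzip
from itertools import islice


def peek_barcodes(path, n=5):
    """Return the first n entries of a gzip-compressed, one-ID-per-line file."""
    with gzip.open(path, "rt") as handle:
        return [line.strip() for line in islice(handle, n)]


def looks_numeric(barcodes):
    """True if the list is non-empty and every entry consists only of digits."""
    return bool(barcodes) and all(b.isdigit() for b in barcodes)
```

Calling `looks_numeric(peek_barcodes("data/SMK_SampleTag03/barcodes.tsv.gz"))` on the example directory would be expected to return `True` if the file holds integer cell IDs as described.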
Example data can be downloaded from the BD Rhapsody website here. I have also attached a small example of their GEX matrix files: BD-Demo-WTA-SMK_SampleTag03_hs_RSEC_MolsPerCell_MEX.zip.
Minimal code sample
>>> import scanpy as sc
>>> adata = sc.read_10x_mtx('data/SMK_SampleTag03')
Error output
<CONDA_PREFIX>/lib/python3.12/site-packages/anndata/_core/anndata.py:812: UserWarning:
AnnData expects .obs.index to contain strings, but got values like:
[9265, 11954, 21560, 31507, 32668]
Inferred to be: integer
names = self._prep_dim_index(names, "obs")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<CONDA_PREFIX>/lib/python3.12/site-packages/legacy_api_wrap/__init__.py", line 82, in fn_compatible
return fn(*args_all, **kw)
^^^^^^^^^^^^^^^^^^^
File "<CONDA_PREFIX>/lib/python3.12/site-packages/scanpy/readwrite.py", line 597, in read_10x_mtx
return adata[:, gex_rows].copy()
~~~~~^^^^^^^^^^^^^
File "<CONDA_PREFIX>/lib/python3.12/site-packages/anndata/_core/anndata.py", line 1011, in __getitem__
oidx, vidx = self._normalize_indices(index)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<CONDA_PREFIX>/lib/python3.12/site-packages/anndata/_core/anndata.py", line 992, in _normalize_indices
return _normalize_indices(index, self.obs_names, self.var_names)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<CONDA_PREFIX>/lib/python3.12/site-packages/anndata/_core/index.py", line 32, in _normalize_indices
ax0 = _normalize_index(ax0, names0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<CONDA_PREFIX>/lib/python3.12/site-packages/anndata/_core/index.py", line 50, in _normalize_index
assert index.dtype != int, msg
^^^^^^^^^^^^^^^^^^
AssertionError: Don’t call _normalize_index with non-categorical/string names
Versions
scanpy 1.11.1
---- ----
h5py 3.13.0
setuptools 80.1.0
colorama 0.4.6
session-info2 0.1.2
python-dateutil 2.9.0.post0
packaging 25.0
scikit-learn 1.5.2
numpy 2.2.5
typing_extensions 4.13.2
legacy-api-wrap 1.4.1
numba 0.61.2
llvmlite 0.44.0
six 1.17.0
matplotlib 3.10.1
joblib 1.5.0
pyparsing 3.2.3
cycler 0.12.1
pandas 2.2.3
pytz 2025.2
scipy 1.15.2
natsort 8.4.0
threadpoolctl 3.6.0
kiwisolver 1.4.8
anndata 0.11.4
pillow 11.1.0
---- ----
Python 3.12.10 | packaged by conda-forge | (main, Apr 10 2025, 22:21:13) [GCC 13.3.0]
OS Linux-4.18.0-348.el8.x86_64-x86_64-with-glibc2.28
CPU 64 logical CPU cores, x86_64
GPU No GPU found
Updated <SCRUBBED>
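I discovered that you can get around the error by setting gex_only to False when calling read_10x_mtx(), presumably because this skips the adata[:, gex_rows] subsetting step where the traceback ends: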
adata = sc.read_10x_mtx(path, gex_only=False)
However, the warning message persists:
<SCRUBBED>/lib/python3.12/site-packages/anndata/_core/anndata.py:812: UserWarning:
AnnData expects .obs.index to contain strings, but got values like:
[2534, 5269, 5661, 8881, 9730]
Inferred to be: integer
names = self._prep_dim_index(names, "obs")
I haven't tested whether numerical barcodes cause compatibility issues in other functions, but I suspect the problem is more widespread. I would appreciate any help or suggestions on how to handle this situation.
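One possible stopgap, until the reader handles this itself, is to rewrite the barcode file so that every ID is unambiguously a string before loading. The sketch below uses only the standard library. The function name `copy_with_string_barcodes` and the `cell_` prefix are hypothetical choices, and it assumes the directory holds a single-column, gzip-compressed barcodes.tsv.gz as in the attached example.

```python
import gzip
import shutil
from pathlib import Path


def copy_with_string_barcodes(src_dir, dst_dir, prefix="cell_"):
    """Copy an MEX directory, prefixing each barcode so it cannot parse as an int.

    Assumes the barcode file is named barcodes.tsv.gz and has one ID per line.
    All other files are copied unchanged.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    if src.resolve() == dst.resolve():
        # Writing in place would truncate the file while it is being read.
        raise ValueError("src_dir and dst_dir must be different directories")
    dst.mkdir(parents=True, exist_ok=True)

    for item in src.iterdir():
        if not item.is_file():
            continue
        if item.name == "barcodes.tsv.gz":
            with gzip.open(item, "rt") as fin, gzip.open(dst / item.name, "wt") as fout:
                for line in fin:
                    barcode = line.strip()
                    if barcode:  # skip blank lines
                        fout.write(f"{prefix}{barcode}\n")
        else:
            shutil.copy2(item, dst / item.name)
    return dst
```

Pointing `sc.read_10x_mtx()` at the returned directory should then yield string obs names, though this has not been verified against the full BD demo dataset. The original integer IDs can be recovered by stripping the prefix. An in-memory alternative after loading with gex_only=False would be `adata.obs_names = adata.obs_names.astype(str)`.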