bug loading vizgen data
Description
I'm having trouble loading Vizgen data with sq.read.vizgen(). The function expects 8 columns in the metadata file, but mine has 9.
Minimal reproducible example
import os
import squidpy as sq

# data_path, section, file_path, cbg_file and meta_file are set elsewhere (not shown)
adata = sq.read.vizgen(
    path=data_path,
    counts_file=os.path.join(data_path, section, file_path, cbg_file),
    meta_file=os.path.join(data_path, section, file_path, meta_file),
    transformation_file=os.path.join(data_path, section, 'region_0/images/micron_to_mosaic_pixel_transform.csv'),
)
Traceback
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [54], in <cell line: 1>()
----> 1 adata = sq.read.vizgen(
2 path=data_path,
3 counts_file=os.path.join(data_path,section,file_path,cbg_file),
4 meta_file=os.path.join(data_path,section,file_path,meta_file),
5 transformation_file=os.path.join(data_path,section,'region_0/images/micron_to_mosaic_pixel_transform.csv'),
6 )
File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/squidpy/read/_read.py:146, in vizgen(path, counts_file, meta_file, transformation_file, library_id, **kwargs)
144 # fmt: off
145 coords = pd.read_csv(path / meta_file, header=0, index_col=0)
--> 146 coords.columns = ["fov", "volume", "center_x", "center_y", "min_x", "max_x", "min_y", "max_y"]
147 # fmt: on
149 adata.obs = pd.merge(adata.obs, coords, how="left", left_index=True, right_index=True)
File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/pandas/core/generic.py:5588, in NDFrame.__setattr__(self, name, value)
5586 try:
5587 object.__getattribute__(self, name)
-> 5588 return object.__setattr__(self, name, value)
5589 except AttributeError:
5590 pass
File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/pandas/_libs/properties.pyx:70, in pandas._libs.properties.AxisProperty.__set__()
File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/pandas/core/generic.py:769, in NDFrame._set_axis(self, axis, labels)
767 def _set_axis(self, axis: int, labels: Index) -> None:
768 labels = ensure_index(labels)
--> 769 self._mgr.set_axis(axis, labels)
770 self._clear_item_cache()
File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/pandas/core/internals/managers.py:214, in BaseBlockManager.set_axis(self, axis, new_labels)
212 def set_axis(self, axis: int, new_labels: Index) -> None:
213 # Caller is responsible for ensuring we have an Index object.
--> 214 self._validate_set_axis(axis, new_labels)
215 self.axes[axis] = new_labels
File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/pandas/core/internals/base.py:69, in DataManager._validate_set_axis(self, axis, new_labels)
66 pass
68 elif new_len != old_len:
---> 69 raise ValueError(
70 f"Length mismatch: Expected axis has {old_len} elements, new "
71 f"values have {new_len} elements"
72 )
ValueError: Length mismatch: Expected axis has 9 elements, new values have 8 elements
Version
'1.2.2'
...
I'm experiencing something similar except it's 16 elements rather than 9?
ValueError: Length mismatch: Expected axis has 16 elements, new values have 8 elements
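For reference, the failure can be reproduced with pandas alone. The sketch below is not squidpy's code: it uses a toy one-row table in which `extra_col` is a made-up ninth column, and it contrasts positional renaming (what the traceback shows) with selecting the expected columns by name. The assumption that the file already carries these column names in its header is mine.

```python
import pandas as pd

# The 8 names taken from the traceback above.
expected = ["fov", "volume", "center_x", "center_y", "min_x", "max_x", "min_y", "max_y"]

# Toy stand-in for cell_metadata.csv; "extra_col" is a hypothetical 9th column.
coords = pd.DataFrame(
    [[0, 100.0, 5.0, 6.0, 4.0, 6.0, 5.0, 7.0, 1.0]],
    index=pd.Index(["cell_0"]),
    columns=expected + ["extra_col"],
)

# Positional renaming breaks as soon as the column count differs.
renamed = coords.copy()
try:
    renamed.columns = expected
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)

# Selecting by name tolerates extra columns, provided the header has these names.
missing = [c for c in expected if c not in coords.columns]
if missing:
    raise KeyError(f"metadata file lacks expected columns: {missing}")
subset = coords[expected]
```

Selecting by name only helps when the header names match; if a newer export renames columns, the `missing` check fails loudly instead of silently mislabeling data.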
Hi @mkunst23 and @andrewjkwok, this should have been fixed in #648. Installing squidpy from main should resolve it.
@michalk8 Thanks for pointing us in the right direction - it works correctly now!
Sorry, it seems it isn't entirely working yet. When I read the data in, my obs dataframe is all NaNs, but when I check my cell_metadata.csv file, it looks populated with the various cell coordinates etc. As a result, I have no spatial coordinates to plot.
A second quick thing: previous MERSCOPE outputs gave the cell coordinates in a set of hdf5 files, but since the MERSCOPE software update to v232 this has become a single parquet file instead. Does squidpy need this info at all? I can't find anywhere in the squidpy documentation that uses this file, and I'm wondering whether it would help with the missing spatial coordinates.
hey @andrewjkwok ,
thanks for reporting this, I'm afraid it's a bit tricky to help out without the data available. Could you share the data download so we can test it out? thanks!
@giovp Yes very happy to. Is there an email I could share a google drive link to? Many thanks in advance.
Sorry, just a quick follow-up @giovp @michalk8: is there somewhere I could share my data with your team to take a look?
@andrewjkwok any chance you could point us to some public data? for example, some data shared by vizgen?
@giovp hmm the problem is that the cell metadata file from my MERSCOPE output (running their latest v232 software) doesn't look the same as the ones that are on vizgen's website.
So if I go to the squidpy website and follow the tutorial (https://squidpy.readthedocs.io/en/stable/external_tutorials/tutorial_vizgen.html) for the data download (https://info.vizgen.com/mouse-brain-map?submissionGuid=a66ccb7f-87cf-4c55-83b9-5a2b6c0c12b9), the cell_metadata.csv file doesn't look the same as the one from my merscope.
I've attached a truncated version of my cell metadata file for reference.
Vizgen website data: datasets_mouse_brain_map_BrainReceptorShowcase_Slice1_Replicate1_cell_metadata_S1R1.csv
Output from my merscope: cell_metadata_truncated.csv
Hi @andrewjkwok, unfortunately I won't be able to look into this in the next two weeks. Thanks for sharing the data, I'll get back to you soon.
Hi - just wanted to quickly check whether there was any progress with this?
There was an issue with indexing but installing squidpy from main should fix the metadata not populating.
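To illustrate how an indexing problem can produce an all-NaN obs, here is a minimal pandas sketch. It assumes one common cause, namely string cell IDs on one side and integer cell IDs on the other; I am not claiming this is exactly what happened inside squidpy. The toy frames and their values are made up.

```python
import pandas as pd

# obs indexed by string cell IDs, as AnnData obs names usually are.
obs = pd.DataFrame(index=pd.Index(["0", "1", "2"], name="cell"))

# Metadata read from CSV with integer cell IDs in the index (toy values).
coords = pd.DataFrame(
    {"center_x": [1.0, 2.0, 3.0], "center_y": [4.0, 5.0, 6.0]},
    index=pd.Index([0, 1, 2], name="cell"),
)

# Left merge on the index: "0" != 0, so nothing matches and every value is NaN.
broken = pd.merge(obs, coords, how="left", left_index=True, right_index=True)

# Casting the metadata index to str makes the keys comparable again.
coords_str = coords.copy()
coords_str.index = coords_str.index.astype(str)
fixed = pd.merge(obs, coords_str, how="left", left_index=True, right_index=True)
```

A quick check such as `adata.obs["center_x"].isna().all()` after reading should tell you whether you are hitting this kind of mismatch.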
The spatial coordinates are being populated by the center_x and center_y from the metadata. The sq.read.vizgen function doesn't use the cell segmentation output, either the older hdf5 or the newer parquet formats.
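As a rough sketch of what that means in practice: once the metadata is merged correctly, the coordinates are just the two center columns stacked into an (n_cells, 2) array. The toy `obs` frame below is made up, and storing the result under `adata.obsm["spatial"]` follows the usual squidpy convention rather than being copied from the reader's source.

```python
import numpy as np
import pandas as pd

# Toy obs table standing in for adata.obs after a successful metadata merge.
obs = pd.DataFrame(
    {"center_x": [10.0, 20.0, 30.0], "center_y": [1.5, 2.5, 3.5]},
    index=["cell_0", "cell_1", "cell_2"],
)

# Stack the two center columns into an (n_cells, 2) coordinate array.
spatial = obs[["center_x", "center_y"]].to_numpy()

# NaNs here mean the metadata merge did not match any cells.
if np.isnan(spatial).any():
    raise ValueError("center_x/center_y contain NaNs; check the metadata merge")

# With a real AnnData object this would be stored as:
# adata.obsm["spatial"] = spatial
```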
Hi all, I would suggest taking a look at the https://github.com/scverse/spatialdata-io package for reading spatial omics data. We won't be maintaining the IO reading functions here; updates to the specifications from the commercial platforms will only be made in spatialdata-io.
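A minimal sketch of what reading a MERSCOPE run with spatialdata-io could look like. `read_merscope_experiment` is a hypothetical helper name, and the sketch assumes the package's `merscope` reader with its default options; check the spatialdata-io documentation for the arguments that match your export.

```python
from pathlib import Path


def read_merscope_experiment(experiment_dir):
    """Read a MERSCOPE output folder with spatialdata-io (hypothetical helper)."""
    experiment_dir = Path(experiment_dir)
    if not experiment_dir.is_dir():
        raise FileNotFoundError(f"MERSCOPE output folder not found: {experiment_dir}")

    # Imported lazily so the helper can be defined without spatialdata-io installed.
    from spatialdata_io import merscope

    # Default reader options are assumed here; adjust them to your export.
    return merscope(experiment_dir)
```

Calling it would look like `sdata = read_merscope_experiment("/path/to/region_0")`, where the path is a placeholder for your own output folder.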