
bug loading vizgen data

Open mkunst23 opened this issue 2 years ago • 13 comments

Description

I'm having trouble loading Vizgen data with sq.read.vizgen(). The function expects 8 columns in the metadata file, but mine has 9.

Minimal reproducible example

adata = sq.read.vizgen(
    path=data_path,
    counts_file=os.path.join(data_path,section,file_path,cbg_file),
    meta_file=os.path.join(data_path,section,file_path,meta_file),
    transformation_file=os.path.join(data_path,section,'region_0/images/micron_to_mosaic_pixel_transform.csv'),
)

Traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [54], in <cell line: 1>()
----> 1 adata = sq.read.vizgen(
      2     path=data_path,
      3     counts_file=os.path.join(data_path,section,file_path,cbg_file),
      4     meta_file=os.path.join(data_path,section,file_path,meta_file),
      5     transformation_file=os.path.join(data_path,section,'region_0/images/micron_to_mosaic_pixel_transform.csv'),
      6 )

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/squidpy/read/_read.py:146, in vizgen(path, counts_file, meta_file, transformation_file, library_id, **kwargs)
    144 # fmt: off
    145 coords = pd.read_csv(path / meta_file, header=0, index_col=0)
--> 146 coords.columns = ["fov", "volume", "center_x", "center_y", "min_x", "max_x", "min_y", "max_y"]
    147 # fmt: on
    149 adata.obs = pd.merge(adata.obs, coords, how="left", left_index=True, right_index=True)

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/pandas/core/generic.py:5588, in NDFrame.__setattr__(self, name, value)
   5586 try:
   5587     object.__getattribute__(self, name)
-> 5588     return object.__setattr__(self, name, value)
   5589 except AttributeError:
   5590     pass

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/pandas/_libs/properties.pyx:70, in pandas._libs.properties.AxisProperty.__set__()

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/pandas/core/generic.py:769, in NDFrame._set_axis(self, axis, labels)
    767 def _set_axis(self, axis: int, labels: Index) -> None:
    768     labels = ensure_index(labels)
--> 769     self._mgr.set_axis(axis, labels)
    770     self._clear_item_cache()

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/pandas/core/internals/managers.py:214, in BaseBlockManager.set_axis(self, axis, new_labels)
    212 def set_axis(self, axis: int, new_labels: Index) -> None:
    213     # Caller is responsible for ensuring we have an Index object.
--> 214     self._validate_set_axis(axis, new_labels)
    215     self.axes[axis] = new_labels

File /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/michaelkunst/miniconda3/envs/scvi-tool/lib/python3.9/site-packages/pandas/core/internals/base.py:69, in DataManager._validate_set_axis(self, axis, new_labels)
     66     pass
     68 elif new_len != old_len:
---> 69     raise ValueError(
     70         f"Length mismatch: Expected axis has {old_len} elements, new "
     71         f"values have {new_len} elements"
     72     )

ValueError: Length mismatch: Expected axis has 9 elements, new values have 8 elements
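
Editorial note: the failure above can be reproduced with pandas alone. The sketch below is a minimal illustration, assuming a metadata table with one more column than the hard-coded list of 8 names; the ninth column name (`extra_col`) and the cell values are hypothetical, and selecting columns by name is only one possible workaround, not necessarily what the fix in squidpy does.

```python
import pandas as pd

# The 8 names that squidpy 1.2.2 assigns to the metadata columns (from the traceback).
EXPECTED = ["fov", "volume", "center_x", "center_y", "min_x", "max_x", "min_y", "max_y"]

# Hypothetical metadata with a 9th column, standing in for a newer MERSCOPE export.
coords = pd.DataFrame(
    [[0, 1.0, 10.0, 20.0, 5.0, 15.0, 15.0, 25.0, 99]],
    columns=EXPECTED + ["extra_col"],  # "extra_col" is a made-up name
    index=["cell_1"],
)

# Overwriting the column labels requires exactly as many names as columns.
try:
    coords.columns = EXPECTED
except ValueError as err:
    message = str(err)
    print(message)

# Selecting by name instead tolerates additional columns in the file.
subset = coords[EXPECTED]
print(subset.shape)
```

Selecting by name only works if the file's header already uses these names, which should be checked against the actual cell metadata file.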

Version

'1.2.2'

...

mkunst23 avatar Mar 24 '23 22:03 mkunst23

I'm experiencing something similar except it's 16 elements rather than 9?

ValueError: Length mismatch: Expected axis has 16 elements, new values have 8 elements

andrewjkwok avatar Mar 31 '23 07:03 andrewjkwok

Hi @mkunst23 and @andrewjkwok, this should have been fixed in #648; installing squidpy from main should resolve it.

michalk8 avatar Mar 31 '23 08:03 michalk8

@michalk8 Thanks for pointing us in the right direction - it works correctly now!

andrewjkwok avatar Apr 03 '23 01:04 andrewjkwok

Sorry, it seems it isn't entirely working yet. When I read the data in, my obs dataframe is all NaNs, but when I check my cell_metadata.csv file, it looks populated with the various cell coordinates etc. The result is that I have no spatial coordinates to plot.

A second quick thing: previous MERSCOPE outputs gave the cell coordinates in a set of hdf5 files, but since the MERSCOPE software update to v232 this has become a single parquet file instead. Does squidpy need this info at all? I can't find anywhere in the squidpy documentation that uses this file, and I'm wondering whether it would help with the lack of spatial coordinates.

andrewjkwok avatar Apr 03 '23 11:04 andrewjkwok

hey @andrewjkwok ,

thanks for reporting this, I'm afraid it's a bit tricky to help out without the data available. Could you share the data download so we can test it out? thanks!

giovp avatar Apr 03 '23 12:04 giovp

@giovp Yes very happy to. Is there an email I could share a google drive link to? Many thanks in advance.

andrewjkwok avatar Apr 04 '23 02:04 andrewjkwok

Sorry, just a quick follow-up @giovp @michalk8: I was wondering if there is somewhere I could share my data so your team can take a look?

andrewjkwok avatar Apr 05 '23 23:04 andrewjkwok

@andrewjkwok any chance you could point us to some public data? for example, some data shared by vizgen?

giovp avatar Apr 11 '23 09:04 giovp

@giovp hmm the problem is that the cell metadata file from my MERSCOPE output (running their latest v232 software) doesn't look the same as the ones that are on vizgen's website.

So if I go to the squidpy website and follow the tutorial (https://squidpy.readthedocs.io/en/stable/external_tutorials/tutorial_vizgen.html) for the data download (https://info.vizgen.com/mouse-brain-map?submissionGuid=a66ccb7f-87cf-4c55-83b9-5a2b6c0c12b9), the cell_metadata.csv file doesn't look the same as the one from my merscope.

I've attached a truncated version of my cell metadata file for reference.

Vizgen website data: datasets_mouse_brain_map_BrainReceptorShowcase_Slice1_Replicate1_cell_metadata_S1R1.csv

Output from my merscope: cell_metadata_truncated.csv

andrewjkwok avatar Apr 11 '23 09:04 andrewjkwok

hi @andrewjkwok, unfortunately I am unable to look into this in the next two weeks. Thanks for sharing the data, I'll get back to you soon.

giovp avatar Apr 12 '23 19:04 giovp

Hi - just wanted to quickly check whether there was any progress with this?

andrewjkwok avatar May 24 '23 07:05 andrewjkwok

There was an issue with indexing but installing squidpy from main should fix the metadata not populating.
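
Editorial note: the thread does not say exactly what the indexing issue was. One plausible mechanism, shown here purely as an assumption, is a dtype mismatch between the two indexes: a left merge on index silently yields NaN when one side holds string labels and the other holds integers. The cell IDs and values below are made up.

```python
import pandas as pd

# obs-like table with string cell IDs (AnnData observation names are strings).
obs = pd.DataFrame(index=pd.Index(["101", "102"], name="cell"))

# Metadata read from CSV with index_col=0: numeric-looking IDs are parsed as integers.
coords = pd.DataFrame(
    {"center_x": [1.0, 2.0], "center_y": [3.0, 4.0]},
    index=pd.Index([101, 102], name="cell"),
)

# "101" != 101, so no rows match and every metadata column comes back NaN.
merged_bad = pd.merge(obs, coords, how="left", left_index=True, right_index=True)
print(merged_bad["center_x"].isna().all())

# Casting the metadata index to str before merging aligns the two tables.
coords_str = coords.copy()
coords_str.index = coords_str.index.astype(str)
merged_ok = pd.merge(obs, coords_str, how="left", left_index=True, right_index=True)
print(merged_ok)
```

Comparing `adata.obs.index[:5]` with the first few entries of the metadata index, including their dtypes, is a quick way to check whether this applies to a given dataset.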

The spatial coordinates are populated from the center_x and center_y columns of the metadata. The sq.read.vizgen function doesn't use the cell segmentation output, in either the older hdf5 or the newer parquet format.
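
Editorial note: as a rough sketch of what "populated from center_x and center_y" means, the coordinates can be pulled out of the metadata as an (n_cells, 2) array. The table below is invented for illustration; in an AnnData object such an array would conventionally be stored under `adata.obsm["spatial"]`, which is stated here as the usual squidpy convention rather than a quote from the reader's source.

```python
import pandas as pd

# Hypothetical per-cell metadata with the two centroid columns named in the thread.
obs = pd.DataFrame(
    {
        "fov": [0, 0, 1],
        "center_x": [10.0, 12.5, 30.0],
        "center_y": [20.0, 22.5, 40.0],
    },
    index=["cell_1", "cell_2", "cell_3"],
)

# One row per cell, columns ordered (x, y).
spatial = obs[["center_x", "center_y"]].to_numpy()
print(spatial.shape)

# If the metadata merge failed, this array is all NaN and nothing can be plotted.
has_coordinates = not pd.isna(spatial).any()
print(has_coordinates)
```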

dfhannum avatar Jun 09 '23 20:06 dfhannum

hi all, I would suggest taking a look at the https://github.com/scverse/spatialdata-io package for reading in spatial omics data. We won't be maintaining the IO reading functions here; updates to specifications from the commercial platforms will only go into spatialdata-io.

giovp avatar Jul 07 '24 13:07 giovp