spatialdata-io icon indicating copy to clipboard operation
spatialdata-io copied to clipboard

Reader for seqFISH data

Open LLehner opened this issue 2 years ago • 10 comments

Add reader for seqFISH data.

LLehner avatar Jun 19 '23 15:06 LLehner

Codecov Report

Merging #53 (9f41cbe) into main (755d475) will increase coverage by 0.68%. Report is 22 commits behind head on main. The diff coverage is 44.66%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #53      +/-   ##
==========================================
+ Coverage   41.92%   42.60%   +0.68%     
==========================================
  Files          16       17       +1     
  Lines         854      946      +92     
==========================================
+ Hits          358      403      +45     
- Misses        496      543      +47     
Files Changed Coverage Δ
src/spatialdata_io/readers/merscope.py 25.00% <8.33%> (-2.03%) :arrow_down:
src/spatialdata_io/readers/seqfish.py 33.96% <33.96%> (ø)
src/spatialdata_io/__init__.py 100.00% <100.00%> (ø)
src/spatialdata_io/_constants/_constants.py 100.00% <100.00%> (ø)

codecov-commenter avatar Jun 19 '23 15:06 codecov-commenter

remember to add it to docs/api.md and also add codex there ( I think it was missing from #34 ). Also please delete the "coming soon" section in the index.md . You can delete this whole thing

image

EDIT: and same thing to the README.md of the repo, please check that the index.md and the readme.md always correspond, thank you!

giovp avatar Jun 20 '23 17:06 giovp

It seems impossible for me to download that 85 GB monster of seqFISh data to actually validate that reader, @giovp could you maybe try it out?

timtreis avatar Jun 20 '23 23:06 timtreis

@LLehner please test with napari before merging. I suggest to write to Zarr and read again the sdata object in case in which the performance are bad (since the first lazy representation is reading from a non-performant disk storage, while after you save and read Zarr and Parquet are used).

LucaMarconato avatar Jun 21 '23 17:06 LucaMarconato

@LucaMarconato seems to work, here is an example of points visualization:

sdata = seqfish(path=path)
interactive = Interactive(sdata)
interactive.run()

then selecting global>transcripts_1 results in:

image

LLehner avatar Jun 22 '23 10:06 LLehner

Great work @LLehner!

A few comments:

  • As Liang Ding reported, there was a bug in the indexing, I fixed it (you were using the last value of enumerate() which was giving len - 1).
  • I noticed that ImageModel.parse() was not using scale_factors even if the images are big. This leads to poor performance because it doesn't compute intermediate resolutions. I fixed this in my last commit.
  • Similar to this, no chunking was used for the images (in this case the images are so big that it was not possible saving them to disk). You can see this in the screenshot below. I will make this point clearer in the docs/notebooks because it is not obvious that one should pay attention to the chunk shape/size of the data by using chunks parameter in the parser. Please notice that when using scale_factors, the chunking is automatically computed to get an efficient representation (also this is not obvious from the docs).
image
  • I think that in the napari screenshot you showed you were didn't save the data to disk first. To harness the chunked and multiscale representation of the data one needs first to save to Zarr (sdata.write()) and then read again. Otherwise the visualization of the images would be very inefficient and napari would hang. I will update the docs/notebooks to make this clearer as it is not straightforward.

I will now try visualizing the data with napari.

LucaMarconato avatar Jul 15 '23 13:07 LucaMarconato

@LLehner I downloaded the data on my machine and used napari to view it. There are some bugs that I ask you to fix please, but it's almost there, they are all very quick to address.

  • [x] the key names for images and labels are swapped
  • [x] the "cells" (shapes) should be also per section. So cells_1, cells_2, cells_3, instead of one element will all the cells. Otherwise when they are plot they overlap in space.
  • [x] the cells should have a radius that is deduced from the area, not a constant radius. We do this also for xenium(), you can copy the code from there
  • [x] finally, here we should not have a unique coordinate system called global, but we should have three coordinate systems, one per sample/fov. Please check the mibitof/to_zarr.py file in the sandbox, or cosmx() and I think also steinbock() to see how to do that.

LucaMarconato avatar Jul 18 '23 18:07 LucaMarconato

I also noticed that

  • [x] the gene expression and obs from the table can't be plotted

Not sure why, I get a two different weird errors, one for obs and one for expression. When the rest is addressed we should look into this. To do this I usually run napari/vscode from PyCharm and go with breakpoints. I can also check into this if you want, please keep me posted.

LucaMarconato avatar Jul 18 '23 18:07 LucaMarconato

is this still planned to be included? @LucaMarconato @LLehner ?

giovp avatar Feb 05 '24 17:02 giovp

Yes, it's in the todo list; still didn't have time to test.

LucaMarconato avatar Feb 05 '24 21:02 LucaMarconato

@LLehner I addressed all the task items from this conversation that were still open. I have also added a converter script in the sandbox and tested the reader with napari-spatialdata. Finally, I added extra arguments to be able to parse a subset of the elements and of the sections (useful when debugging).

Thanks again for the work on this PR, ready to merge now (the failing tests are due to the fact that we need to make a new release in spatialdata and are unrelated to this PR).

LucaMarconato avatar Jun 16 '24 15:06 LucaMarconato