Reader for seqFISH data
Add reader for seqFISH data.
Codecov Report
Merging #53 (9f41cbe) into main (755d475) will increase coverage by
0.68%. Report is 22 commits behind head on main. The diff coverage is44.66%.
Additional details and impacted files
@@ Coverage Diff @@
## main #53 +/- ##
==========================================
+ Coverage 41.92% 42.60% +0.68%
==========================================
Files 16 17 +1
Lines 854 946 +92
==========================================
+ Hits 358 403 +45
- Misses 496 543 +47
| Files Changed | Coverage Δ | |
|---|---|---|
| src/spatialdata_io/readers/merscope.py | 25.00% <8.33%> (-2.03%) |
:arrow_down: |
| src/spatialdata_io/readers/seqfish.py | 33.96% <33.96%> (ø) |
|
| src/spatialdata_io/__init__.py | 100.00% <100.00%> (ø) |
|
| src/spatialdata_io/_constants/_constants.py | 100.00% <100.00%> (ø) |
remember to add it to docs/api.md and also add codex there ( I think it was missing from #34 ). Also please delete the "coming soon" section in the index.md . You can delete this whole thing
EDIT: and same thing to the README.md of the repo, please check that the index.md and the readme.md always correspond, thank you!
It seems impossible for me to download that 85 GB monster of seqFISh data to actually validate that reader, @giovp could you maybe try it out?
@LLehner please test with napari before merging. I suggest to write to Zarr and read again the sdata object in case in which the performance are bad (since the first lazy representation is reading from a non-performant disk storage, while after you save and read Zarr and Parquet are used).
@LucaMarconato seems to work, here is an example of points visualization:
sdata = seqfish(path=path)
interactive = Interactive(sdata)
interactive.run()
then selecting global>transcripts_1 results in:
Great work @LLehner!
A few comments:
- As Liang Ding reported, there was a bug in the indexing, I fixed it (you were using the last value of
enumerate()which was giving len - 1). - I noticed that
ImageModel.parse()was not usingscale_factorseven if the images are big. This leads to poor performance because it doesn't compute intermediate resolutions. I fixed this in my last commit. - Similar to this, no chunking was used for the images (in this case the images are so big that it was not possible saving them to disk). You can see this in the screenshot below.
I will make this point clearer in the docs/notebooks because it is not obvious that one should pay attention to the chunk shape/size of the data by using
chunksparameter in the parser. Please notice that when usingscale_factors, the chunking is automatically computed to get an efficient representation (also this is not obvious from the docs).
- I think that in the napari screenshot you showed you were didn't save the data to disk first. To harness the chunked and multiscale representation of the data one needs first to save to Zarr (
sdata.write()) and then read again. Otherwise the visualization of the images would be very inefficient and napari would hang. I will update the docs/notebooks to make this clearer as it is not straightforward.
I will now try visualizing the data with napari.
@LLehner I downloaded the data on my machine and used napari to view it. There are some bugs that I ask you to fix please, but it's almost there, they are all very quick to address.
- [x] the key names for images and labels are swapped
- [x] the "cells" (shapes) should be also per section. So
cells_1,cells_2,cells_3, instead of one element will all the cells. Otherwise when they are plot they overlap in space. - [x] the cells should have a radius that is deduced from the area, not a constant radius. We do this also for
xenium(), you can copy the code from there - [x] finally, here we should not have a unique coordinate system called global, but we should have three coordinate systems, one per sample/fov. Please check the
mibitof/to_zarr.pyfile in the sandbox, orcosmx()and I think alsosteinbock()to see how to do that.
I also noticed that
- [x] the gene expression and
obsfrom the table can't be plotted
Not sure why, I get a two different weird errors, one for obs and one for expression. When the rest is addressed we should look into this. To do this I usually run napari/vscode from PyCharm and go with breakpoints. I can also check into this if you want, please keep me posted.
is this still planned to be included? @LucaMarconato @LLehner ?
Yes, it's in the todo list; still didn't have time to test.
@LLehner I addressed all the task items from this conversation that were still open. I have also added a converter script in the sandbox and tested the reader with napari-spatialdata. Finally, I added extra arguments to be able to parse a subset of the elements and of the sections (useful when debugging).
Thanks again for the work on this PR, ready to merge now (the failing tests are due to the fact that we need to make a new release in spatialdata and are unrelated to this PR).