spatialdata icon indicating copy to clipboard operation
spatialdata copied to clipboard

Enhancement: Local SpatialData object with remotely accessed pyramidal OME-ZARR image

Open edoumazane opened this issue 1 year ago • 4 comments

WSIs can account for > 90% of a Spatial Transcriptomics experiment. This can limit the number of datasets i can host and open on my laptop or workstation

Describe the solution you'd like I'd like to leave the WSIs (saved as pyramidal ZARR files) on a S3 bucket or on a distant SSH server, and work with SpatialData "as if" the WSI was local.

Describe alternatives you've considered I can open a multiscale image in napari, overlay a label layer (with coarser pixels) for an annotation tasks. Zoom in and out is very smooth. Below is the code for display the image and 2 screenshots of the usecase.

import numpy as np
import zarr
import dask.array as da
import napari

def open_pyramidal_zarr_img(uri, v, layer_name=None, reorder_axes=False):
    store = zarr.open(uri, mode="r")
    multiscale_data = [da.from_zarr(store[key]) for key in store.array_keys()]
    if reorder_axes:

        if reorder_axes == "CYX -> XYC":
            multiscale_data = [da.swapaxes(data, 0, -1) for data in multiscale_data]

        else:
            raise ValueError(f"reorder_axes must be False or 'CYX -> XYC', got {reorder_axes}")
    v.add_image(multiscale_data, name=layer_name, rgb=True)
Image Image

(EDIT: scalebar is wrong, obviously 🙄)

Loading is fast (10-20s). Zooming in is smooth enough in my opinion for typical tasks.

It would be very nice to be able to do the same in the napari-spatialdata context.

What i tried

# open local sdata object
sdata = sd.read_zarr(local_path)

# add distant image
store = zarr.open(distant_uri, "r")
image = Image2DModel.parse(da.from_zarr(store["0"]), dims=('c', 'x', 'y'), c_coords=["r", "g", "b"])
sdata["full_res"] = image

# interactive display
interactive = Interactive(sdata)

It takes > 1 min, probably loading all image (then zooming is fast). Each time layer visibility is toggled, it takes another minute

edoumazane avatar Jan 15 '25 12:01 edoumazane

Hi, thanks for sharing! The problem here is that Image2DModel.parse() would need the scale_factors parameter in order to produce a multiscale image, but creating a multiscale image here is not an option when the data is remote, since there is the need to load the actual data in memory.

Here the approach would be to use read_zarr() to read directly from the cloud, but in your case this is also not an option since you want to read a single element (a single image) and not the whole object.

To enable this we can add support for reading a SpatialData object in a incremental way

I agree that this constitutes an important use case, so we will try to prioritize working on this. We need first to enable partial reading (which is a subtask of this https://github.com/scverse/spatialdata/issues/521), and then extend that for remote data also.

CC @berombau @melonora @aeisenbarth

LucaMarconato avatar Jan 15 '25 16:01 LucaMarconato

@edoumazane as a workaround you could do

interactive = Interactive(sdata)
interactive.viewer  # add WSI layers using napari APIs
interactive.run()

LucaMarconato avatar Jan 15 '25 16:01 LucaMarconato

Thank you @LucaMarconato the workaround works fine! Now I have to address the coordinates system and i will be good to go for my usecase.

# open local sdata and display in GUI
sdata = sd.read_zarr(local_path)
interactive = Interactive(sdata)

# open the distant pyramid
store = zarr.open(distant_uri, "r")
multiscale_data = [da.from_zarr(store[key]) for key in store.array_keys()]
multiscale_data = [da.swapaxes(data, 0, -1) for data in multiscale_data]

# add the distant pyramid to the GUI
interactive._viewer.add_image(multiscale_data, name="full_res", rgb=True)

Image

edoumazane avatar Jan 15 '25 18:01 edoumazane

Happy to hear that the workaround works for you! For fixing the coordinate transformations I suggest to call get_transformation_between_coordinate_systems() followed by a call of [BaseTransformation.to_affine_matrix()](https://github.com/scverse/spatialdata/blob/ae71ae134de2c189a3aa425ceabe82cf1937e701/src/spatialdata/transformations/transformations.py#L125). For napari you need to set the input and output axes as ('y', 'x'), as shown here https://github.com/scverse/napari-spatialdata/blob/a7b5faa7252fab7ced9b71e40cb65e3dfb3d6c29/src/napari_spatialdata/utils/_utils.py#L192.

LucaMarconato avatar Jan 20 '25 14:01 LucaMarconato