spatialdata-io Can't read re-segmented Xenium data

Hello,

I have re-run cell segmentation for my Xenium data using Xenium Ranger version 2.0.0.12 to benefit from its improved segmentation algorithm. I can't read the re-segmented data with spatialdata_io.xenium(), even though the original output can be read. I have spatialdata-io version 0.1.4.

Error message

sample1_v2 = spatialdata_io.xenium(sample1_loc)

INFO     reading /mnt/storage/vaquerizas/xenium/outputs/XR_0014511/0014511/outs/cell_feature_matrix.h5

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[6], line 1
----> 1 sample1_v2 = spatialdata_io.xenium(sample1_loc)

File /mnt/storage/vaquerizas/liz/spatial/xenium/python_analysis/spatialdata_env/lib/python3.10/site-packages/spatialdata_io/_utils.py:46, in deprecation_alias.<locals>.deprecation_decorator.<locals>.wrapper(*args, **kwargs)
     44 class_name = f.__qualname__
     45 rename_kwargs(f.__name__, kwargs, aliases, class_name)
---> 46 return f(*args, **kwargs)

File /mnt/storage/vaquerizas/liz/spatial/xenium/python_analysis/spatialdata_env/lib/python3.10/site-packages/spatialdata_io/readers/xenium.py:227, in xenium(path, cells_boundaries, nucleus_boundaries, cells_as_circles, cells_labels, nucleus_labels, transcripts, morphology_mip, morphology_focus, aligned_images, cells_table, n_jobs, imread_kwargs, image_models_kwargs, labels_models_kwargs)
    218     labels["nucleus_labels"], _ = _get_labels_and_indices_mapping(
    219         path,
    220         XeniumKeys.CELLS_ZARR,
   (...)
    224         labels_models_kwargs=labels_models_kwargs,
    225     )
    226 if cells_labels:
--> 227     labels["cell_labels"], cell_labels_indices_mapping = _get_labels_and_indices_mapping(
    228         path,
    229         XeniumKeys.CELLS_ZARR,
    230         specs,
    231         mask_index=1,
    232         labels_name="cell_labels",
    233         labels_models_kwargs=labels_models_kwargs,
    234     )
    235     if cell_labels_indices_mapping is not None and table is not None:
    236         if not pd.DataFrame.equals(cell_labels_indices_mapping["cell_id"], table.obs[str(XeniumKeys.CELL_ID)]):

File /mnt/storage/vaquerizas/liz/spatial/xenium/python_analysis/spatialdata_env/lib/python3.10/site-packages/spatialdata_io/readers/xenium.py:446, in _get_labels_and_indices_mapping(path, file, specs, mask_index, labels_name, labels_models_kwargs)
    443     real_label_index = real_label_index[1:]
    445 if version < packaging.version.parse("2.0.0"):
--> 446     expected_label_index = z["seg_mask_value"][...]
    448     if not np.array_equal(expected_label_index, real_label_index):
    449         raise ValueError(
    450             "The label indices from the labels differ from the ones from the input data. Please report "
    451             f"this issue. Real label indices: {real_label_index}, expected label indices: "
    452             f"{expected_label_index}."
    453         )

File /mnt/storage/vaquerizas/liz/spatial/xenium/python_analysis/spatialdata_env/lib/python3.10/site-packages/zarr/hierarchy.py:511, in Group.__getitem__(self, item)
    509         raise KeyError(item)
    510 else:
--> 511     raise KeyError(item)

KeyError: 'seg_mask_value'

Is this related to #150? In other words, is Xenium Ranger reanalysis not supported?

Thanks in advance for your help with this.

Aug 11 '24 12:08 liz-is

Hi, thanks for reporting. The code branch being executed is the one for data versioned below 2.0.0. It could be that Xenium Ranger (XR) 2.0.0.12 rewrites some of the files in the newer format without updating the global version recorded for the dataset.

I would suggest manually parsing the data in this case. The linked issue #150 has information and tutorials on how to proceed, and you could use the xenium.py code as a starting point.

Aug 12 '24 12:08 LucaMarconato

Hi, thanks for the reply. I'll try adapting the code from spatialdata_io/readers/xenium.py to parse this format. I'll let you know if I figure out what's actually been changed in the resegmentation.

FWIW, I had a look at how the version is detected by spatialdata_io.readers.xenium, and it's technically being detected correctly, as this is what the relevant parts of my experiment.xenium file look like:

{
    "major_version": 4,
    [...snipped...]
    "instrument_sw_version": "1.9.2.0",
    "analysis_sw_version": "xenium-1.9.0.0",
    [...snipped...]
    },
    "xenium_explorer_files": {
    [...snipped...]
    },
    "xenium_ranger": {
        "run_id": "0014511",
        "version": "xenium-2.0.0.12",
        "command_line": "xeniumranger resegment --id=0014511 --xenium-bundle=/mnt/scratch/egi12/xenium/output-XETG00207__0014511__Region_1__20240315__115210/ --jobmode=slurm --disable-ui=true"
    },
    "segmentation_stain": ""
}
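
For anyone who wants to check their own bundle, here is a minimal sketch that prints the two version fields shown above. It assumes experiment.xenium is plain JSON with the layout in the excerpt; read_version_fields is a hypothetical helper written for this post, not part of spatialdata-io.

```python
import json
from pathlib import Path


def read_version_fields(bundle_dir):
    """Return the version-related fields of <bundle_dir>/experiment.xenium.

    Hypothetical helper: it only assumes the JSON layout shown in the excerpt.
    """
    with open(Path(bundle_dir) / "experiment.xenium", encoding="utf-8") as handle:
        specs = json.load(handle)
    # The "xenium_ranger" block is only expected after a Xenium Ranger rerun.
    ranger = specs.get("xenium_ranger") or {}
    return {
        "analysis_sw_version": specs.get("analysis_sw_version"),
        "xenium_ranger_version": ranger.get("version"),
    }
```

Calling read_version_fields(sample1_loc) on my re-segmented output should report "xenium-1.9.0.0" for the first field and "xenium-2.0.0.12" for the second, which matches the excerpt.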

Aug 12 '24 22:08 liz-is

Thanks for sharing the details. A fix could be to look for the xenium_ranger field in the metadata and, when it is present, use its version to determine which code branch xenium() takes in spatialdata-io.
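
A rough sketch of that idea, not the actual spatialdata-io implementation: the function names below are hypothetical, and the version strings are assumed to look like the ones in the metadata excerpt ("xenium-" followed by dotted integers).

```python
from __future__ import annotations

import re


def parse_xenium_version(raw: str) -> tuple[int, ...]:
    """Turn a string such as 'xenium-2.0.0.12' into (2, 0, 0, 12)."""
    match = re.search(r"\d+(?:\.\d+)*", raw)
    if match is None:
        raise ValueError(f"no version number found in {raw!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def effective_version(specs: dict) -> tuple[int, ...]:
    """Prefer the Xenium Ranger version when the metadata has one."""
    ranger = specs.get("xenium_ranger")
    if isinstance(ranger, dict) and "version" in ranger:
        return parse_xenium_version(ranger["version"])
    return parse_xenium_version(specs["analysis_sw_version"])


def is_pre_v2(specs: dict) -> bool:
    """True when the pre-2.0.0 code branch would apply."""
    # Comparing the major component is equivalent to `version < 2.0.0`.
    return effective_version(specs)[0] < 2


original = {"analysis_sw_version": "xenium-1.9.0.0"}
resegmented = {**original, "xenium_ranger": {"version": "xenium-2.0.0.12"}}

print(is_pre_v2(original))     # True
print(is_pre_v2(resegmented))  # False
```

With this selection, the re-segmented bundle above would skip the pre-2.0.0 branch that looks up seg_mask_value.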

Aug 14 '24 16:08 LucaMarconato

I managed to read in the data by adapting the code from spatialdata_io/readers/xenium.py to use the branches appropriate for version >= 2.0.0. So I think your idea to use the version in the xenium_ranger field in the metadata would work.

Caveats: 1) I didn't read in the transcripts, as I don't need them for my current analysis; 2) although I didn't get any errors or warnings, I don't know for sure that all the output is correct, as I'm not very familiar with the data structures.

Aug 18 '24 15:08 liz-is

@liz-is ~~I have this same issue with xeniumranger 3.0.1 resegment. error output is the same. I am on spatialdata 0.2.5.post0.~~

I fixed it! It looks like a decent, future-proof fix. The change reads one more entry from the specs, the XeniumKeys.XENIUM_RANGER dict, which contains a version key in the same format as XeniumKeys.ANALYSIS_SW_VERSION, so it parses nicely. I also added XeniumKeys.XENIUM_RANGER to _constants.py.

I just have to tidy up the if statement a little bit; something wasn't working correctly.

I will submit a pull request at some point soon (possibly this weekend), unless someone else does so first.

Nov 29 '24 13:11 Pancreas-Pratik