Can't read re-segmented Xenium data
Hello,
I have re-run cell segmentation for my Xenium data using Xenium Ranger version 2.0.0.12 to benefit from its improved segmentation algorithm. I can't read the re-segmented data with spatialdata_io.xenium(), even though the original output can be read. I have spatialdata-io version 0.1.4.
Error message
sample1_v2 = spatialdata_io.xenium(sample1_loc)
INFO reading /mnt/storage/vaquerizas/xenium/outputs/XR_0014511/0014511/outs/cell_feature_matrix.h5
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[6], line 1
----> 1 sample1_v2 = spatialdata_io.xenium(sample1_loc)
File /mnt/storage/vaquerizas/liz/spatial/xenium/python_analysis/spatialdata_env/lib/python3.10/site-packages/spatialdata_io/_utils.py:46, in deprecation_alias.<locals>.deprecation_decorator.<locals>.wrapper(*args, **kwargs)
44 class_name = f.__qualname__
45 rename_kwargs(f.__name__, kwargs, aliases, class_name)
---> 46 return f(*args, **kwargs)
File /mnt/storage/vaquerizas/liz/spatial/xenium/python_analysis/spatialdata_env/lib/python3.10/site-packages/spatialdata_io/readers/xenium.py:227, in xenium(path, cells_boundaries, nucleus_boundaries, cells_as_circles, cells_labels, nucleus_labels, transcripts, morphology_mip, morphology_focus, aligned_images, cells_table, n_jobs, imread_kwargs, image_models_kwargs, labels_models_kwargs)
218 labels["nucleus_labels"], _ = _get_labels_and_indices_mapping(
219 path,
220 XeniumKeys.CELLS_ZARR,
(...)
224 labels_models_kwargs=labels_models_kwargs,
225 )
226 if cells_labels:
--> 227 labels["cell_labels"], cell_labels_indices_mapping = _get_labels_and_indices_mapping(
228 path,
229 XeniumKeys.CELLS_ZARR,
230 specs,
231 mask_index=1,
232 labels_name="cell_labels",
233 labels_models_kwargs=labels_models_kwargs,
234 )
235 if cell_labels_indices_mapping is not None and table is not None:
236 if not pd.DataFrame.equals(cell_labels_indices_mapping["cell_id"], table.obs[str(XeniumKeys.CELL_ID)]):
File /mnt/storage/vaquerizas/liz/spatial/xenium/python_analysis/spatialdata_env/lib/python3.10/site-packages/spatialdata_io/readers/xenium.py:446, in _get_labels_and_indices_mapping(path, file, specs, mask_index, labels_name, labels_models_kwargs)
443 real_label_index = real_label_index[1:]
445 if version < packaging.version.parse("2.0.0"):
--> 446 expected_label_index = z["seg_mask_value"][...]
448 if not np.array_equal(expected_label_index, real_label_index):
449 raise ValueError(
450 "The label indices from the labels differ from the ones from the input data. Please report "
451 f"this issue. Real label indices: {real_label_index}, expected label indices: "
452 f"{expected_label_index}."
453 )
File /mnt/storage/vaquerizas/liz/spatial/xenium/python_analysis/spatialdata_env/lib/python3.10/site-packages/zarr/hierarchy.py:511, in Group.__getitem__(self, item)
509 raise KeyError(item)
510 else:
--> 511 raise KeyError(item)
KeyError: 'seg_mask_value'
Is this related to #150? That is, is Xenium Ranger reanalysis not supported?
Thanks in advance for your help with this.
Hi, thanks for reporting. The code branch being executed is the one for data versioned below 2.0.0. It could be that XR 2.0.0.12 updates some specific data to the newest format but not the global data version.
I would suggest manually parsing the data in this case; the linked issue #150 has information and tutorials on how to proceed, and you could use the xenium.py code as a starting point.
Hi, thanks for the reply. I'll try adapting the code from spatialdata_io/readers/xenium.py to parse this format. I'll let you know if I figure out what's actually been changed in the resegmentation.
FWIW, I had a look at how the version is detected by spatialdata_io.readers.xenium, and it's technically being detected correctly, as this is what the relevant parts of my experiment.xenium file look like:
{
"major_version": 4,
[...snipped...]
"instrument_sw_version": "1.9.2.0",
"analysis_sw_version": "xenium-1.9.0.0",
[...snipped...]
},
"xenium_explorer_files": {
[...snipped...]
},
"xenium_ranger": {
"run_id": "0014511",
"version": "xenium-2.0.0.12",
"command_line": "xeniumranger resegment --id=0014511 --xenium-bundle=/mnt/scratch/egi12/xenium/output-XETG00207__0014511__Region_1__20240315__115210/ --jobmode=slurm --disable-ui=true"
},
"segmentation_stain": ""
}
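For anyone who wants to check their own bundle, here is a minimal sketch of pulling both version strings out of experiment.xenium. It assumes the file is plain JSON and that `analysis_sw_version` and `xenium_ranger` sit at the top level, as the excerpt above suggests; `read_xenium_versions` is a hypothetical helper, not part of spatialdata-io.

```python
import json
from pathlib import Path


def read_xenium_versions(bundle_dir):
    """Return (analysis_sw_version, xenium_ranger_version) from experiment.xenium.

    Hypothetical helper for inspection only. The second element is None
    when the metadata has no "xenium_ranger" entry.
    """
    specs_path = Path(bundle_dir) / "experiment.xenium"
    with open(specs_path, encoding="utf-8") as f:
        specs = json.load(f)

    # Assumption: both keys live at the top level of the JSON document.
    analysis = specs.get("analysis_sw_version")
    ranger = specs.get("xenium_ranger", {}).get("version")
    return analysis, ranger
```

Calling `read_xenium_versions(sample1_loc)` on a re-segmented bundle like the one above should return two different version strings, which is the mismatch the reader trips over.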
Thanks for sharing the details. Maybe a fix could involve looking for xenium_ranger in the metadata and, when the field is present, using that version to determine the code branch used by xenium() in spatialdata-io.
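A rough sketch of that idea, operating on the already-loaded specs dict. This is not the actual spatialdata-io implementation: `parse_version` and `effective_version` are hypothetical names, and plain integer tuples stand in for the `packaging.version.parse` comparison the reader uses, to keep the example dependency-free.

```python
def parse_version(tag):
    """Turn a tag such as "xenium-2.0.0.12" into a tuple of ints."""
    # Drop an optional "xenium-" style prefix, then split on dots.
    number = tag.split("-", 1)[-1]
    return tuple(int(part) for part in number.split("."))


def effective_version(specs):
    """Prefer the Xenium Ranger version when the metadata has one."""
    ranger = specs.get("xenium_ranger")
    if ranger and "version" in ranger:
        return parse_version(ranger["version"])
    # Fall back to the version written by the on-instrument analysis.
    return parse_version(specs["analysis_sw_version"])


# Values taken from the experiment.xenium excerpt in this thread.
specs = {
    "analysis_sw_version": "xenium-1.9.0.0",
    "xenium_ranger": {"version": "xenium-2.0.0.12"},
}

version = effective_version(specs)
# The reader takes the pre-2.0.0 branch only when this is True.
uses_legacy_branch = version < (2, 0, 0)
```

With the metadata above, the Xenium Ranger version wins, so the pre-2.0.0 branch that looks up `seg_mask_value` would be skipped.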
I managed to read in the data by adapting the code from spatialdata_io/readers/xenium.py to use the branches appropriate for version >= 2.0.0. So I think your idea to use the version in the xenium_ranger field in the metadata would work.
Caveats: 1) I didn't read in the transcripts, as I don't need them for my current analysis; 2) although I didn't get any errors or warnings, I can't be sure that all the output is correct, as I'm not very familiar with the data structures.
@liz-is ~~I have this same issue with xeniumranger 3.0.1 resegment. error output is the same. I am on spatialdata 0.2.5.post0.~~
I fixed it! It looks like a decent future-proof fix. It reads an extra entry from the specs: I added XeniumKeys.XENIUM_RANGER to _constants.py, pointing at the xenium_ranger dict, which contains a version key in the same format as XeniumKeys.ANALYSIS_SW_VERSION, so it parses nicely.
I just have to tidy up the if statement a little bit; something wasn't working correctly.
I will submit a pull request at some point soon (possibly this weekend), unless someone else does first.