Xenium Ranger 3.0.1.1 after Cellpose resegmentation - "ValueError: A linearring requires at least 4 coordinates."
After segmentation with Cellpose, I ran Xenium Ranger 3.0.1.1:
xeniumranger import-segmentation --id=my_ID \
    --xenium-bundle=$DIR/outs \
    --nuclei=$DIR/outs/mask_nuclei.ome_cp_masks.tif \
    --cells=$DIR/outs/mask_cells.ome_cp_masks.tif
When I try to read the output folder with spatialdata_io (0.1.7.dev5+g82ff327) and spatialdata (0.2.7.dev7+gd746485):
sdata = spatialdata_io.xenium(dir)
I get this error:
"ValueError: A linearring requires at least 4 coordinates."
Is there a temporary workaround to manually create the SpatialData object from the Xenium Ranger output (for example a tutorial or notebook)?
Does nobody have an idea how to resolve this issue? Does anybody know how to download the previous working version of Xenium Ranger (v3.0.0.0)? Or is there any tutorial for manually creating the SpatialData object from Xenium Ranger output?
After updating to Xenium Ranger v3.1, it is still not working. Does nobody have any idea how to resolve this issue?
`Traceback (most recent call last):
  File "./Read_Write_zarr.py", line 11, in <module>
  File "./lib/python3.10/site-packages/spatialdata_io/_utils.py", line 46, in wrapper
    return f(*args, **kwargs)
  File "./lib/python3.10/site-packages/spatialdata_io/readers/xenium.py", line 232, in xenium
    polygons["cell_boundaries"] = _get_polygons(
  File "./lib/python3.10/site-packages/spatialdata_io/readers/xenium.py", line 349, in _get_polygons
    out = Parallel(n_jobs=n_jobs)(
  File "./lib/python3.10/site-packages/joblib/parallel.py", line 1918, in __call__
    return output if self.return_generator else list(output)
  File "./lib/python3.10/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
    res = func(*args, **kwargs)
  File "./lib/python3.10/site-packages/spatialdata_io/readers/xenium.py", line 341, in _poly
    return Polygon(arr[:-1])
  File "./lib/python3.10/site-packages/shapely/geometry/polygon.py", line 230, in __new__
    shell = LinearRing(shell)
  File "./lib/python3.10/site-packages/shapely/geometry/polygon.py", line 104, in __new__
    geom = shapely.linearrings(coordinates)
  File "./lib/python3.10/site-packages/shapely/decorators.py", line 77, in wrapper
    return func(*args, **kwargs)
  File "./lib/python3.10/site-packages/shapely/creation.py", line 173, in linearrings
    return lib.linearrings(coords, out=out, **kwargs)
ValueError: A linearring requires at least 4 coordinates.`
Hi, you should inspect your Cellpose segmentation and make sure you do not have polygons with fewer than 4 vertices. If you want to load this Cellpose segmentation anyway, you can remove all cell polygons with fewer than 4 vertices (usually this represents a very low number of cells, ~0.1%):
import pandas as pd
cell_boundaries = pd.read_parquet("./cell_boundaries.parquet")
# Collect the IDs of cells whose polygon has fewer than 4 rows (vertices).
cells_to_remove = cell_boundaries.groupby("cell_id").filter(lambda x: len(x) < 4).cell_id.unique()
cell_boundaries = cell_boundaries[~cell_boundaries.cell_id.isin(cells_to_remove)]
# Note: this overwrites the original file.
cell_boundaries.to_parquet("./cell_boundaries.parquet")
If you do this, you should certainly also modify the transcripts file and the count matrix. Be careful not to modify your raw data; doing this can be a bit dangerous!
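Before filtering, it can help to check how many cells are actually affected, and to write the result to a separate file rather than over the original. A minimal sketch, assuming the boundaries table has one row per vertex and a `cell_id` column, as in the snippet above. The toy DataFrame, the `vertex_x`/`vertex_y` column names, and the output path are illustrative assumptions, not taken from a real Xenium bundle.

```python
import pandas as pd

# Toy stand-in for pd.read_parquet("./cell_boundaries.parquet"); one row per vertex.
cell_boundaries = pd.DataFrame(
    {
        "cell_id": ["a"] * 5 + ["b"] * 3 + ["c"] * 4,
        "vertex_x": [0, 1, 1, 0, 0, 2, 3, 2, 5, 6, 5, 5],
        "vertex_y": [0, 0, 1, 1, 0, 2, 2, 2, 5, 5, 6, 5],
    }
)

# Count rows (one per vertex) for each cell.
n_vertices = cell_boundaries.groupby("cell_id").size()

# Same threshold as the workaround above: fewer than 4 rows is treated as degenerate.
bad_cells = n_vertices[n_vertices < 4].index
fraction_bad = len(bad_cells) / len(n_vertices)
print(f"{len(bad_cells)} of {len(n_vertices)} cells affected ({fraction_bad:.1%})")

# Keep only the valid cells; the original table is left untouched.
filtered = cell_boundaries[~cell_boundaries["cell_id"].isin(bad_cells)]
# filtered.to_parquet("./cell_boundaries.filtered.parquet")  # hypothetical output path
```

If the reported fraction is much higher than the ~0.1% mentioned above, that is probably a sign to inspect the segmentation itself rather than just filter.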
Thanks @Lem-P for reporting and thanks @ConstensouxAlexis for sharing a workaround. @Lem-P does this solve your issue?
If not, I'll be happy to have a look into this, but I'd ask you to please share a reproducible script that runs on some public data, to make it easier to inspect. Thank you.
I think this issue is caused by xeniumranger import-segmentation and not by spatialdata. When I run xeniumranger with a custom segmentation (Baysor or Cellpose), even if I make sure all my segmented cells can be cast to Polygon and have at least 4 vertices, xeniumranger will create some cells with fewer than 4 vertices. I haven't figured out yet what the problem could be!
Hi, I am back on this project! I do indeed have polygons with fewer than 4 vertices in the cell_boundaries.parquet file and could filter them out of the cell boundaries file. Should I then use the cell IDs from cells_to_remove to filter those IDs out of the 'transcripts.parquet' file? What about the 'cells.parquet' and 'nucleus_boundaries.parquet' files? Also, how do I modify 'cell_feature_matrix.h5'? (Sorry for the noob question)
In line with @ConstensouxAlexis's last comment: since it seems to be Xenium Ranger that introduces these problematic polygons, is there an (easy) way to generate the input files for spatialdata-io without using Xenium Ranger?
I think that it's fine just to remove the problematic polygons; this will only impact the visualization and not the analysis.
Thank you, removing the problematic polygons only in the cell_boundaries.parquet file allowed me to create a SpatialData object. I was afraid that a discrepancy in the cell IDs between files would cause issues.
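For anyone with the same worry, a set difference is a quick way to see which IDs differ between two files. A minimal sketch with toy ID sets; in practice they would come from the `cell_id` column of the filtered cell_boundaries.parquet and from the matching column of cells.parquet (that the latter uses the same column name is an assumption here).

```python
# Toy stand-ins for the cell IDs found in two files.
boundary_ids = {"a", "c"}        # after removing the degenerate polygons
table_ids = {"a", "b", "c"}      # e.g. IDs listed in cells.parquet

# Cells that still have a table row but no polygon any more.
missing_shapes = sorted(table_ids - boundary_ids)
# Polygons without a table row (would point to a real inconsistency).
missing_rows = sorted(boundary_ids - table_ids)

print(f"cells without a polygon: {missing_shapes}")
print(f"polygons without a table entry: {missing_rows}")
```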
Hey @Lem-P and @ConstensouxAlexis, I linked a PR to this issue which should fix the issue. Could you verify with your data?
@LucaMarconato I'm filtering out the polygons and removing the IDs from the table. Am I missing something?
Hello @timtreis, thank you for the PR, I will verify with my data. Just to let you know, this polygon issue is caused by xeniumranger software, and they are currently working on this issue: https://github.com/kharchenkolab/Baysor/issues/153#issuecomment-2686182022.
Also, the issue with those polygons only affects visualization; the associated expression vector in the table is valid, so maybe you shouldn't remove it from the table? I am not sure what the best fix would be.
Hm, thanks for the extra info! I think given that we are seeing datasets with this issue in the wild now, we know they exist and will cause errors for some users who processed their data with the faulty model. I think my gut feeling would be to leave the error handling and filtering in, but to modify the warning telling the users to reprocess their data with one of the newer versions 🤔
We released a new version of XeniumRanger (3.1.1) that fixes the number of polygon vertices error
Great to hear @LoganAMorrison! However, I think we should still merge that PR since there might be datasets out there generated with the faulty versions. The current implementation tells users what has been filtered, so it's also not something sneaky happening in the background. Wdyt @LucaMarconato?
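The filter-and-report behaviour discussed here could look roughly like the following. This is only an illustrative sketch, not the code in the linked PR: the function name `drop_degenerate_rings`, the input layout (a dict mapping cell ID to a list of (x, y) vertices), and the warning text are all hypothetical.

```python
import warnings


def drop_degenerate_rings(rings, min_vertices=4):
    """Keep rings with at least `min_vertices` vertices and warn about the rest."""
    kept = {cid: pts for cid, pts in rings.items() if len(pts) >= min_vertices}
    dropped = sorted(set(rings) - set(kept))
    if dropped:
        # Report exactly what was removed so nothing is filtered silently.
        warnings.warn(
            f"Removed {len(dropped)} cell polygon(s) with fewer than "
            f"{min_vertices} vertices: {dropped}. Consider reprocessing the "
            "data with a newer Xenium Ranger version.",
            UserWarning,
            stacklevel=2,
        )
    return kept, dropped


# Toy input: one valid closed ring and one degenerate ring.
rings = {
    "cell_1": [(0, 0), (1, 0), (1, 1), (0, 0)],
    "cell_2": [(2, 2), (3, 2)],
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept, dropped = drop_degenerate_rings(rings)

print(f"kept: {list(kept)}, dropped: {dropped}, warnings: {len(caught)}")
```

Returning the dropped IDs alongside the kept rings leaves it to the caller to decide whether the matching table rows should stay, which was the open question earlier in the thread.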