spatialdata icon indicating copy to clipboard operation
spatialdata copied to clipboard

polygon_query() throws AttributeError: 'NoneType' object has no attribute 'intersects' error

Open olivermccallion opened this issue 1 year ago • 7 comments

Hi,

I'm trying to perform a polygon_query() on a Xenium object:

SpatialData object with:
├── Images
│     ├── 'he_image': MultiscaleSpatialImage[cyx] (3, 33558, 57706), (3, 16779, 28853), (3, 8389, 14426), (3, 4194, 7213), (3, 2097, 3606)
│     └── 'morphology_focus': MultiscaleSpatialImage[cyx] (5, 44236, 25670), (5, 22118, 12835), (5, 11059, 6417), (5, 5529, 3208), (5, 2764, 1604)
├── Labels
│     ├── 'cell_labels': MultiscaleSpatialImage[yx] (44236, 25670), (22118, 12835), (11059, 6417), (5529, 3208), (2764, 1604)
│     └── 'nucleus_labels': MultiscaleSpatialImage[yx] (44236, 25670), (22118, 12835), (11059, 6417), (5529, 3208), (2764, 1604)
├── Points
│     └── 'transcripts': DataFrame with shape: (3153129, 11) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (21922, 1) (2D shapes)
│     ├── 'cell_circles': GeoDataFrame shape: (21922, 2) (2D shapes)
│     ├── 'glomeruli': GeoDataFrame shape: (5, 1) (2D shapes)
│     ├── 'infiltrate_1': GeoDataFrame shape: (1, 1) (2D shapes)
│     ├── 'nucleus_boundaries': GeoDataFrame shape: (21669, 1) (2D shapes)
│     ├── 'peri-infiltrate_boundary-hi': GeoDataFrame shape: (9, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (21922, 477)
with coordinate systems:
▸ 'global', with elements:
        he_image (Images), morphology_focus (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes), decent_tissue (Shapes), glomeruli (Shapes), infiltrate_1 (Shapes),  nucleus_boundaries (Shapes), peri-infiltrate_boundary-hi (Shapes)

However, whilst I'm able to perform a bounding box query, a polygon query using a region annotated in napari throws the following error: Code:

polygon = sdata.shapes['infiltrate_1'].geometry.iloc[0]
query_sdata = polygon_query(sdata, polygon, target_coordinate_system="global")
query_sdata

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[10], line 1
----> 1 query_sdata = sdata.query.polygon(polygon, target_coordinate_system="global")

File /opt/anaconda3/envs/spatialdata_py3.10/lib/python3.10/site-packages/spatialdata/_core/spatialdata.py:1761, in QueryManager.polygon(self, polygon, target_coordinate_system, filter_table)
   1753 """
   1754 Perform a polygon query on the SpatialData object.
   1755 
   1756 Please see
   1757 :func:`spatialdata.polygon_query` for the complete docstring.
   1758 """
   1759 from spatialdata._core.query.spatial_query import polygon_query
-> 1761 return polygon_query(  # type: ignore[return-value]
   1762     self._sdata,
   1763     polygon=polygon,
   1764     target_coordinate_system=target_coordinate_system,
   1765     filter_table=filter_table,
   1766 )

File /opt/anaconda3/envs/spatialdata_py3.10/lib/python3.10/functools.py:878, in singledispatch.<locals>.wrapper(*args, **kw)
    874 if not args:
    875     raise TypeError(f'{funcname} requires at least '
    876                     '1 positional argument')
--> 878 return dispatch(args[0].__class__)(*args, **kw)

File /opt/anaconda3/envs/spatialdata_py3.10/lib/python3.10/site-packages/spatialdata/_core/query/spatial_query.py:828, in _(sdata, polygon, target_coordinate_system, filter_table, shapes, points, images, labels)
    826 for element_type in ["points", "images", "labels", "shapes"]:
    827     elements = getattr(sdata, element_type)
--> 828     queried_elements = _dict_query_dispatcher(
    829         elements,
    830         polygon_query,
    831         polygon=polygon,
    832         target_coordinate_system=target_coordinate_system,
    833     )
    834     new_elements[element_type] = queried_elements
    836 tables = _get_filtered_or_unfiltered_tables(filter_table, new_elements, sdata)

File /opt/anaconda3/envs/spatialdata_py3.10/lib/python3.10/site-packages/spatialdata/_core/query/spatial_query.py:398, in _dict_query_dispatcher(elements, query_function, **kwargs)
    396 assert isinstance(d, dict)
    397 if target_coordinate_system in d:
--> 398     result = query_function(element, **kwargs)
    399     if result is not None:
    400         # query returns None if it is empty
    401         queried_elements[key] = result

File /opt/anaconda3/envs/spatialdata_py3.10/lib/python3.10/functools.py:878, in singledispatch.<locals>.wrapper(*args, **kw)
    874 if not args:
    875     raise TypeError(f'{funcname} requires at least '
    876                     '1 positional argument')
--> 878 return dispatch(args[0].__class__)(*args, **kw)

File /opt/anaconda3/envs/spatialdata_py3.10/lib/python3.10/site-packages/spatialdata/_core/query/spatial_query.py:912, in _(element, polygon, target_coordinate_system, **kwargs)
    910 else:
    911     buffered[OLD_INDEX] = buffered.index
--> 912 indices = buffered.geometry.apply(lambda x: x.intersects(polygon))
    913 if np.sum(indices) == 0:
    914     return None

File /opt/anaconda3/envs/spatialdata_py3.10/lib/python3.10/site-packages/geopandas/geoseries.py:673, in GeoSeries.apply(self, func, convert_dtype, args, **kwargs)
    670         kwargs["convert_dtype"] = True
    672 # to avoid warning
--> 673 result = super().apply(func, args=args, **kwargs)
    674 if isinstance(result, GeoSeries):
    675     if self.crs is not None:

File /opt/anaconda3/envs/spatialdata_py3.10/lib/python3.10/site-packages/pandas/core/series.py:4924, in Series.apply(self, func, convert_dtype, args, by_row, **kwargs)
   4789 def apply(
   4790     self,
   4791     func: AggFuncType,
   (...)
   4796     **kwargs,
   4797 ) -> DataFrame | Series:
   4798     """
   4799     Invoke function on values of Series.
   4800 
   (...)
   4915     dtype: float64
   4916     """
   4917     return SeriesApply(
   4918         self,
   4919         func,
   4920         convert_dtype=convert_dtype,
   4921         by_row=by_row,
   4922         args=args,
   4923         kwargs=kwargs,
-> 4924     ).apply()

File /opt/anaconda3/envs/spatialdata_py3.10/lib/python3.10/site-packages/pandas/core/apply.py:1427, in SeriesApply.apply(self)
   1424     return self.apply_compat()
   1426 # self.func is Callable
-> 1427 return self.apply_standard()

File /opt/anaconda3/envs/spatialdata_py3.10/lib/python3.10/site-packages/pandas/core/apply.py:1507, in SeriesApply.apply_standard(self)
   1501 # row-wise access
   1502 # apply doesn't have a `na_action` keyword and for backward compat reasons
   1503 # we need to give `na_action="ignore"` for categorical data.
   1504 # TODO: remove the `na_action="ignore"` when that default has been changed in
   1505 #  Categorical (GH51645).
   1506 action = "ignore" if isinstance(obj.dtype, CategoricalDtype) else None
-> 1507 mapped = obj._map_values(
   1508     mapper=curried, na_action=action, convert=self.convert_dtype
   1509 )
   1511 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1512     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1513     #  See also GH#25959 regarding EA support
   1514     return obj._constructor_expanddim(list(mapped), index=obj.index)

File /opt/anaconda3/envs/spatialdata_py3.10/lib/python3.10/site-packages/pandas/core/base.py:919, in IndexOpsMixin._map_values(self, mapper, na_action, convert)
    916 arr = self._values
    918 if isinstance(arr, ExtensionArray):
--> 919     return arr.map(mapper, na_action=na_action)
    921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)

File /opt/anaconda3/envs/spatialdata_py3.10/lib/python3.10/site-packages/pandas/core/arrays/base.py:2322, in ExtensionArray.map(self, mapper, na_action)
   2302 def map(self, mapper, na_action=None):
   2303     """
   2304     Map values using an input mapping or function.
   2305 
   (...)
   2320         a MultiIndex will be returned.
   2321     """
-> 2322     return map_array(self, mapper, na_action=na_action)

File /opt/anaconda3/envs/spatialdata_py3.10/lib/python3.10/site-packages/pandas/core/algorithms.py:1743, in map_array(arr, mapper, na_action, convert)
   1741 values = arr.astype(object, copy=False)
   1742 if na_action is None:
-> 1743     return lib.map_infer(values, mapper, convert=convert)
   1744 else:
   1745     return lib.map_infer_mask(
   1746         values, mapper, mask=isna(values).view(np.uint8), convert=convert
   1747     )

File lib.pyx:2972, in pandas._libs.lib.map_infer()

File /opt/anaconda3/envs/spatialdata_py3.10/lib/python3.10/site-packages/spatialdata/_core/query/spatial_query.py:912, in _.<locals>.<lambda>(x)
    910 else:
    911     buffered[OLD_INDEX] = buffered.index
--> 912 indices = buffered.geometry.apply(lambda x: x.intersects(polygon))
    913 if np.sum(indices) == 0:
    914     return None

AttributeError: 'NoneType' object has no attribute 'intersects'

Plotting both the query polygon and transcripts confirms that the coordinate systems are congruent, and using 10X Xenium Explorer exported coordinates as used in this notebook (https://github.com/quentinblampey/spatialdata_xenium_explorer/blob/master/docs/10x_tutorials/xenium/xenium_tuto.ipynb) also throws the same error.

Could you point me in the right direction for what's going wrong?

My environment is: Uploading env.txt…

olivermccallion avatar Apr 28 '24 22:04 olivermccallion

I have had the same issue -- Based on where it fails in the loop across elements, I think it might have something to do with if a nucleus isn't detected in a particular segment?

I will note that everything seems to work alright if line 913 in spatial_query.py is changed to:

    indices = buffered.geometry.apply(
        lambda x: False if x is None else x.intersects(polygon))

easlinger avatar Jun 20 '24 17:06 easlinger

More recent versions of spatialdata require a lot more finangling to get it to work, and the solutions I've found so far are not the most stable and result in dropping cells with no detected nuclei.

My group would love to get this feature sorted out soon; thanks so much for all your hard work on this package!

easlinger avatar Jul 31 '24 16:07 easlinger

This is a terrible and hack-y solution (I hesitate to even call it that), but I was up against a deadline to get some cropped objects ASAP, so this is the best I could cobble together. I'm not even sure how informative it is, but I figure more information is usually better than none.

When it was working for me (albeit dropping no-nuclei cells):

I had 0.2.2.dev15+ga9fb477 of spatialdata and 0.1.3.post1.dev3+g6a8b968 of spatialdata_io installed.

spatialdata/_core/query/relational_query.py may have to have _get_element_annotators function manually added (copy from the renamed get_element_annotators) for the import of spatialdata_plot to work, depending on the state of other packages.

@polygon_query.register in spatialdata/_core/query/spatial_query.py had to be changed to the following:

@polygon_query.register(GeoDataFrame)
def _(
    element: GeoDataFrame,
    polygon: Polygon | MultiPolygon,
    target_coordinate_system: str,
    **kwargs: Any,
) -> GeoDataFrame | None:
    from spatialdata.transformations import get_transformation, set_transformation

    _check_deprecated_kwargs(kwargs)
    polygon_gdf = _get_polygon_in_intrinsic_coordinates(
        element, target_coordinate_system, polygon)
    polygon = polygon_gdf["geometry"].iloc[0]
    
    element = element.dropna()
    
    buffered = to_polygons(
        element) if ShapesModel.RADIUS_KEY in element.columns else element

    OLD_INDEX = "__old_index"
    if OLD_INDEX in buffered.columns:
        assert np.all(element[OLD_INDEX] == buffered.index)
    else:
        buffered[OLD_INDEX] = buffered.index
    # indices = buffered.geometry.apply(lambda x: x.intersects(polygon))
    indices = buffered.geometry.apply(
        lambda x: False if x is None or (
            isinstance(pd.isnull(x), bool) and pd.isnull(x)
            ) else x.intersects(polygon))
    if np.sum(indices) == 0:
        return None
    queried_shapes = element[indices]
    queried_shapes.index = buffered[indices][OLD_INDEX]
    queried_shapes.index.name = None
    del buffered[OLD_INDEX]
    if OLD_INDEX in queried_shapes.columns:
        del queried_shapes[OLD_INDEX]
    transformation = get_transformation(buffered, target_coordinate_system)
    queried_shapes = ShapesModel.parse(queried_shapes)
    set_transformation(queried_shapes, transformation, target_coordinate_system)
    t = get_transformation(queried_shapes, get_all=True)
    assert isinstance(t, dict)
    set_transformation(queried_shapes, t.copy(), set_all=True)
    return queried_shapes

the line assigning buffered_df["geometry"] in @to_polygons.register in spatialdata/_core/operations/vectorize.py had to be changed as shown below

buffered_df["geometry"] = buffered_df.apply(
    lambda row: None if pd.isnull(
        row.geometry) else row.geometry.buffer(
            row[ShapesModel.RADIUS_KEY],
            resolution=buffer_resolution), axis=1
)

Then I used

coords = cr.pp.xenium_explorer_selection(file_coord, pixel_size=0.2125)
if isinstance(coords, list):  # if multiple selections...
    coords = shapely.MultiPolygon(coords)  # ...union of areas
kws = {"target_coordinate_system": "global", "filter_table": True}
sdata = self.adata.query.polygon(coords, **kws)  # crop
i_x = sdata.table.obs["cell_id"].copy()
del sdata.table
sdata.table = adata[adata.obs["cell_id"].isin(i_x[
    i_x.isin(adata.obs["cell_id"])])]
if not os.path.exists(file_obj_crop):
    sdata.table.write_h5ad(file_obj_crop)

for trouble-shooting purposes, and that code worked.

Environment yaml as a text file: environment_old.txt

easlinger avatar Jul 31 '24 16:07 easlinger

Thank you @olivermccallion and @easlinger for reporting this and for sharing the details and apologies for the delayed answer.

Unfortunately we could not reproduce the bug that you are reporting in our Xenium experiments. The information about the installed versions and the workaround you posted will be very useful. But in addition to this, it would be important if you could share instructions to reproduce this on a real or artificial dataset. In particular, it would be helpful to know which region of interest you are drawing, or have them saved so that we can load them.

If the data you are working with is confidential, please consider creating a tiny crop/alter it/subset the genes, so that the original content is not recognizable; or you can send it to me privately via Zulip.

Thank you in advance, and happy to look into this.

LucaMarconato avatar Jul 31 '24 16:07 LucaMarconato

I believe I found the origin of the bug, I am working on a solution to this. I will keep you posted.

LucaMarconato avatar Aug 01 '24 21:08 LucaMarconato

I could reproduce a related bug (not the reported bug)

Update: I managed to reproduce a bug with polygon_query() due to the radii of some of cell_circles in Xenium SpatialData objects being set to NaN. These radii correspond to those cells that are detected without a nucleus when the data has been generated using the multimodal segmentation kit. https://github.com/scverse/spatialdata-io/issues/173

Notes:

  • I cannot reproduce the error AttributeError: 'NoneType' object has no attribute 'intersects'; I get instead the error ValueError("buffer distance must be finite") on a few lines above. I still think that they are related and that by solving mine you will also see your bug resolved.
  • A quick way to check if the nan values are the roots of the problem is to run this snipped before performing the spatial_query(). I kindly ask you to check if this makes your bug disappear.
import numpy as np

circles = sdata['cell_circles']
i = np.where(circles.radius.isna())[0]
circles.iloc[i, circles.columns.get_loc('radius')] = 1

Solution

If the cause is the one described above, the solution is to make sure that the data is parsed without nan values. I implemented it this week in spatialdata-io, and I also now make sure (in spatialdata) that the radius is never nan. Therefore calling xenium() again should make the problem fade away.

Nevertheless, if you already wrote the data to disk, a different approach is required, I am preparing it and I will share this information.

LucaMarconato avatar Aug 02 '24 14:08 LucaMarconato

Here is it, please find detailed instructions on how to correct the radii if the data is already written to disk: https://github.com/scverse/spatialdata/discussions/657.

Looking forward your your feedbacks @olivermccallion @easlinger, I am optimistic that the solution above will fix the issue.

LucaMarconato avatar Aug 02 '24 16:08 LucaMarconato

I will close the issue as the approach described above describes a solution.

LucaMarconato avatar Aug 16 '24 15:08 LucaMarconato

Here is it, please find detailed instructions on how to correct the radii if the data is already written to disk: #657.

Looking forward your your feedbacks @olivermccallion @easlinger, I am optimistic that the solution above will fix the issue.

Thanks so much for the fix!

easlinger avatar Aug 17 '24 18:08 easlinger