pygeoapi icon indicating copy to clipboard operation
pygeoapi copied to clipboard

BBOX for Postgres returning unwanted results

Open webb-ben opened this issue 1 year ago • 2 comments

Description Using the bbox intersection with a postgres backend results in items being returned that do not match the filter

Steps to Reproduce Steps to reproduce the behavior:

Go to https://reference.geoconnex.us/collections/mainstems/items?bbox=-109.448547,36.611118,-107.668762,37.322120&properties=uri&limit=1000

Notice that https://geoconnex.us/ref/mainstems/29559 is included in the result although it does not intersect the bbox. It seems the bbox of the feature might intersect with the input bbox.

Expected behavior Only items that intersect with the input bbox are returned as in returned for the CQL request: https://reference.geoconnex.us/collections/mainstems/items?filter=INTERSECTS(geom,%20POLYGON((-109.448547%2036.611118,%20-109.448547%2037.322120,%20-107.668762%2037.322120,%20-107.668762%2036.611118,%20-109.448547%2036.611118)))&limit=1000

Screenshots/Tracebacks image image

Environment

  • OS: All
  • Python version: All
  • pygeoapi version: 0.17.dev0

Additional context Related to https://github.com/DOI-USGS/nhdplusTools/issues/386

webb-ben avatar May 21 '24 14:05 webb-ben

cc @KoalaGeo

Upon further inspection (thanks @webb-ben for the test data) I'm able to reproduce.

In the PostgreSQL provider, the intersects query differs in CQL mode vs bbox mode.

CQL query (via pygeofilter)

WHERE ST_Intersects(public.mainstems.geom, ST_GeomFromEWKT(%(ST_GeomFromEWKT_1)s))) AS anon_1
2024-07-25 07:50:15,984 INFO sqlalchemy.engine.Engine [no key 0.00515s] {'ST_GeomFromEWKT_1': 'SRID=4326;POLYGON ((-109.448547 36.611118, -109.448547 37.32212, -107.668762 37.32212, -107.668762 36.611118, -109.448547 36.611118))'}

bbox query (via SQLAlchemy direct)

WHERE public.mainstems.geom && ST_MakeEnvelope(%(ST_MakeEnvelope_1)s, %(ST_MakeEnvelope_2)s, %(ST_MakeEnvelope_3)s, %(ST_MakeEnvelope_4)s) ORDER BY public.mainstems.id ASC 
 LIMIT %(param_1)s OFFSET %(param_2)s
2024-07-25 07:52:03,748 INFO sqlalchemy.engine.Engine [no key 0.00020s] {'ST_MakeEnvelope_1': -109.448547, 'ST_MakeEnvelope_2': 36.611118, 'ST_MakeEnvelope_3': -107.668762, 'ST_MakeEnvelope_4': 37.32212, 'param_1': 1000, 'param_2': 0}
[2024-07-25T07:52:03Z] {/Users/tomkralidis/Dev/pygeoapi/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1577} INFO - [no key 0.00020s] {'ST_MakeEnvelope_1': -109.448547, 'ST_MakeEnvelope_2': 36.611118, 'ST_MakeEnvelope_3': -107.668762, 'ST_MakeEnvelope_4': 37.32212, 'param_1': 1000, 'param_2': 0}

The query is slightly different, and more importantly pygeofilter is setting SRID=4326 for the bbox/geometry.

Updating the PostgreSQL provider per below gives consistent results with the CQL code path for bbox/geometry:

    def _get_bbox_filter(self, bbox):
        if not bbox:
            return True  # Let everything through

        # Convert bbox to SQLAlchemy clauses
        geom_column = getattr(self.table_model, self.geom)
        envelope = ST_MakeEnvelope(*bbox, 4326)

        bbox_filter = ST_Intersects(geom_column, envelope)

        return bbox_filter

Resulting SQL:

WHERE ST_Intersects(public.mainstems.geom, ST_MakeEnvelope(%(ST_MakeEnvelope_1)s, %(ST_MakeEnvelope_2)s, %(ST_MakeEnvelope_3)s, %(ST_MakeEnvelope_4)s, %(ST_MakeEnvelope_5)s)) ORDER BY public.mainstems.id ASC 
 LIMIT %(param_1)s OFFSET %(param_2)s
2024-07-25 08:01:02,503 INFO sqlalchemy.engine.Engine [no key 0.00111s] {'ST_MakeEnvelope_1': -109.448547, 'ST_MakeEnvelope_2': 36.611118, 'ST_MakeEnvelope_3': -107.668762, 'ST_MakeEnvelope_4': 37.32212, 'ST_MakeEnvelope_5': 4326, 'param_1': 1000, 'param_2': 0}
[2024-07-25T08:01:02Z] {/Users/tomkralidis/Dev/pygeoapi/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1577} INFO - [no key 0.00111s] {'ST_MakeEnvelope_1': -109.448547, 'ST_MakeEnvelope_2': 36.611118, 'ST_MakeEnvelope_3': -107.668762, 'ST_MakeEnvelope_4': 37.32212, 'ST_MakeEnvelope_5': 4326, 'param_1': 1000, 'param_2': 0}

We could hardcode 4326 for the incoming bbox parameter (which arrives via the provider's query() function), however the bbox may or may not have had CRS treatment depending on the storage CRS.

Perhaps the best way to determine and set the srid of the bbox is to set it to the srid of the geometry column as follows:

    def _get_bbox_filter(self, bbox):
        if not bbox:
            return True  # Let everything through

        # Convert bbox to SQLAlchemy clauses
        geom_column = getattr(self.table_model, self.geom)
        geom_column_srid = TODO
        envelope = ST_MakeEnvelope(*bbox, geom_column_srid)

        bbox_filter = ST_Intersects(geom_column, envelope)

        return bbox_filter

Any ideas on how to derive would be valuable (Find_SRID doesn't seem to work).

tomkralidis avatar Jul 25 '24 14:07 tomkralidis

@volcan01010 & @ximenesuk ideas/suggestions?

KoalaGeo avatar Jul 26 '24 00:07 KoalaGeo

This Issue has been inactive for 90 days. As per RFC4, in order to manage maintenance burden, it will be automatically closed in 7 days.

github-actions[bot] avatar Dec 29 '24 03:12 github-actions[bot]

This Issue has been inactive for 90 days. As per RFC4, in order to manage maintenance burden, it will be automatically closed in 7 days.

github-actions[bot] avatar Jun 29 '25 03:06 github-actions[bot]

@tomkralidis @webb-ben sorry, we've not been able to resource looking at this.

Can I add a warning in the docs so users are aware of the issue?

KoalaGeo avatar Jun 29 '25 07:06 KoalaGeo

@KoalaGeo since fixed in #2051 -- thanks @webb-ben !

tomkralidis avatar Jun 29 '25 16:06 tomkralidis