
Add CQL to PostgreSQL provider (via pygeofilter and sqlalchemy)

Open volcan01010 opened this issue 3 years ago • 28 comments

Overview

This merge request adds CQL query functionality to the PostgreSQL provider. Using CQL allows more powerful queries than simple equality in a WHERE clause, including spatial operators such as INTERSECTS or CROSSES.

CQL is passed to the API as plain text via a new URL parameter (cql=). The CQL is parsed into an abstract syntax tree (AST) in the API using pygeofilter, and parsing errors are handled there. In the backend, the query function calls a new query_cql function when CQL has been provided. This function uses SQLAlchemy's "reflect" capability to generate a TableModel for the table and uses pygeofilter to create a SQLAlchemy filter from the CQL AST. The filter is used to query the table for filtered data, which is then converted to GeoJSON-like dictionaries to be returned.
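
For illustration, a minimal sketch of that flow, with a hypothetical declarative model standing in for the reflected TableModel (parse and to_filter are the pygeofilter entry points used):

from pygeofilter.parsers.ecql import parse
from pygeofilter.backends.sqlalchemy.evaluate import to_filter
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Waterway(Base):
    # Stand-in for the TableModel reflected from the database
    __tablename__ = 'hotosm_bdi_waterways'
    osm_id = Column(Integer, primary_key=True)
    name = Column(String)


cql_ast = parse("name ILIKE 'muha%'")    # invalid CQL raises a parse error here
field_mapping = {'name': Waterway.name}  # maps CQL properties to model columns
cql_filter = to_filter(cql_ast, field_mapping)
# cql_filter can now be applied via session.query(Waterway).filter(cql_filter)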

The new features work and are covered by tests (in test_api.py and test_postgresql.py) but this merge request is presented as a work-in-progress so that the best way to integrate the new functionality can be discussed.

To test

  • Use the instructions in pygeoapi/provider/postgresql.py to start a local Docker container populated with OSM test data.
  • Use the test config: export PYGEOAPI_CONFIG=tests/pygeoapi-test-config-postgresql.yml
  • Start pygeoapi: pygeoapi serve

Try some queries:

  • ILIKE: http://localhost:5000/collections/hot_osm_waterways/items?f=html&cql=name%20ILIKE%20%27muha%27
  • BBOX: http://localhost:5000/collections/hot_osm_waterways/items?f=html&cql=BBOX(foo_geom,%2029,%20-2.8,%2029.2,%20-2.9)&limit=100
  • CROSSES: http://localhost:5000/collections/hot_osm_waterways/items?f=html&cql=CROSSES(foo_geom,%20LINESTRING(28%20-2,%2030%20-4))&limit=100
  • Bad query: http://localhost:5000/collections/hot_osm_waterways/items?cql=osm_id%20EATS%201

Additional Information

  • ECQL guide (from GeoServer docs): https://docs.geoserver.org/stable/en/user/filter/ecql_reference.html#filter-ecql-reference
  • CQL tutorial: https://docs.geoserver.org/latest/en/user/tutorials/cql/cql_tutorial.html

Questions

  • Should we also accept CQL via POST request, like the Elasticsearch backend does?
  • Where should the documentation for this go?

Using SQLAlchemy for all the PostgreSQL backend

If we went fully SQLAlchemy for the PostgreSQL implementation, we could apply the WHERE and BBOX clauses from the API call and the CQL query with the same code:

where_filter = ...  # generated from API input
bbox_filter = ...  # generated from API input
cql_filter = ...  # generated from API cql_ast input
q = (session.query(TableModel)
     .filter(where_filter)
     .filter(bbox_filter)
     .filter(cql_filter)
     .order_by(*order_by_clauses)
     .offset(offset)
     .limit(limit))

If any of the provided filters is None, it will (should) just be ignored; in case it isn't, a way to skip absent filters is sketched below.
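
If it turns out that SQLAlchemy doesn't accept None, a small sketch of one way to skip absent filters (assuming each *_filter is None when its input was not provided):

filters = [f for f in (where_filter, bbox_filter, cql_filter)
           if f is not None]
q = (session.query(TableModel)
     .filter(*filters)  # multiple criteria are ANDed; an empty list is a no-op
     .order_by(*order_by_clauses)
     .offset(offset)
     .limit(limit))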

Connection pooling

Do we want to have persistent connections? These can save 0.5 seconds per API request. A new PostgreSQLProvider is instantiated for each API request, so the connections would need to be held in some singleton belonging to the postgresql.py module (e.g. an _active_connections dictionary). We could use the connection parameters concatenated into a string as the key, because the key has to be hashable and unique.

Also, connections can time out, so we would have to check that a stored connection was still active before trying to use it. The process to get a connection would be (sketched after the link below):

  • generate connection key
  • check for connection in store
  • if there is one, use it
  • if there isn't one, or if an attempt to use it fails, create a new one, store it, then use it
  • if the attempt to create one fails, raise an error

We could pass "application_name" to the server on connection so that we can recognise the connection.

https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS
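
A hedged sketch of that process; the names (_active_connections, _get_connection) are hypothetical and psycopg2 is assumed as the driver:

import psycopg2

_active_connections = {}  # module-level singleton: connection key -> connection


def _get_connection(conn_params):
    # The key must be hashable and unique per set of connection parameters
    key = ':'.join(str(conn_params[k]) for k in sorted(conn_params))
    conn = _active_connections.get(key)
    if conn is not None:
        try:
            with conn.cursor() as cur:  # cheap liveness check
                cur.execute('SELECT 1')
            return conn
        except psycopg2.Error:
            pass  # connection timed out or died; fall through and reconnect
    # application_name lets us recognise the connection on the server;
    # if connect() fails, psycopg2 raises an error as required
    conn = psycopg2.connect(application_name='pygeoapi', **conn_params)
    _active_connections[key] = conn
    return conn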

Notes on the specific SQLAlchemy / pygeofilter approach

pycsw creates SQLAlchemy table models from Base programmatically, using the type function. Our approach is instead to "reflect" the table model from the database, as sketched below.
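
A minimal sketch of the reflection approach using SQLAlchemy's automap; the connection URL, schema and table name are illustrative, based on the test database:

import geoalchemy2  # noqa: F401 - registers geometry column types for reflection
from sqlalchemy import MetaData, create_engine
from sqlalchemy.ext.automap import automap_base

engine = create_engine('postgresql+psycopg2://postgres:postgres@localhost/test')
metadata = MetaData(schema='osm')
metadata.reflect(bind=engine, only=['hotosm_bdi_waterways'])  # queries the DB
Base = automap_base(metadata=metadata)
Base.prepare()  # builds mapped classes from the already-reflected metadata
TableModel = Base.classes.hotosm_bdi_waterways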

Contributions and Licensing

(as per https://github.com/geopython/pygeoapi/blob/master/CONTRIBUTING.md#contributions-and-licensing)

  • [x] I'd like to contribute [feature] to pygeoapi. I confirm that my contributions to pygeoapi will be compatible with the pygeoapi license guidelines at the time of contribution.
  • [ ] I have already previously agreed to the pygeoapi Contributions and Licensing Guidelines

volcan01010 avatar Aug 24 '22 10:08 volcan01010

Great work! Some additional comments:

  • we need to add pygeofilter to requirements.txt and debian/control
  • Using pygeofilter gives us the opportunity to extend this PR to accept HTTP POST CQL as JSON. Currently we have different code paths depending on HTTP GET or POST. I wonder whether we can consolidate this behaviour into a single .../items handler that detects GET or POST and processes accordingly. This would also allow us to lighten our use of https://github.com/geopython/pygeoapi/tree/master/pygeoapi/models, thereby pushing all CQL handling to pygeofilter proper. Now that I think about it more, we can cover this in another dedicated PR
  • we can remove pygeofilter_sqlalchemy_demo.py (having the proper tests should suffice)

cc @constantinius

tomkralidis avatar Aug 24 '22 13:08 tomkralidis

Hi @tomkralidis,

Thanks for the rapid feedback! I'm still composing the main merge request text after we had a forking mishap and lost the previous draft that I had written. I'll let you know when it is final.

I've pushed some extra tweaks, too.

volcan01010 avatar Aug 24 '22 15:08 volcan01010

OK, the merge request is ready. I think @KoalaGeo will arrange a call where we can talk it through in more detail.

volcan01010 avatar Aug 24 '22 16:08 volcan01010

Hi,

@ximenesuk and I are booked onto other projects for the next few weeks. We will come back to this during the week of 19 September.

In response to some of the comments so far:

  • [x] We will remove setup.sh
  • [x] We will generalise exception handling
  • [x] We will change the parameter name to filter and add a filter-lang option
  • [x] We will separate the API tests into success and different failure conditions

In this simple implementation, if you specify cql then the other query parameters will be ignored. The best way to handle both query options is to replace the entire query function with a SQLAlchemy-based version. We wanted to show this proof-of-concept before doing so, but I think that is the approach we should take.

Hopefully we can have a call before then, too.

volcan01010 avatar Sep 02 '22 09:09 volcan01010

@tomkralidis @francbartoli what are your thoughts on CQL having a new config param to enable/disable it (cql: true / false)?

There might be collections where a user doesn't want to offer that level of access/control

KoalaGeo avatar Sep 16 '22 10:09 KoalaGeo

I would delegate that to an external service. CQL in pygeoapi proper should be supported by default (the exception being where a given provider does not yet handle CQL, in which case an exception would be thrown).

tomkralidis avatar Sep 16 '22 10:09 tomkralidis

We are working on this this week. The changes that we plan to make are:

  • [x] Update documentation to describe CQL with PostgreSQL provider
  • [x] Refactor PostgreSQL backend to be pure SQLAlchemy instead of current hybrid form
  • [x] Update API to allow POSTing of CQL as JSON to the PostgreSQL provider

volcan01010 avatar Sep 20 '22 09:09 volcan01010

@tomkralidis @francbartoli We have merged some work into the branch that addresses the points raised by @volcan01010 in https://github.com/geopython/pygeoapi/pull/964#issuecomment-1235265715 The changes also address two of the points in https://github.com/geopython/pygeoapi/pull/964#issuecomment-1252078191, namely:

  • Update documentation to describe CQL with PostgreSQL provider
  • Update API to allow POSTing of CQL as JSON to the PostgreSQL provider

This last part has raised a few issues relating to testing and code. The changes include tests for POSTing JSON to the API for a PostgreSQL provider. However, we are testing this in isolation, using just that one provider with specific data. How can we run these tests using another provider that accepts POST requests (is Elasticsearch the only one)? Can your pipelines be run to check for issues arising from this? Do your pipelines test the API for multiple providers? We may need to isolate the Postgres tests using pytest markers or some other mechanism if there are issues here (see the sketch below).
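
If isolation turns out to be necessary, a sketch of the pytest-marker approach (the marker name is hypothetical):

import pytest


@pytest.mark.postgresql  # register in setup.cfg/pytest.ini under "markers"
def test_post_cql_json_items():
    ...

The PostgreSQL tests could then be deselected with pytest -m "not postgresql".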

In adding support for the POST request to the API we have also needed to refactor parts of your code, in particular the work in this commit: https://github.com/geopython/pygeoapi/pull/964/commits/67b8ed6142846912e34240255bb404ceb8f8482c We have slightly reworked the filter-lang get so that the variable name is retained, to aid readability further down, after the logging. We then use the provider to determine how the CQL is parsed.

If the Postgres provider is set up according to https://github.com/geopython/pygeoapi/blob/master/pygeoapi/provider/postgresql.py#L33, the API POST can be tested from the command line using:

curl --location \
       --request POST 'http://localhost:5000/collections/hot_osm_waterways/items?f=json&limit=50&filter-lang=cql-json' \
       --header 'Content-Type: application/query-cql-json' \
       --data-raw '{"and": [{"between": {"value": {"property": "osm_id"}, "lower": 80800000, "upper": 80900000}}, {"isNull": {"property": "name"}}]}'

Any comments most welcome.

ximenesuk avatar Sep 21 '22 09:09 ximenesuk

I have some questions about the provider.get() method. As part of the response it includes next and prev fields with the IDs of the adjacent records. For the first record in the table, the prev is set to the same record ID (i.e. the first record), and for the last record in the table next is set to the last record. See the code here:

https://github.com/geopython/pygeoapi/blob/e2e12e9bd4395b453cc382084cd7c33172662b3a/pygeoapi/provider/postgresql.py#L415

It seems to me that this will cause problems with looping as consumers of the data will never know when they have exhausted the table. The OGC API specs suggest that the next field should only be present if another record exists.

  • Do you think that the response should be changed so that prev/next only exist if they are valid (see the sketch after these questions)?
  • If so, should I change that as part of this refactor or submit that as a separate issue?
  • If it is changed, will it affect other providers / the HTML output?
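
For the first question, a hypothetical sketch of the suggested behaviour:

def add_navigation(feature, prev_id, next_id):
    # Attach prev/next only when a neighbouring record exists
    if prev_id is not None:  # would be None for the first record in the table
        feature['prev'] = prev_id
    if next_id is not None:  # would be None for the last record
        feature['next'] = next_id
    return feature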

volcan01010 avatar Sep 21 '22 13:09 volcan01010

Also, related to the previous comment/questions, should the next and prev values be links? The server URL would need to be added by the API, as the provider doesn't know about it. Also, it doesn't seem like they are currently used in the response, e.g.

https://ogcapi.bgs.ac.uk/collections/historicalearthquakes/items/8?f=json

volcan01010 avatar Sep 23 '22 08:09 volcan01010

Hi @tomkralidis, @francbartoli,

We have finished our refactoring and think this is ready to merge now. As described above (https://github.com/geopython/pygeoapi/pull/964#issuecomment-1252078191) we have updated the documentation, added the ability to POST CQL as JSON to the PostgreSQL provider and refactored the entire backend to use SQLAlchemy.

The slight caveats are:

  • We haven't tested the API changes against Elasticsearch (https://github.com/geopython/pygeoapi/pull/964#issuecomment-1253471554)
  • The next and previous behaviour is unchanged (https://github.com/geopython/pygeoapi/issues/990)

In terms of the backend refactor, the diff isn't much use as so much has changed, so it is easier to read the new file. The SQLAlchemy logic should make sense and the individual commits should show where different logic was added. The trickiest bit was generating the table model. That shouldn't need any future changes so further enhancements (e.g. adding editing capability) shouldn't need deep SQLAlchemy knowledge.

Here are some metrics on the changes.

| Metric | Previous | This branch |
| --- | --- | --- |
| postgresql.py lines of code | 461 | 396 |
| Tests count (postgresql.py + api.py) | 45 | 76 |
| Coverage (postgresql.py) | 92% | 99% |
| Coverage (api.py) | 61% | 67% |

volcan01010 avatar Sep 29 '22 17:09 volcan01010

For information: I did some basic load tests on the new version to check that SQLAlchemy connection pooling was working. They showed the new version to be about 33% slower.

Number of iterations of multi-endpoint GET script in 30 seconds:

| master | this branch |
| --- | --- |
| 60 | 40 |

This was a surprise to me, as I expected connection pooling to speed things up. Perhaps getting a connection is not the bottleneck. I was running both pygeoapi and the k6 load tester on the same machine, which may have skewed the results, as k6, pygeoapi and PostgreSQL were all sharing the same CPU / RAM. It would be interesting to see results on a fully-deployed system.

I also tried using curl for individual requests. With these, the response times were erratic and between 0.14 and 0.2 seconds for both the old and new implementations, so I couldn't really tell the difference.

curl --output /dev/null http://localhost:5000/collections/hot_osm_waterways/items/13990765?f=json -w 'Total: %{time_total}s\n'

k6 load tests

I used the k6 tool to run this script that queried multiple endpoints (test-postgresql config):

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 20,
  duration: '30s',
};

export default function () {
  http.get('http://localhost:5000/collections/hot_osm_waterways/items/13990765?f=json');
  http.get('http://localhost:5000/collections/hot_osm_waterways/items/30030186?f=json');
  http.get('http://localhost:5000/collections/hot_osm_waterways/items/80827787?f=json');
  http.get('http://localhost:5000/collections/hot_osm_waterways/items/124128967?f=json');
  http.get('http://localhost:5000/collections/hot_osm_waterways/items/150771186?f=json');
  http.get('http://localhost:5000/collections/hot_osm_waterways/items?f=json&resulttype=hits');
  http.get('http://localhost:5000/collections/hot_osm_waterways/items?f=json&limit=20&offset=100');
}

Run with k6 Docker container:

docker run --rm -i --network host loadimpact/k6 run - < k6_test.js

volcan01010 avatar Sep 29 '22 17:09 volcan01010

First of all @volcan01010, this is an impressive improvement! +1 to merge it but let's wait for more consensus since the rework with the provider is quite huge. Also, I'd like to test the performance given your previous comments

francbartoli avatar Sep 29 '22 18:09 francbartoli

I wanted to see why the SQLAlchemy version was slower, so I wrote the following script to time multiple requests, only looking at the provider.

import os
import time

from pygeoapi.provider.postgresql import PostgreSQLProvider

NUM_REQUESTS = 100
PASSWORD = os.environ.get('POSTGRESQL_PASSWORD', 'postgres')


def instantiate_and_query():
    config = {
        'name': 'PostgreSQL',
        'type': 'feature',
        'data': {'host': '127.0.0.1',
                 'dbname': 'test',
                 'user': 'postgres',
                 'password': PASSWORD,
                 'search_path': ['osm', 'public']
                 },
        'id_field': 'osm_id',
        'table': 'hotosm_bdi_waterways',
        'geom_field': 'foo_geom'
    }

    provider = PostgreSQLProvider(config)
    _ = provider.get(29701937)


if __name__ == "__main__":
    start = time.time()
    for _ in range(NUM_REQUESTS):
        instantiate_and_query()
    elapsed = time.time() - start
    print(f"{NUM_REQUESTS} requests processed in {elapsed:.1f} seconds.")

I then ran it against the master and pygeofilter-sqlalchemy branches. They came out about the same:

# master
100 requests processed in 7.1 seconds.
100 requests processed in 6.6 seconds.
100 requests processed in 6.4 seconds.
100 requests processed in 6.3 seconds.
100 requests processed in 6.6 seconds.

# pygeofilter-sqlalchemy
100 requests processed in 6.7 seconds.
100 requests processed in 6.4 seconds.
100 requests processed in 6.0 seconds.
100 requests processed in 5.8 seconds.
100 requests processed in 6.0 seconds.

After that, I ran the script with pyinstrument to see where it was spending its time (pyinstrument bottleneck.py):

100 requests processed in 6.7 seconds.

  _     ._   __/__   _ _  _  _ _/_   Recorded: 22:13:54  Samples:  4506
 /_//_/// /_\ / //_// / //_'/ //     Duration: 7.807     CPU time: 4.394
/   _/                      v4.3.0

Program: bottleneck.py

7.807 <module>  <string>:1
   [4 frames hidden]  <string>, runpy
      7.807 _run_code  runpy.py:64
      └─ 7.807 <module>  bottleneck.py:1
         ├─ 6.707 instantiate_and_query  bottleneck.py:10
         │  ├─ 3.637 get  pygeoapi/provider/postgresql.py:185
         │  │  ├─ 2.253 first  sqlalchemy/orm/query.py:2795
         │  │  │     [476 frames hidden]  sqlalchemy, encodings, weakref, <buil...
         │  │  │        1.837 do_execute  sqlalchemy/engine/default.py:731
         │  │  │        └─ 1.836 [self]  
         │  │  ├─ 1.016 get  <string>:1
         │  │  │     [518 frames hidden]  <string>, sqlalchemy, <built-in>, wea...
         │  │  ├─ 0.130 query  sqlalchemy/orm/session.py:2153
         │  │  │     [124 frames hidden]  sqlalchemy, weakref, <built-in>, logg...
         │  │  ├─ 0.108 __exit__  sqlalchemy/orm/session.py:1166
         │  │  │     [37 frames hidden]  sqlalchemy, <built-in>, threading
         │  │  └─ 0.086 _sqlalchemy_to_feature  pygeoapi/provider/postgresql.py:298
         │  └─ 3.067 __init__  pygeoapi/provider/postgresql.py:74
         │     └─ 2.995 _reflect_table_model  pygeoapi/provider/postgresql.py:258
         │        ├─ 2.685 reflect  sqlalchemy/sql/schema.py:4729
         │        │     [1045 frames hidden]  sqlalchemy, <string>, encodings, <bui...
         │        └─ 0.287 prepare  <string>:1
         │              [317 frames hidden]  <string>, sqlalchemy, abc, <built-in>...
         ├─ 0.706 <module>  pygeoapi/provider/postgresql.py:49
         │  ├─ 0.383 <module>  geoalchemy2/__init__.py:1
         │  │     [840 frames hidden]  geoalchemy2, pkg_resources, <built-in...
         │  ├─ 0.202 <module>  sqlalchemy/__init__.py:8
         │  │     [524 frames hidden]  sqlalchemy, textwrap, <built-in>, ins...
         │  └─ 0.079 <module>  sqlalchemy/ext/automap.py:8
         │        [185 frames hidden]  sqlalchemy, textwrap, <built-in>, re,...
         └─ 0.392 <module>  pygeoapi/__init__.py:30
            └─ 0.381 <module>  pygeoapi/config.py:30
               ├─ 0.286 <module>  pygeoapi/util.py:30
               │  └─ 0.151 <module>  shapely/geometry/__init__.py:1
               │        [370 frames hidden]  shapely, numpy, pathlib, <built-in>, ...
               └─ 0.087 <module>  jsonschema/__init__.py:1
                     [291 frames hidden]  jsonschema, urllib, http, email, re, ...

These results show that initialising the class takes as long as running the "get" query, because it has to "reflect" the table. This means each query makes two requests to the database, even though the table model doesn't change between requests. I'm going to try caching the table model (sketched below).
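
A hedged sketch of the caching I have in mind, at module level and keyed on the connection and table parameters (names other than _reflect_table_model, which appears in the profile above, are hypothetical):

_table_model_cache = {}  # module-level, so it outlives provider instances


def _get_table_model(engine, schema, table_name):
    key = (str(engine.url), schema, table_name)
    try:
        return _table_model_cache[key]
    except KeyError:
        # Slow path: reflection makes an extra round trip to the database
        model = _reflect_table_model(engine, schema, table_name)
        _table_model_cache[key] = model
        return model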

volcan01010 avatar Sep 29 '22 21:09 volcan01010

Thanks for this deep analysis, caching definitely makes a lot of sense

francbartoli avatar Sep 29 '22 21:09 francbartoli

As I hoped, this has made each query take about half as long, as only the first request now makes two trips to the database.

100 requests processed in 2.8 seconds.

  _     ._   __/__   _ _  _  _ _/_   Recorded: 22:57:42  Samples:  1724
 /_//_/// /_\ / //_// / //_'/ //     Duration: 3.920     CPU time: 1.759
/   _/                      v4.3.0

Program: /tmp/bottleneck.py

3.920 <module>  <string>:1
   [4 frames hidden]  <string>, runpy
      3.920 _run_code  runpy.py:64
      └─ 3.920 <module>  bottleneck.py:1
         ├─ 2.831 instantiate_and_query  bottleneck.py:10
         │  ├─ 2.680 get  pygeoapi/provider/postgresql.py:184
         │  │  ├─ 1.899 first  sqlalchemy/orm/query.py:2795
         │  │  │     [142 frames hidden]  sqlalchemy, <built-in>, <string>, re
         │  │  │        1.793 do_execute  sqlalchemy/engine/default.py:731
         │  │  ├─ 0.612 get  <string>:1
         │  │  │     [173 frames hidden]  <string>, sqlalchemy, <built-in>, wea...
         │  │  └─ 0.068 __exit__  sqlalchemy/orm/session.py:1166
         │  │        [35 frames hidden]  sqlalchemy, <built-in>
         │  └─ 0.143 __init__  pygeoapi/provider/postgresql.py:74
         │     └─ 0.108 _get_engine_and_table_model  pygeoapi/provider/postgresql.py:230
         │        └─ 0.081 _reflect_table_model  pygeoapi/provider/postgresql.py:261
         │           └─ 0.076 reflect  sqlalchemy/sql/schema.py:4729
         │                 [255 frames hidden]  sqlalchemy, <string>, <built-in>, re,...
         ├─ 0.710 <module>  pygeoapi/provider/postgresql.py:49
         │  ├─ 0.354 <module>  geoalchemy2/__init__.py:1
         │  │     [786 frames hidden]  geoalchemy2, pkg_resources, <built-in...
         │  ├─ 0.234 <module>  sqlalchemy/__init__.py:8
         │  │     [537 frames hidden]  sqlalchemy, textwrap, <built-in>, fun...
         │  └─ 0.081 <module>  sqlalchemy/ext/automap.py:8
         │        [182 frames hidden]  sqlalchemy, textwrap, re, <built-in>,...
         └─ 0.370 <module>  pygeoapi/__init__.py:30
            └─ 0.358 <module>  pygeoapi/config.py:30
               ├─ 0.272 <module>  pygeoapi/util.py:30
               │  └─ 0.144 <module>  shapely/geometry/__init__.py:1
               │        [290 frames hidden]  shapely, numpy, pathlib, ntpath, <bui...
               └─ 0.080 <module>  jsonschema/__init__.py:1
                     [244 frames hidden]  jsonschema, urllib, http, email, sock...

It seems to be consistent, too.

100 requests processed in 2.7 seconds.
100 requests processed in 2.7 seconds.
100 requests processed in 2.7 seconds.
100 requests processed in 2.7 seconds.
100 requests processed in 2.7 seconds.

And the load testing is better, too. With SQLAlchemy + caching, throughput is 40% higher than the current release.

| master | without table model cache | with table model cache |
| --- | --- | --- |
| 60 | 40 | 84 |

volcan01010 avatar Sep 29 '22 22:09 volcan01010

I've made a couple of minor tweaks this morning, but I think that we are finished from our end. I have updated the example queries to use filter as the keyword instead of cql.

Example queries:

  • ILIKE: http://localhost:5000/collections/hot_osm_waterways/items?f=html&filter=name%20ILIKE%20%27muha%27
  • BBOX: http://localhost:5000/collections/hot_osm_waterways/items?f=html&filter=BBOX(foo_geom,%2029,%20-2.8,%2029.2,%20-2.9)&limit=100
  • CROSSES: http://localhost:5000/collections/hot_osm_waterways/items?f=html&filter=CROSSES(foo_geom,%20LINESTRING(28%20-2,%2030%20-4))&limit=100
  • Bad query: http://localhost:5000/collections/hot_osm_waterways/items?filter=osm_id%20EATS%201

volcan01010 avatar Sep 30 '22 09:09 volcan01010

Aaargh! I just tried out this example curl command from @ximenesuk:

curl --location \
       --request POST 'http://localhost:5000/collections/hot_osm_waterways/items?f=json&limit=50&filter-lang=cql-json' \
       --header 'Content-Type: application/query-cql-json' \
       --data-raw '{"and": [{"between": {"value": {"property": "osm_id"}, "lower": 80800000, "upper": 80900000}}, {"isNull": {"property": "name"}}]}'

It now fails with:

{
    "code": "InvalidParameterValue",
    "description": "Collection is not editable"
}

It seems that adding support for API transactions (in commit faa00e7c) since we branched off to implement the CQL-JSON POST has stopped us from being able to POST CQL to the PostgreSQL provider.

volcan01010 avatar Sep 30 '22 09:09 volcan01010

Fixing in master now..

tomkralidis avatar Sep 30 '22 10:09 tomkralidis

@volcan01010 before I push a few updates, does the following work for you:

diff --git a/pygeoapi/flask_app.py b/pygeoapi/flask_app.py
index 0dbcd66..1a9457f 100644
--- a/pygeoapi/flask_app.py
+++ b/pygeoapi/flask_app.py
@@ -195,12 +195,13 @@ def collection_items(collection_id, item_id=None):
                 api_.get_collection_items(request, collection_id))
         elif request.method == 'POST':  # filter or manage items
             if request.content_type is not None:
-                return get_response(
-                    api_.manage_collection_item(request, 'create',
-                                                collection_id))
-            else:
-                return get_response(
-                    api_.post_collection_items(request, collection_id))
+                if request.content_type == 'application/geo+json':
+                    return get_response(
+                        api_.manage_collection_item(request, 'create',
+                                                    collection_id))
+                else:
+                    return get_response(
+                        api_.post_collection_items(request, collection_id))
 
     elif request.method == 'DELETE':
         return get_response(

tomkralidis avatar Sep 30 '22 10:09 tomkralidis

Yes, that has fixed it. I have applied the patch and pushed it up.

volcan01010 avatar Sep 30 '22 11:09 volcan01010

@volcan01010 I need to apply this change directly along with some additional small changes. If this works for you, let me know and I will push here, at which point you would be able to rebase.

tomkralidis avatar Sep 30 '22 12:09 tomkralidis

Do you mean that you update master and I rebase here? Yes, that works for me.

volcan01010 avatar Sep 30 '22 13:09 volcan01010

+1. Pushing momentarily.

tomkralidis avatar Sep 30 '22 13:09 tomkralidis

@volcan01010 you should be good to rebase now.

tomkralidis avatar Sep 30 '22 13:09 tomkralidis

I don't think that it needs to. GitHub says there are no conflicts and I applied your patch (https://github.com/geopython/pygeoapi/pull/964/commits/6e62d8f67096e12ce14e0e05ec2e8735315e6561) so the POST request works.

volcan01010 avatar Sep 30 '22 14:09 volcan01010

@tomkralidis - I had a go at rebasing onto master, but it was a complicated merge with many conflicts and I didn't manage to do it without breaking things. Instead I have merged master into this branch. We will have the ugly merge commit in the history, but the advantage is that the next merge is clean. All the tests for postgresql.py and api.py are still green and the curl command works.

volcan01010 avatar Oct 03 '22 10:10 volcan01010

Need to wait for https://github.com/geopython/pygeofilter/pull/56 or https://github.com/geopython/pygeofilter/pull/59 to be merged and closed, which will fix an outstanding bug:

File "/pygeoapi/pygeoapi/provider/postgresql.py", line 60, in <module>
  from pygeofilter.backends.sqlalchemy.evaluate import to_filter
File "/usr/local/lib/python3.8/dist-packages/pygeofilter/backends/sqlalchemy/__init__.py", line 1, in <module>
  from .evaluate import to_filter
File "/usr/local/lib/python3.8/dist-packages/pygeofilter/backends/sqlalchemy/evaluate.py", line 4, in <module>
  from . import filters
File "/usr/local/lib/python3.8/dist-packages/pygeofilter/backends/sqlalchemy/filters.py", line 8, in <module>
  from pygeoif.geometry import as_shape
ImportError: cannot import name 'as_shape' from 'pygeoif.geometry' (/usr/local/lib/python3.8/dist-packages/pygeoif/geometry.py)
ERROR: openapi.yml could not be generated ERROR

KoalaGeo avatar Oct 10 '22 22:10 KoalaGeo

The pygeoif / pygeofilter dependency bug is now fixed with https://github.com/geopython/pygeofilter/pull/61

However new bug discovered to be fixed before merge - https://github.com/BritishGeologicalSurvey/pygeoapi/issues/13

KoalaGeo avatar Oct 18 '22 15:10 KoalaGeo

@tomkralidis & @francbartoli we've got this working internally, but have two further PRs to make into our branch pygeofilter-sqlalchemy, then this will be ready to merge - thanks for your patience!

KoalaGeo avatar Oct 24 '22 21:10 KoalaGeo