pygeoapi icon indicating copy to clipboard operation
pygeoapi copied to clipboard

Add new query types to EDR xarray provider leaving the potential open for new and custom providers.

Open dblodgett-usgs opened this issue 4 years ago • 11 comments

This is something between a planning task and a feature request to be worked by Cliff Hill (USGS) and Shane Mill (NOAA).

As we add query types to the EDR Xarray Provider, we need to keep an eye on how to add new providers and custom versions of the providers. Area and trajectory queries seem like a good place to start.

Closure Criteria: Close this once we have a basic implementation of the area and trajectory queries available for the xarray EDR provider and an initial plan for how to handle custom EDR providers on a per-query-pattern basis.

dblodgett-usgs avatar Nov 04 '21 15:11 dblodgett-usgs

cc @ShaneMill1 @hillc-usgs

As discussed, our current design pattern is for EDR providers to have a baseline query function which accepts **kwargs and detects query types from there. An EDR provider advertises supported query types by implementing a get_query_types function which returns a list of supported query types.

Options:

  1. have the base EDR provider stub in explicit query type functions (def area(self, ..., def trajectory (self, ..., etc.). Then the EDR codepath in pygeoapi.api would call said query type which would return a NotImplementedError() by default unless overridden by the EDR plugin
  2. keep the base EDR provider and pygeoapi.api as is, and have the EDR provider's query() function simply call an underlying query pattern functions by name. This would keep the core codebase and base EDR provider as is
  3. other options?

Notes:

  • the provider needs to advertise which query types are supported (for query validation and OpenAPI documentation generation)
  • the EDR provider's query() function is also a baseline data access function across all pygeoapi resource query patterns (for OAFeat, OACov, OARec, etc.), there might be valuable in having this consistency across the codebase/providers

Here's an small proof of concept/example of option 2 in as an EDR provider:

class EDR:
    def query(self, **kwargs):
        try:
            return getattr(self, kwargs.get('query_type'))(**kwargs)
        except AttributeError:
            raise NotImplementedError('query not implemented!')

    def cube(self, **kwargs):
        return 'hi'

# imaginary downstream caller
e = EDR()

print(e.query(query_type='cube'))
print(e.query(query_type='trajectory'))

Thoughts/comments/ideas?

tomkralidis avatar Nov 05 '21 22:11 tomkralidis

@tomkralidis I think this looks very good and am supportive option 2. I should hopefully be able to contribute soon to the area and trajectory query. I will need to investigate how the code I have fits in, or if I will need to refactor a little bit on my end. I will keep everyone updated on progress

ShaneMill1 avatar Nov 08 '21 16:11 ShaneMill1

Looks good to me, I concur about option 2. I am looking over everything to get more of a feel for the codebase in general.

hillc-usgs avatar Nov 10 '21 13:11 hillc-usgs

Might I suggest using a method decorator to mark the query type implementation methods with, so it automatically can add/register that query type method's name to the list that then can be shown with get_query_types()?

hillc-usgs avatar Nov 12 '21 16:11 hillc-usgs

What I mean is:

query() simply maps to a registered query_type, which is named as the first parameter, to keep the API consistent. The query_type methods then are registered with a decorator (probably defined inside the base_edr class), and get_query_types() simply lists the names of the registered query_type methods for that class. So people wouldn't need to ever adjust query() or get_query_types() directly, it does that all automatically for them. All they need to do then is define and decorate a query_type method to have it added to the registry and be available in the API.

hillc-usgs avatar Nov 12 '21 16:11 hillc-usgs

I believe this would make it very DRY code, and as minimalistic in the approach as possible.

hillc-usgs avatar Nov 12 '21 16:11 hillc-usgs

Good idea @hillc-usgs; here's an example below for review/comment:

class EDRBase:
    query_types = []

    @classmethod
    def register(cls):
        def inner(fn):
            cls.query_types.append(fn.__name__)
            return fn
        return inner

    def get_query_types(self):
        return self.query_types

    def query(self, **kwargs):
        try:
            return getattr(self, kwargs.get('query_type'))(**kwargs)
        except AttributeError:
            raise NotImplementedError('query not implemented!')

class EDRClass(EDRBase):

    def __init__(self):
        super().__init__()

    @EDRBase.register()
    def cube(self, **kwargs):
        return "cube!"

e = EDRClass()
print(e.get_query_types())
print(e.query(query_type='cube'))
print(e.query(query_type='cube2'))

tomkralidis avatar Nov 13 '21 18:11 tomkralidis

@tomkralidis That looks like a really good clean approach

ShaneMill1 avatar Nov 16 '21 23:11 ShaneMill1

I took some time to add some code for an initial prototype of this in pygeoapi.

https://github.com/ShaneMill1/pygeoapi/blob/nws-edr-api-integration/pygeoapi/provider/xarray_edr.py

A few notes:

  • The approach already in the xarray_edr.py is to direct behavior based on the WKT type (ie. Point, Polygon, LineString, etc.). I believe this is the right approach because /area and /cube would both use Polygon for the WKT type, so we can use the same logic for /area and /cube.
  • The logic in interacting with the xarray dataset object is different for Point compared to Polygon. For Point, it is sufficient to use the data.sel(coords, method=nearest) but with Polygon I introduced the use of rioxarray to extract a polygon in a more sophisticated way. This introduces a new library (rioxarray) and I use numpy to stack the coordinates to create the geometry.
  • If you place x_field and y_field in the configuration, you can call this from the provider to generically translate between what rioxarray looks for (x and y as coordinate names) and the native coordinate names. This is a great way to get past the non generic way that NetCDF files often treat coordinate names.

Demonstration:

Within the Swagger Documentation, I place a WKT definition for a polygon: POLYGON((-63.00207488085257 43.572088451973464,-71.43957488085256 39.90094508276867,-68.97863738085256 27.048919615272702,-47.88488738085257 27.361589726808624,-42.61144988085257 39.087068248482026,-56.14660613085257 40.838390656449825,-63.00207488085257 43.572088451973464))

wkt_example

coads-sst-area

This gives the following response in CoverageJSON:

coads-sst-area-response

Using the CoverageJSON playground (https://covjson.org/playground/), we have the following visualization:

coads-sst-area-visual

Final Thoughts

  • I think that the metadata being sent to the coverageJSON output module is messing up how the data appears in the visualization. I think the metadata assumes a bounding box which is not always the case for an area query, so I need to fix this.
  • We need to take this code once we ensure that it works as intended, and organize it so that we can treat all query types in a generic way.

ShaneMill1 avatar Nov 17 '21 01:11 ShaneMill1

FYI auto-registration in base EDR plugin added in #814.

At this point EDR plugins need only to implement EDRBase (init) and decorated query types.

Please test/validate and we can update accordingly.

tomkralidis avatar Nov 17 '21 18:11 tomkralidis

This Issue has been inactive for 90 days. As per RFC4, in order to manage maintenance burden, it will be automatically closed in 7 days.

github-actions[bot] avatar Mar 24 '24 03:03 github-actions[bot]

As per RFC4, this Issue has been closed due to there being no activity for more than 90 days.

github-actions[bot] avatar Apr 07 '24 03:04 github-actions[bot]