stac-fastapi icon indicating copy to clipboard operation
stac-fastapi copied to clipboard

Patch endpoints

Open rhysrevans3 opened this issue 1 year ago • 13 comments

Description: Adds PATCH endpoints to transaction extension. Adds support for RFC 6902 and RFC 7396. Pivots on header Content-Type value.

Related pull request: https://github.com/stac-api-extensions/transaction/pull/14

PR Checklist:

  • [x] pre-commit hooks pass locally
  • [x] Tests pass (run make test)
  • [x] Documentation has been updated to reflect changes, if applicable, and docs build successfully (run make docs)
  • [x] Changes are added to the CHANGELOG.

rhysrevans3 avatar Aug 21 '24 07:08 rhysrevans3

I've had to switch from stac-pydantic Item/Collection to Dict to allow for partial updates. Not sure if this is the best method or if you can switch off pydantic validation. Another option would be having separate models for partial items/collections in stac-pydantic where all attributes.

rhysrevans3 avatar Aug 23 '24 13:08 rhysrevans3

I've had to switch from stac-pydantic Item/Collection to Dict to allow for partial updates. Not sure if this is the best method or if you can switch off pydantic validation. Another option would be having separate models for partial items/collections in stac-pydantic where all attributes.

Would a typedDict make more sense?

vincentsarago avatar Aug 26 '24 13:08 vincentsarago

@rhysrevans3 should we wait for https://github.com/stac-api-extensions/transaction/pull/14 to be merged/published?

vincentsarago avatar Sep 03 '24 11:09 vincentsarago

@rhysrevans3 I'm trying to get a general sense of the patch operation. It's not clear to me what are the requirements that an API should implement. By default it seems that a client will have to support a lot of things

For example, I'm looking at https://docs.ogc.org/DRAFTS/20-002.html#update_clause and I don't see reference to the move/copy operations

vincentsarago avatar Sep 03 '24 11:09 vincentsarago

My understanding is RFC 5789, which the OGC doc references, say a PATCH endpoint should except a set of actions to modify the requested resource. But it doesn't specify what these actions should be.

RFC 6902 extends this and specify a set of allowed operations (add, remove, replace, move, copy, test) which is what I have suggested in https://github.com/stac-api-extensions/transaction/pull/14.

Currently the README on the transaction extension suggests the PATCH endpoint follows RFC 7396 which accepts a partial json document which it merges with the existing document this might not be the best choice for STAC because of it's handling of null values. As mentioned in this issue https://github.com/stac-api-extensions/transaction/issues/13

rhysrevans3 avatar Sep 03 '24 14:09 rhysrevans3

Just to add to @rhysrevans3 's comments, the main issue with supporting only RFC7396 (or RFC7386) is the "magic" behaviour of setting attribute to null in combination with the elastic search backend:

  • RFC7396 treats explicit null as a command to delete that attribute (this is your "delete" patch command)
  • If you actually want to set a value to null, you can't
  • ElasticSearch on the other hand, has a different understanding of explicit null as "leave this in the document but don't index this value"

Without a fairly complicated scripting command on elastic search, it would not be possible for elastic search to return a _source document that meets the expected result of an explicit null being sent in a PATCH request.

RFC6902 gets around this by allowing you to explicitly say "that attribute there, delete it" and not "that attribute there, I'm going to tell you to set it to null but which I mean please write a script to tell elastic search to remove it from the source document and reindex".

djspstfc avatar Sep 03 '24 15:09 djspstfc

@rhysrevans3 The overall looks great, just added one comment about using stac-pydantic base model.

As I've asked couple weeks ago, should we wait for https://github.com/stac-api-extensions/transaction/pull/14 to be merged? do you know anyone who could help move this PR forward?

vincentsarago avatar Sep 17 '24 10:09 vincentsarago

@rhysrevans3 The overall looks great, just added one comment about using stac-pydantic base model.

As I've asked couple weeks ago, should we wait for stac-api-extensions/transaction#14 to be merged? do you know anyone who could help move this PR forward?

Ideally yes I think it would be better to have that PR merged first.

But my understanding is that RFC 6902 and RFC 7396 are implementations of RFC 5789, which is the current STAC/OGC specification. So this merging this pull still keeps within the current STAC specification.

But given @m-mohr comments on that pull request maybe it's best to wait for OGC's opinion.

rhysrevans3 avatar Sep 18 '24 14:09 rhysrevans3

Did someone open an issue with OGC API - Features? (If yes, where?) Otherwise you may wait forever ;)

m-mohr avatar Oct 07 '24 03:10 m-mohr

Sorry I've been at a conference/on leave I've created a new issue here

rhysrevans3 avatar Oct 14 '24 13:10 rhysrevans3

@rhysrevans3 Thanks.

Given the response in https://github.com/opengeospatial/ogcapi-features/issues/957, we may also want to open an issue in the Transaction extension itself so that we can reflect the updated to OGC API - Features in the extension, too. Would you mind opening that issue, too? (I also just returned from four weeks of travelling today.)

m-mohr avatar Oct 14 '24 14:10 m-mohr

@m-mohr There is an issue https://github.com/stac-api-extensions/transaction/issues/13 and a pull request https://github.com/stac-api-extensions/transaction/pull/14 on the transaction extension repo. :)

rhysrevans3 avatar Oct 14 '24 14:10 rhysrevans3

Oh cool, too many repos to monitor. Left my comment(s) there.

m-mohr avatar Oct 14 '24 14:10 m-mohr

@vincentsarago The transaction extension pull request is just waiting for one more review. But I wondered if we could get this one merged in first? I'm confident the changes will be accept and this doesn't break the existing transaction extension.

rhysrevans3 avatar Apr 02 '25 12:04 rhysrevans3

hi @rhysrevans3

I'm a bit concerned that if merging this PR as is it would introduce a breaking change making merge_patch_item, json_patch_item, merge_patch_collection and json_patch_collection mandatory methods in any Transaction client

Also reading the spec

The default standard for PATCH is RFC 7396. Optionally, in addition to RFC 7396, RFC 6902 can be supported with the two methods being distinguished via the Accept/Content-Type header.

This sounds to me that only RFC7396 (merge) should be implemented and RFC6902 (patch) should be optional, but because of the 4 mandatory methods this is not reflected.

I understand correctly, the current implementation is not following the transaction spec because we don't have PATCH support https://github.com/stac-utils/stac-fastapi/blob/20ae9cfaf87ed892ef3235d979892e7e24c63fc0/stac_fastapi/extensions/stac_fastapi/extensions/core/transaction.py#L151-L154

Note: IMO we should move everything related to transaction to the extension module (https://github.com/stac-utils/stac-fastapi/blob/88639230e3bd1573e23e3a646204cd85932079cd/stac_fastapi/types/stac_fastapi/types/core.py#L40C1-L621C1). But it might be easier to do it after merging this PR

vincentsarago avatar Apr 02 '25 12:04 vincentsarago

@vincentsarago you're right I had missed that. I've updated json_patch_item and json_patch_collection to raise an NotImplementedError as a default. Is that enough or does it need to be in the registration of the endpoint?

Happy to use this pull request to move everything to the transaction to the extension module if required.

rhysrevans3 avatar Apr 02 '25 13:04 rhysrevans3

I could also add a NotImplemented default to the merge-patch to not break existing implementations? I'm unsure if the Transaction extension requires a PATCH endpoint and if it requires merge-patch or if you could specify a different Content-Type. For example if you wanted to implement json-patch and not merge-patch or even a non-JSON PATCH like XML PATCH RFC 5261

rhysrevans3 avatar Apr 02 '25 13:04 rhysrevans3

I've written a conversion from merge to patch for the elasticsearch-opensearch backend if this is useful to other backends? I could move it to the core utils?

https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/blob/2b1381821ec978ea254df0c8fae87420186a8789/stac_fastapi/core/stac_fastapi/core/utilities.py#L145-L171

rhysrevans3 avatar Apr 03 '25 11:04 rhysrevans3

@rhysrevans3 yeah that would be great. It will lower the barrier for people to support both patch and merge 🙏

Maybe it could be a method on the PartialItem/PartialCollection model

☝️ while writing this I realized that they are typedDict not pydantic model 🤔

vincentsarago avatar Apr 03 '25 11:04 vincentsarago

@rhysrevans3 yeah that would be great. It will lower the barrier for people to support both patch and merge 🙏

Maybe it could be a method on the PartialItem/PartialCollection model

☝️ while writing this I realized that they are typedDict not pydantic model 🤔

Yes I copied the Item/Collection models in the same file. I think the reason for using TypeDict over Pydantic models was performance (avoid validation) I remember there was some discussion when we moved to pydantic v2 #625 which suggested that's no longer an issue? Should I switch to using Pydantic models?

rhysrevans3 avatar Apr 03 '25 11:04 rhysrevans3

@rhysrevans3 I'm sorry it's taking longer 😅

I now think (with the addition of a merge to patch method) that by default we should make both patch/merge supported by default 😅.

On think I'm still not sure is how the merging/patching is made.

I'm looking at https://github.com/rhysrevans3/stac-fastapi-elasticsearch/blob/2b1381821ec978ea254df0c8fae87420186a8789/stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/database_logic.py#L940-L1031 to try to understand what's going on.

We should write some pseudo code in the doc (in the docstring first) to show how the patch method will look like

    def patch_item(
        self,
        collection_id: str,
        item_id: str,
        patch: Union[stac.PartialItem, List[stac.PatchOperation]],
        **kwargs,
    ) -> Optional[Union[stac.Item, Response]]:
        """Update an item from a collection.

        Called with `PATCH /collections/{collection_id}/items/{item_id}`

        Args:
            item_id: id of the item.
            collection_id: id of the collection.
            patch: either the partial item or list of patch operations.

        Returns:
            The patched item.
        """
        # convert patch item to list of merge operations
        if isinstance(patch, PartialItem):
            patch = merge_to_operations(patch)

        # Get Original Item
        item = backend.get_item(collection_id, item_id)

        # Update Item body from merge op
        item = update_item(item, patch)

        # Push Item to the backend
        _ = backend.create_item(collection_id, item_id, item)

        return item

vincentsarago avatar Apr 03 '25 11:04 vincentsarago

Should I switch to using Pydantic models?

Yes we should use pydantic model for input validation. It's only for the output that we have optional validation

vincentsarago avatar Apr 03 '25 11:04 vincentsarago

@rhysrevans3 I'm sorry it's taking longer 😅

I now think (with the addition of a merge to patch method) that by default we should make both patch/merge supported by default 😅.

On think I'm still not sure is how the merging/patching is made.

I'm looking at https://github.com/rhysrevans3/stac-fastapi-elasticsearch/blob/2b1381821ec978ea254df0c8fae87420186a8789/stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/database_logic.py#L940-L1031 to try to understand what's going on.

We should write some pseudo code in the doc (in the docstring first) to show how the patch method will look like

    def patch_item(
        self,
        collection_id: str,
        item_id: str,
        patch: Union[stac.PartialItem, List[stac.PatchOperation]],
        **kwargs,
    ) -> Optional[Union[stac.Item, Response]]:
        """Update an item from a collection.

        Called with `PATCH /collections/{collection_id}/items/{item_id}`

        Args:
            item_id: id of the item.
            collection_id: id of the collection.
            patch: either the partial item or list of patch operations.

        Returns:
            The patched item.
        """
        # convert patch item to list of merge operations
        if isinstance(patch, PartialItem):
            patch = merge_to_operations(patch)

        # Get Original Item
        item = backend.get_item(collection_id, item_id)

        # Update Item body from merge op
        item = update_item(item, patch)

        # Push Item to the backend
        _ = backend.create_item(collection_id, item_id, item)

        return item

No worries I'd rather it takes a bit longer and we get it right!

For Elasticsearch you can update records using Java style scripts so for each operation type I've written an equivalent Elasticsearch script. I'm not sure if you could do the same for PostgreSQL or the other backends.

A merge is equivalent to a group of add and remove operations (null values are used to remove). so you can convert from merge to operations to Elasticsearch.

rhysrevans3 avatar Apr 03 '25 12:04 rhysrevans3

I've not tested this but this is an example of running a list of patch operations on an item:

from stac_fastapi.types.stac import PatchOperation
from stac_pydantic import Item

def patch_item(item: Item, operations: PatchOperation) -> Item:
    item_dict = item.model_dump()

    for operation in operations:
        path_parts = operation.path.split('/')
 
       if operation.op == "test":
            test_value = item_dict.copy()
            for path_part in path_parts:
                test_value = test_value[path_part]

            assert test_value == operation.value
            continue

        if operation.op == "replace":
            nest = item_dict.copy()
            for path_part in path_parts:
                assert path_part in nest
                nest = nest[path_part]

        update = {}

        if operation.op in ["add", "copy", "replace", "move"]:
            if operation.hasattr("from_"):
                from_parts = operation.from_.split('/')

                value = item_dict.copy()
                for path_part in from_parts:
                    value = value[path_part]

            else:
                value = item.value

            update = value
            for path_part in path_parts.reverse():
                update = {path_part: update}

        if operation.op in ["remove", "move"]:
            if operation.op == "move":
                path_parts = from_parts
 
            last_part = path_parts.pop(-1)

            nest = item_dict
            for path_part in path_parts:
                nest = nest[path_part]

            del nest[last_part]

        return Item.model_validate(item_dict | update)

rhysrevans3 avatar Apr 04 '25 09:04 rhysrevans3

FYI: sorry for letting this PR stale. Because it adds breaking changes I'll wait for the next major release to be planned before merging.

vincentsarago avatar Apr 18 '25 08:04 vincentsarago

@vincentsarago thanks for staying on top of this. Including it in the next major release sounds sensible 😃

rhysrevans3 avatar Apr 22 '25 07:04 rhysrevans3

alright it took more than a year but we can finally merge this :-)

vincentsarago avatar Jun 02 '25 13:06 vincentsarago

FYI I cannot merge this as is because there is an issue with the pydantic model and openapi schema.

I'll try to fix this 🙏

vincentsarago avatar Jun 03 '25 16:06 vincentsarago