Patch endpoints
Description:
Adds PATCH endpoints to transaction extension. Adds support for RFC 6902 and RFC 7396. Pivots on header Content-Type value.
Related pull request: https://github.com/stac-api-extensions/transaction/pull/14
PR Checklist:
- [x]
pre-commithooks pass locally - [x] Tests pass (run
make test) - [x] Documentation has been updated to reflect changes, if applicable, and docs build successfully (run
make docs) - [x] Changes are added to the CHANGELOG.
I've had to switch from stac-pydantic Item/Collection to Dict to allow for partial updates. Not sure if this is the best method or if you can switch off pydantic validation. Another option would be having separate models for partial items/collections in stac-pydantic where all attributes.
I've had to switch from stac-pydantic Item/Collection to Dict to allow for partial updates. Not sure if this is the best method or if you can switch off pydantic validation. Another option would be having separate models for partial items/collections in stac-pydantic where all attributes.
Would a typedDict make more sense?
@rhysrevans3 should we wait for https://github.com/stac-api-extensions/transaction/pull/14 to be merged/published?
@rhysrevans3 I'm trying to get a general sense of the patch operation. It's not clear to me what are the requirements that an API should implement. By default it seems that a client will have to support a lot of things
For example, I'm looking at https://docs.ogc.org/DRAFTS/20-002.html#update_clause and I don't see reference to the move/copy operations
My understanding is RFC 5789, which the OGC doc references, say a PATCH endpoint should except a set of actions to modify the requested resource. But it doesn't specify what these actions should be.
RFC 6902 extends this and specify a set of allowed operations (add, remove, replace, move, copy, test) which is what I have suggested in https://github.com/stac-api-extensions/transaction/pull/14.
Currently the README on the transaction extension suggests the PATCH endpoint follows RFC 7396 which accepts a partial json document which it merges with the existing document this might not be the best choice for STAC because of it's handling of null values. As mentioned in this issue https://github.com/stac-api-extensions/transaction/issues/13
Just to add to @rhysrevans3 's comments, the main issue with supporting only RFC7396 (or RFC7386) is the "magic" behaviour of setting attribute to null in combination with the elastic search backend:
- RFC7396 treats explicit
nullas a command to delete that attribute (this is your "delete" patch command) - If you actually want to set a value to
null, you can't - ElasticSearch on the other hand, has a different understanding of explicit
nullas "leave this in the document but don't index this value"
Without a fairly complicated scripting command on elastic search, it would not be possible for elastic search to return a _source document that meets the expected result of an explicit null being sent in a PATCH request.
RFC6902 gets around this by allowing you to explicitly say "that attribute there, delete it" and not "that attribute there, I'm going to tell you to set it to null but which I mean please write a script to tell elastic search to remove it from the source document and reindex".
@rhysrevans3 The overall looks great, just added one comment about using stac-pydantic base model.
As I've asked couple weeks ago, should we wait for https://github.com/stac-api-extensions/transaction/pull/14 to be merged? do you know anyone who could help move this PR forward?
@rhysrevans3 The overall looks great, just added one comment about using stac-pydantic base model.
As I've asked couple weeks ago, should we wait for stac-api-extensions/transaction#14 to be merged? do you know anyone who could help move this PR forward?
Ideally yes I think it would be better to have that PR merged first.
But my understanding is that RFC 6902 and RFC 7396 are implementations of RFC 5789, which is the current STAC/OGC specification. So this merging this pull still keeps within the current STAC specification.
But given @m-mohr comments on that pull request maybe it's best to wait for OGC's opinion.
Did someone open an issue with OGC API - Features? (If yes, where?) Otherwise you may wait forever ;)
Sorry I've been at a conference/on leave I've created a new issue here
@rhysrevans3 Thanks.
Given the response in https://github.com/opengeospatial/ogcapi-features/issues/957, we may also want to open an issue in the Transaction extension itself so that we can reflect the updated to OGC API - Features in the extension, too. Would you mind opening that issue, too? (I also just returned from four weeks of travelling today.)
@m-mohr There is an issue https://github.com/stac-api-extensions/transaction/issues/13 and a pull request https://github.com/stac-api-extensions/transaction/pull/14 on the transaction extension repo. :)
Oh cool, too many repos to monitor. Left my comment(s) there.
@vincentsarago The transaction extension pull request is just waiting for one more review. But I wondered if we could get this one merged in first? I'm confident the changes will be accept and this doesn't break the existing transaction extension.
hi @rhysrevans3
I'm a bit concerned that if merging this PR as is it would introduce a breaking change making merge_patch_item, json_patch_item, merge_patch_collection and json_patch_collection mandatory methods in any Transaction client
Also reading the spec
The default standard for PATCH is RFC 7396. Optionally, in addition to RFC 7396, RFC 6902 can be supported with the two methods being distinguished via the Accept/Content-Type header.
This sounds to me that only RFC7396 (merge) should be implemented and RFC6902 (patch) should be optional, but because of the 4 mandatory methods this is not reflected.
I understand correctly, the current implementation is not following the transaction spec because we don't have PATCH support https://github.com/stac-utils/stac-fastapi/blob/20ae9cfaf87ed892ef3235d979892e7e24c63fc0/stac_fastapi/extensions/stac_fastapi/extensions/core/transaction.py#L151-L154
Note: IMO we should move everything related to transaction to the extension module (https://github.com/stac-utils/stac-fastapi/blob/88639230e3bd1573e23e3a646204cd85932079cd/stac_fastapi/types/stac_fastapi/types/core.py#L40C1-L621C1). But it might be easier to do it after merging this PR
@vincentsarago you're right I had missed that. I've updated json_patch_item and json_patch_collection to raise an NotImplementedError as a default. Is that enough or does it need to be in the registration of the endpoint?
Happy to use this pull request to move everything to the transaction to the extension module if required.
I could also add a NotImplemented default to the merge-patch to not break existing implementations? I'm unsure if the Transaction extension requires a PATCH endpoint and if it requires merge-patch or if you could specify a different Content-Type. For example if you wanted to implement json-patch and not merge-patch or even a non-JSON PATCH like XML PATCH RFC 5261
I've written a conversion from merge to patch for the elasticsearch-opensearch backend if this is useful to other backends? I could move it to the core utils?
https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/blob/2b1381821ec978ea254df0c8fae87420186a8789/stac_fastapi/core/stac_fastapi/core/utilities.py#L145-L171
@rhysrevans3 yeah that would be great. It will lower the barrier for people to support both patch and merge 🙏
Maybe it could be a method on the PartialItem/PartialCollection model
☝️ while writing this I realized that they are typedDict not pydantic model 🤔
@rhysrevans3 yeah that would be great. It will lower the barrier for people to support both patch and merge 🙏
Maybe it could be a method on the PartialItem/PartialCollection model
☝️ while writing this I realized that they are typedDict not pydantic model 🤔
Yes I copied the Item/Collection models in the same file. I think the reason for using TypeDict over Pydantic models was performance (avoid validation) I remember there was some discussion when we moved to pydantic v2 #625 which suggested that's no longer an issue? Should I switch to using Pydantic models?
@rhysrevans3 I'm sorry it's taking longer 😅
I now think (with the addition of a merge to patch method) that by default we should make both patch/merge supported by default 😅.
On think I'm still not sure is how the merging/patching is made.
I'm looking at https://github.com/rhysrevans3/stac-fastapi-elasticsearch/blob/2b1381821ec978ea254df0c8fae87420186a8789/stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/database_logic.py#L940-L1031 to try to understand what's going on.
We should write some pseudo code in the doc (in the docstring first) to show how the patch method will look like
def patch_item(
self,
collection_id: str,
item_id: str,
patch: Union[stac.PartialItem, List[stac.PatchOperation]],
**kwargs,
) -> Optional[Union[stac.Item, Response]]:
"""Update an item from a collection.
Called with `PATCH /collections/{collection_id}/items/{item_id}`
Args:
item_id: id of the item.
collection_id: id of the collection.
patch: either the partial item or list of patch operations.
Returns:
The patched item.
"""
# convert patch item to list of merge operations
if isinstance(patch, PartialItem):
patch = merge_to_operations(patch)
# Get Original Item
item = backend.get_item(collection_id, item_id)
# Update Item body from merge op
item = update_item(item, patch)
# Push Item to the backend
_ = backend.create_item(collection_id, item_id, item)
return item
Should I switch to using Pydantic models?
Yes we should use pydantic model for input validation. It's only for the output that we have optional validation
@rhysrevans3 I'm sorry it's taking longer 😅
I now think (with the addition of a
merge to patchmethod) that by default we should make both patch/merge supported by default 😅.On think I'm still not sure is how the merging/patching is made.
I'm looking at https://github.com/rhysrevans3/stac-fastapi-elasticsearch/blob/2b1381821ec978ea254df0c8fae87420186a8789/stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/database_logic.py#L940-L1031 to try to understand what's going on.
We should write some pseudo code in the doc (in the docstring first) to show how the
patchmethod will look likedef patch_item( self, collection_id: str, item_id: str, patch: Union[stac.PartialItem, List[stac.PatchOperation]], **kwargs, ) -> Optional[Union[stac.Item, Response]]: """Update an item from a collection. Called with `PATCH /collections/{collection_id}/items/{item_id}` Args: item_id: id of the item. collection_id: id of the collection. patch: either the partial item or list of patch operations. Returns: The patched item. """ # convert patch item to list of merge operations if isinstance(patch, PartialItem): patch = merge_to_operations(patch) # Get Original Item item = backend.get_item(collection_id, item_id) # Update Item body from merge op item = update_item(item, patch) # Push Item to the backend _ = backend.create_item(collection_id, item_id, item) return item
No worries I'd rather it takes a bit longer and we get it right!
For Elasticsearch you can update records using Java style scripts so for each operation type I've written an equivalent Elasticsearch script. I'm not sure if you could do the same for PostgreSQL or the other backends.
A merge is equivalent to a group of add and remove operations (null values are used to remove). so you can convert from merge to operations to Elasticsearch.
I've not tested this but this is an example of running a list of patch operations on an item:
from stac_fastapi.types.stac import PatchOperation
from stac_pydantic import Item
def patch_item(item: Item, operations: PatchOperation) -> Item:
item_dict = item.model_dump()
for operation in operations:
path_parts = operation.path.split('/')
if operation.op == "test":
test_value = item_dict.copy()
for path_part in path_parts:
test_value = test_value[path_part]
assert test_value == operation.value
continue
if operation.op == "replace":
nest = item_dict.copy()
for path_part in path_parts:
assert path_part in nest
nest = nest[path_part]
update = {}
if operation.op in ["add", "copy", "replace", "move"]:
if operation.hasattr("from_"):
from_parts = operation.from_.split('/')
value = item_dict.copy()
for path_part in from_parts:
value = value[path_part]
else:
value = item.value
update = value
for path_part in path_parts.reverse():
update = {path_part: update}
if operation.op in ["remove", "move"]:
if operation.op == "move":
path_parts = from_parts
last_part = path_parts.pop(-1)
nest = item_dict
for path_part in path_parts:
nest = nest[path_part]
del nest[last_part]
return Item.model_validate(item_dict | update)
FYI: sorry for letting this PR stale. Because it adds breaking changes I'll wait for the next major release to be planned before merging.
@vincentsarago thanks for staying on top of this. Including it in the next major release sounds sensible 😃
alright it took more than a year but we can finally merge this :-)
FYI I cannot merge this as is because there is an issue with the pydantic model and openapi schema.
I'll try to fix this 🙏