openeo-python-client Phase out read_vector usage

When a string is passed as the polygon argument to ImageCollectionClient.mask: https://github.com/Open-EO/openeo-python-client/blob/b5f3b4725ed54a7ab1522992b330221c07f6f287/openeo/rest/imagecollectionclient.py#L777-L784

or in _get_geometry_argument: https://github.com/Open-EO/openeo-python-client/blob/2ce9d3be823ff2201e55d2e96e82811714164e9a/openeo/rest/datacube.py#L900

the implementation assumes the read_vector process, which is currently a VITO-specific process; I'm not aware of anything like it in the official process collection.

Background: At VITO we currently use this to work with very large polygon files (> 100k polygons) stored on the backend side, which we don't want to pass along with the openEO request for obvious reasons.

In the client we should of course avoid hardcoding non-official processes.

Solution:

check which processes are supported by the backend: read_vector, load_url, load_geojson
check whether the string is an HTTP URL, a path to a GeoJSON file that exists locally, or a path that does not exist locally
depending on the combination of available processes and the type of string, do something more sensible than the current implementation
take some care to avoid passing a huge GeoJSON object with the request (e.g. throw an exception when it exceeds a size threshold)
still allow injecting a read_vector based argument to support the VITO use cases
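
To make the steps above a bit more concrete, here is a rough sketch of the dispatch logic using only the standard library. It is not the client's actual implementation: the function names, the size threshold, the "filename" argument of read_vector and the "GeoJSON" format string are all assumptions made for illustration, and the set of available processes is simply passed in rather than fetched from a backend.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical threshold; the thread does not specify a value.
MAX_INLINE_GEOJSON_BYTES = 1_000_000


def classify_geometry_string(value: str) -> str:
    """Classify a geometry string as "url", "local_file" or "remote_path"."""
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    if Path(value).is_file():
        return "local_file"
    # Not a URL and not found locally: assume it refers to a backend-side file.
    return "remote_path"


def build_geometry_argument(value: str, available_processes: set) -> dict:
    """Build a geometry argument (plain dict) from a string.

    `available_processes` holds the process ids the backend reports as supported.
    """
    kind = classify_geometry_string(value)

    if kind == "url":
        if "load_url" in available_processes:
            # The "GeoJSON" format string is an assumption.
            return {"process_id": "load_url", "arguments": {"url": value, "format": "GeoJSON"}}
        raise ValueError(f"Backend does not support load_url, cannot load {value!r}")

    if kind == "local_file":
        path = Path(value)
        size = path.stat().st_size
        if size > MAX_INLINE_GEOJSON_BYTES:
            raise ValueError(f"{value!r} is {size} bytes, too large to embed in the request")
        data = json.loads(path.read_text(encoding="utf-8"))
        if "load_geojson" in available_processes:
            return {"process_id": "load_geojson", "arguments": {"data": data}}
        # Fall back to embedding the GeoJSON object directly.
        return data

    # Backend-side path: only the non-official read_vector can handle this.
    if "read_vector" in available_processes:
        # The "filename" argument name is an assumption.
        return {"process_id": "read_vector", "arguments": {"filename": value}}
    raise ValueError(f"{value!r} does not exist locally and backend lacks read_vector")
```

For example, `build_geometry_argument("https://example.com/path/to/geometries.json", {"load_url"})` would produce a load_url node, while a path that does not exist locally only resolves when the backend lists read_vector.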

cc @jdries

Dec 17 '19 10:12 soxofaan

A fix for this is being proposed in this pull request: https://github.com/Open-EO/openeo-processes/pull/106

Dec 17 '19 10:12 jdries

ah nice, I somehow missed that thread

Dec 17 '19 10:12 soxofaan

however, load_uploaded_files is meant for user uploaded files, which is not exactly what we currently do in the VITO use cases that depend on read_vector.

A closer proposal is probably the import_nfs process from Open-EO/openeo-processes#105

Dec 17 '19 11:12 soxofaan

way forward:

https://github.com/Open-EO/openeo-python-client/issues/457

Jul 18 '24 12:07 soxofaan

Another use case that has to be updated: this suggestion was made in a user support channel to do aggregate_spatial with a (large) geometry from a URL:

datacube = datacube.aggregate_spatial(
    geometries="https://example.com/path/to/geometries.json",
    reducer="mean",
)

This currently produces a process graph using read_vector, so it's not future-proof at the moment.
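
To illustrate the difference, here is a simplified sketch of what the flat process graph could look like with read_vector versus load_url. The graphs are built by hand as plain dicts, not generated by the client; the node ids, the collection id, the "filename" argument of read_vector and the "GeoJSON" format string are assumptions for illustration.

```python
import json


def aggregate_spatial_graph(geometries_node: dict) -> dict:
    """Build a flat openEO-style process graph around a geometry-loading node."""
    return {
        "loadcollection1": {
            "process_id": "load_collection",
            # Hypothetical collection id, for illustration only.
            "arguments": {"id": "EXAMPLE_COLLECTION", "spatial_extent": None, "temporal_extent": None},
        },
        "geometries1": geometries_node,
        "aggregatespatial1": {
            "process_id": "aggregate_spatial",
            "arguments": {
                "data": {"from_node": "loadcollection1"},
                "geometries": {"from_node": "geometries1"},
                "reducer": {
                    "process_graph": {
                        "mean1": {
                            "process_id": "mean",
                            "arguments": {"data": {"from_parameter": "data"}},
                            "result": True,
                        }
                    }
                },
            },
            "result": True,
        },
    }


url = "https://example.com/path/to/geometries.json"

# Assumed shape of the legacy node (the "filename" argument name is an assumption).
legacy = aggregate_spatial_graph({"process_id": "read_vector", "arguments": {"filename": url}})

# Same graph, but loading the geometries with load_url instead.
updated = aggregate_spatial_graph({"process_id": "load_url", "arguments": {"url": url, "format": "GeoJSON"}})

print(json.dumps(updated["geometries1"], indent=2))
```

Only the geometry-loading node differs between the two graphs; the aggregate_spatial node just references it through from_node.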

(cc @EmileSonneveld)

Oct 24 '24 09:10 soxofaan

done:

removed read_vector usage from the default geometry handling (but documented how to reconstruct it for workflows that still need it)
added support for passing a geometry URL

Nov 27 '24 15:11 soxofaan

Phase out read_vector usage