Parameter 'outputs' from request object is not propagated to the manager execute_process and then to processor execute
In API.execute_process(), the (optional) parameter 'outputs', if present in the request, is not passed on to BaseManager.execute_process() (or derived). Even more, the parameter 'outputs' is not collected: only the parameter 'inputs' is collected.
To properly process a request the processor (i.e. "BaseProcessor" or derived) should get the requested outputs to know which outputs to produce. I expect the manager (e.g. "BaseManager" or derived) to use this information, and also the property "transmitionMode" of each output, to prepare:
- body;
- headers (i.e. 'Location');
- content pointed by 'Location' in the headers.
In case of "transmissionMode": "reference", I am not sure if either the processor or the manager should prepare the reference (possibly writing to a file the single output). As far as I understand, the reference is not expected to be served by the API, therefore only the processor can prepare it (possibly on a different endpoint). If the reference could be server by the API (but I do not understand the corresponding entrypoint), then the manager could be in charge of it.
I propose to add an additonal parameter (a dictionary with the content of outputs) to BaseManager.execute_process() and the same additional parameter to BaseProcessor.execute()
Hi @FrancescoIngv, few comments on the formal aspects of the ticket:
- It's still not clear to me if we should consider the ticket as a
feature requestor abug reporteven if you have flagged it as enhancement. - Can you kindly assess with the OGC API - Processes specification if the current pygeoapi interface of the OpenAPI document doesn't comply with the standard? (i.e.
outputsis not taken into account) If it doesn't then it's clearly a bug otherwise it would be a feature to be added - Regardless of the outcome I would strongly encourage you to edit your explanation above and adapt its content under one of the relevant template that the team has prepared for the reporters. This would help us to understand the problem and review properly any eventual pull request that would fix the issue.
Thanks, Francesco
Description
The processor cannot return outputs based on the outputs parameter of a process-execute
Steps to Reproduce
Create a processor (say TestProcessor) derived from BaseProcessor with the following outputs section in PROCESS_METADATA
"outputs": {
"echo1": {
"schema": {
"type": "string"
},
"echo2": {
"schema": {
"type": "string"
}
}
},
It is impossible to make the function TestProcessor.execute() returning a single specific output as requested by the user.
Expected behavior Note: The problem is not present if the processor defines a single output with a single format.
If process-execute is requested without outputs parameter, all defined outputs (i.e. echo1 and echo2) will be returned (requirement 27 of the standard - however the standard does not specify the transmissionMode to return the outputs in this case).
If process-execute is requested with outputs parameter, and the outputs parameter contains only echo2, only echo2 will be returned (I did not find this clearly stated in the standard).
The specification also spells:
A process output can be defined in the process description as being of one or more media
types. In cases where a specific output can be presented in one of a number of media types, the
format parameter in the execute request can be used to indicate the format that should be used
to present the process output value in the server’s response.
This is far more difficult to describe in Steps to Reproduce and then in Expected behavior, but to me it requires the TestProcessor can access the full outputs parameter.
Environment
- OS: Ubuntu
- Python version: 3.8.10
- pygeoapi version: 0.15.0
Additional context
I propose to add an additonal parameter (a dictionary with the content of outputs) to BaseManager.execute_process() and the same additional parameter to BaseProcessor.execute().
To keep backward compatibility the parameter could be a named parameter.
This is a subset of #1285, which deals more generally with what is missing from pygeoapi in terms of fully supporting clause 7.11 (execute a process) of the OGC API - Processes v1.0.0 spec.
TL; DR To be clear, if a PR with this functionality would be merged, there would likely be a breaking change to pygeoapi's API and this would require a new release and a potential refactor of downstream projects and deployments.
In my opinion it would be difficult to solve this problem while still ensuring backwards compatibility. The reasons being that in order to support the additional information that is being passed in the request, which is:
-
outputs -
response -
subscriber
we need to add support for this in:
-
pygeoapi.process.manager.base.BaseManager.execute_process()https://github.com/geopython/pygeoapi/blob/e7264e89bce3b14f2f013376b0c1159b9b5fa634/pygeoapi/process/manager/base.py#L311
This method's signature would need to change. Additionally, since pygeoapi supports plugging in a custom process manager, whatever downstream managers might exist out there in the wild would need to be modified as well in order to cope with the change
-
pygeoapi.process.base.BaseProcessor.execute()https://github.com/geopython/pygeoapi/blob/e7264e89bce3b14f2f013376b0c1159b9b5fa634/pygeoapi/process/base.py#L51
This method's signature also needs to change. Additionally, since pygeoapi also supports plugging in custom processes, this would break whatever third party process implementations exist.
This is not to say that I do not support the change, I absolutely do, since it is required in order for pygeoapi to be compliant with the OAProc standard. But it seems to me that it implies a breaking change that will require existing deployments to modify their code. This also means that if there are any deployments out there that are living on the edge (i.e. depending on pygeoapi's master branch), these would likely break.
Adding support for parsing an execute request's outputs option would be a nice addition indeed. I also think that while we are at it, it would be optimal to up the OAProc support for full compliance with v1.0.0 of the standard and then releasing a new version of pygeoapi - perhaps even a v1.0.0?
I'm not sure what would be the most appropriate course of action in this regard, so maybe we'd need some input from @tomkralidis here?
I understand there is a backward compatibility issue,
however I believe to let the community to program the processing services according to the standard, the upgrade of the interface is required/mandatory.
Otherwise the frame over-constraints and limits the development, possibly towards bad practices (e.g. I am wondering about: one service entry point for each combination of outputs; subscriber passed on as input parameters, and the Processor taking care to act on them; transmissionMode passed as input parameter and possibly the link returned as output; etc...).
I wonder if a viable solution could be:
- for the current release, to keep the backward compatibility, but also to allow coding adhering to the standard:
- declare
BaseProcessorandBaseManager@deprecated - create
NewBaseProcessorandNewBaseManagerwith the new required signatures - change
API.execute_process()to check ifself.manageris derived fromBaseManagerorNewBaseManagerand accordingly callself.manager.execute_process()with different number of parameters
- for the next release, possibly drop the backward compatibility:
- make
BaseProcessorandBaseManageran alias ofNewBaseProcessorandNewBaseManager
We did have a breaking change in execute_process more or less recently (1-2 years ago?) though, so I'm not sure if this part of pygeoapi is considered stable enough to need a complex deprecation handling (I can't find the commit right now, but input parameter handling was changed from [{"id": "foo", "value": "bar"}] to {"foo": "bar"} or so), this was the previous implementation
https://github.com/geopython/pygeoapi/blob/60202129ec098c6f119c4b29a202603e186b82f7/pygeoapi/api.py#L2076 )
Adding more parameters to BaseProcessor.execute_process would of course be a breaking change, but it would be fairly simple for maintainers to accept the parameters and ignore them for now.
Here's also the pull request for subscriber, which would add a parameter to BaseProcessor.execute_process: https://github.com/geopython/pygeoapi/pull/1313
This Issue has been inactive for 90 days. As per RFC4, in order to manage maintenance burden, it will be automatically closed in 7 days.
The issue is still open as a bug:
the parameter outputs contains also the property transmissionMode,
which should be passed on to the ProcessManager and then to Process to properly handle the request.
Ref. also: https://github.com/geopython/pygeoapi/issues/1285
There is an approved PR https://github.com/geopython/pygeoapi/pull/1448 to solve the issue, but it was not merged.
The solution needs to change two signatures and could be applied in two steps:
-
BaseManager.execute_process()(which was also recently changed) - [dependent on the previous change]
BaseProcessor.execute()
Accordingly, the proposed PR could be split in two separate PRs for smoother integration.