specs Returning errors for individual queries

Sometimes, reconciliation queries can be invalid, for all sorts of reasons:

their JSON structure does not fit the schema
they contain references to objects that do not exist (for instance, invalid property or type id)
some text fields could be too long?
the service could have a temporary failure in resolving that particular query

Because queries are generally sent in batches, we currently do not have a good way to return an error in those cases. The service can decide to return an HTTP 401 error (for instance) for the whole batch, but that is not so useful: the client does not know which of the queries caused the error, and it might also have been able to use the reconciliation results for the non-failing queries in the same batch.

So we should devise a way to expose errors for individual queries. It should mostly be about finding a JSON syntax for it and specifying it in https://reconciliation-api.github.io/specs/latest/#reconciliation-query-responses

Originally brought up here: https://github.com/wetneb/openrefine-wikibase/issues/116

Jun 03 '21 06:06 wetneb

Thanks for bringing this up @wetneb !

I suggest basing the JSON return format on the RFC7807 application/problem+json specification. The simplest possible example is a JSON object with the HTTP error code and a human-readable title:

{
  "title": "Not Found",
  "status": 404
}

There can also be a detail field with more information such as "Requested resource could not be found". There are other predefined fields such as type and instance.

It is also possible to add custom fields. For the batch case, it could be useful to add a field whose value is an array of the individual errors within the batch (perhaps those could also be expressed as Problem JSON objects - a bit of recursion never hurts, eh?).
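
To make the nesting idea concrete, here is a minimal sketch in Python of what such a batch-level problem object could look like. RFC 7807 allows extension members, but the names `errors` and `query` used here are only assumptions for illustration; nothing in the RFC or in the reconciliation spec defines them.

```python
import json


def problem(title, status, detail=None, **extensions):
    """Build an RFC 7807 style problem object as a plain dict."""
    obj = {"title": title, "status": status}
    if detail is not None:
        obj["detail"] = detail
    # Extension members are permitted by RFC 7807; their names are up to us.
    obj.update(extensions)
    return obj


batch_problem = problem(
    "Bad Request",
    400,
    detail="2 of 3 queries in the batch were invalid",
    # "errors" and "query" are hypothetical extension members.
    errors=[
        problem("Unknown type", 400, query="q0"),
        problem("Unknown property", 400, query="q2"),
    ],
)

print(json.dumps(batch_problem, indent=2))
```

Each nested entry is itself a problem object, so a client could reuse the same parsing code for the batch-level error and for the per-query ones.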

I've used Problem JSON in the Annif REST API, but only in a very simple form.

There is also zalando/problem, a Java library for handling Problem JSON objects which may be useful for some implementations.

Jun 03 '21 07:06 osma

OK, so with this solution, if one query fails, we would not return query results for any of the other queries, right? I was initially thinking of a solution where you would be able to be more granular (return reconciliation results for the successful queries and errors for the unsuccessful ones), but perhaps that is not so clean and standard…

For instance, see ElasticSearch's bulk API, which supports that: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

Jun 03 '21 07:06 wetneb

Ah, that makes sense. Then perhaps the response for the whole batch could contain both results and errors like you say, and the errors could be represented as Problem JSON objects - or if that's not possible for some reason, then at least something very similar?
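
A rough sketch of that mixed response, in Python: the response stays keyed by query id as it is today, successful queries carry `result`, and failing ones carry a problem-style object instead. The `error` key, the `QueryError` class and the `resolve` stand-in are all hypothetical; this is one possible shape, not something the spec defines.

```python
import json


class QueryError(Exception):
    """Hypothetical per-query failure carrying problem-style fields."""

    def __init__(self, title, status, detail=None):
        super().__init__(title)
        self.title = title
        self.status = status
        self.detail = detail


def resolve(query):
    # Stand-in for a real reconciliation backend (assumption for this sketch).
    if "query" not in query:
        raise QueryError("Invalid query", 400, "Missing required key 'query'")
    return [{"id": "Q1", "name": query["query"], "score": 100, "match": True}]


def reconcile_batch(queries):
    """Answer every query in the batch, recording failures individually."""
    response = {}
    for query_id, query in queries.items():
        try:
            response[query_id] = {"result": resolve(query)}
        except QueryError as err:
            error = {"title": err.title, "status": err.status}
            if err.detail:
                error["detail"] = err.detail
            # "error" is a hypothetical key, not part of the current spec.
            response[query_id] = {"error": error}
    return response


batch = {"q0": {"query": "Douglas Adams"}, "q1": {"type": "Q5"}}
response = reconcile_batch(batch)
print(json.dumps(response, indent=2))
```

With this shape the HTTP status of the whole response can remain 200, and the client checks each entry for `result` or `error`.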

Jun 03 '21 08:06 osma

We had a very similar discussion just yesterday: our application metaphactory supports reconciliation from local graph database(s) but also federated reconciliation from multiple sources, e.g. a local graph database and Wikidata. The big question is how to handle errors in some members: would one rather fail the entire lookup operation, or return partial results without forwarding the information about some errors to the user? Having some means to send partial results AND the information about some errors / warnings would be great!
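
For the federated case, the "partial results plus warnings" idea could look roughly like the Python sketch below. It is not how metaphactory works; the `warnings` and `source` keys and the two source functions are made up purely for illustration.

```python
def federated_lookup(query, sources):
    """Query every source, keeping partial results and recording failures."""
    results, warnings = [], []
    for name, lookup in sources.items():
        try:
            results.extend(lookup(query))
        except Exception as exc:  # deliberately broad in this sketch
            # "warnings" and "source" are hypothetical field names.
            warnings.append({
                "title": "Source unavailable",
                "status": 503,
                "detail": str(exc),
                "source": name,
            })
    return {"result": results, "warnings": warnings}


def local_graph(query):
    # Hypothetical source that answers successfully.
    return [{"id": "local:1", "name": query}]


def remote_service(query):
    # Hypothetical source that fails temporarily.
    raise TimeoutError("remote endpoint timed out")


outcome = federated_lookup(
    "Douglas Adams",
    {"local": local_graph, "remote": remote_service},
)
print(outcome)
```

The client still gets the candidates from the sources that answered, and can decide whether the recorded warnings are worth showing to the user.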

Jun 03 '21 12:06 jetztgradnet

I think borrowing or using OpenAPI can be useful here. With OpenAPI a default response is how you describe errors collectively, not individually. Also with OpenAPI, there's a description of an empty body response, like 204 No Content.

https://swagger.io/docs/specification/describing-responses/
https://swagger.io/specification/#responses-object

In the first link above, though, I think $ref at the operation level might help overall: you can say that there are error responses which all share the same status code and response body. Imagine that 50% of the queries are OK with status code 200, but the other 50% failed (for example, they all passed an extra parameter that the service did not understand).

Jun 03 '21 15:06 thadguidry

I know it's underspecified in the spec, but what do any of the servers do now? If one of the queries in a batch is invalid, like maybe 'q1' doesn't include a 'query' key, do servers just fail the whole batch?

Jun 04 '21 02:06 epaulson

For Wikidata it depends on the sort of error - some will fail the whole batch, some will return an empty list of results for the failing query.

Jun 04 '21 06:06 wetneb