Proposal: Add support for JSONPath where JSON Pointer is supported
Specifically, in the situations listed in the table at https://spec.openapis.org/arazzo/v1.0.1.html#examples.
By JSONPath, I specifically mean RFC 9535. There is no semantic overlap between JSONPath and JSON Pointer, so adding support for JSONPath WILL NOT create ambiguity: the grammars of the two can be clearly distinguished.
Here is a concrete example where having JSONPath support would allow describing workflows that we cannot currently describe.
Let's say we have a step called search-businesses. This step exposes the matched businesses via $steps.search-businesses.outputs.businesses.
Now I want another step called get-businesses-details, which accepts a list of business ids. If we supported JSONPath, we could do the following:
{
"name": "business_ids",
"in": "query",
"value": "$steps.search-businesses.outputs.businesses#$[*].id"
}
With current capabilities it's not possible to chain steps like these, unless we introduce a surrogate transformation step into the workflow.
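To make the intended result of the value expression above concrete, here is a minimal sketch of what a JSONPath-aware runtime could produce for $[*].id. It assumes (hypothetically) that $steps.search-businesses.outputs.businesses resolves to an array of objects that each have an id member. The query function is a hypothetical, hand-rolled subset of RFC 9535 ($, .name, .*, [*], [index], ['name']); a real tool would use a complete RFC 9535 implementation instead.

```python
import re
from typing import Any, List

# Hypothetical helper: a tiny subset of RFC 9535 JSONPath, for illustration only.
# Supported segments: .name, .*, [*], [index], ['quoted name'].
_SEGMENT = re.compile(r"\.(\*|[A-Za-z_][A-Za-z0-9_]*)|\[(\*|-?\d+|'[^']*')\]")


def _select(node: Any, token: str) -> List[Any]:
    """Apply one segment to one node and return the selected children."""
    if token == "*":  # wildcard: array elements or object member values
        if isinstance(node, list):
            return list(node)
        if isinstance(node, dict):
            return list(node.values())
        return []
    if re.fullmatch(r"-?\d+", token):  # array index; negative counts from the end
        if not isinstance(node, list):
            return []
        index = int(token)
        if index < 0:
            index += len(node)
        return [node[index]] if 0 <= index < len(node) else []
    name = token[1:-1] if token.startswith("'") else token  # member name
    return [node[name]] if isinstance(node, dict) and name in node else []


def query(path: str, value: Any) -> List[Any]:
    """Return the nodelist that `path` selects from `value`."""
    if not path.startswith("$"):
        raise ValueError("a JSONPath query must start with '$'")
    nodes = [value]
    pos = 1
    while pos < len(path):
        match = _SEGMENT.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at offset {pos}: {path[pos:]!r}")
        token = match.group(1) if match.group(1) is not None else match.group(2)
        nodes = [child for node in nodes for child in _select(node, token)]
        pos = match.end()
    return nodes


# Hypothetical output of the search-businesses step.
businesses = [{"id": "b1", "name": "Cafe"}, {"id": "b2", "name": "Deli"}]
print(query("$[*].id", businesses))  # ['b1', 'b2']
```

A JSONPath query yields a nodelist, so how that list is serialized into the business_ids query parameter is a separate question this sketch does not answer.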
I can take this proposal on and open a PR against the spec. We have two options for how to proceed.
Backward compatible
We can introduce a new separator for JSONPath:
$steps.search-businesses.outputs.businesses@$[*].id - this clearly says that @ is followed by JSONPath and existing implementations will ignore it as they will not recognize @ after the runtime expression.
Backward incompatible
$steps.search-businesses.outputs.businesses#$[*].id - JSONPath uses the same separator (#) as JSON Pointer, and because there is no semantic overlap between JSONPath and JSON Pointer, implementations can clearly distinguish the intention. It will break implementations that only expect JSON Pointer after the # delimiter, so a major release of the spec might be warranted.
I will have to defer to @frankkilcommins' memory on why we went one way and not the other. I can say that my concern would be tooling. If we add support for JSONPath, every tool out there will need to support BOTH, or it could run into situations where a document uses one that it doesn't support. You then end up in a situation where an Arazzo description may not work with some tooling that doesn't support JSONPath. That would be unfortunate.
Hi @kevinduffey,
Thanks for your reaction.
that means every tool out there will need to support BOTH or they could run in to situations where a document uses one that they dont support
Yes, that's the case. Arazzo already requires implementations to evaluate JSONPath expressions during Criterion Object evaluation, and it also requires them to evaluate JSON Pointer when plucking parts of step outputs (among other places). So we're already in that situation: every compliant implementation needs to support both.
In JSONPath we have Normalized Paths which fully replaces JSON Pointer capabilities as it can produce a single node result. On top of that we have all the capabilities of non-singular queries inside JSONPath.
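The claim that Normalized Paths cover what JSON Pointer does can be illustrated by rewriting a pointer as a Normalized Path. The sketch below is a hypothetical helper, not part of any spec or library; it needs the document because a pointer token such as 0 is an array index for arrays but a member name for objects. Member names are escaped following the RFC 9535 Normalized Path rules (short escapes plus lowercase \u00XX for other control characters).

```python
import re
from typing import Any

_ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")  # RFC 6901 array-index grammar
_SHORT_ESCAPES = {"\b": "\\b", "\f": "\\f", "\n": "\\n", "\r": "\\r",
                  "\t": "\\t", "'": "\\'", "\\": "\\\\"}


def _name_selector(name: str) -> str:
    """Quote a member name the way RFC 9535 Normalized Paths require."""
    out = []
    for ch in name:
        if ch in _SHORT_ESCAPES:
            out.append(_SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20:
            out.append(f"\\u{ord(ch):04x}")  # remaining control characters
        else:
            out.append(ch)
    return "['" + "".join(out) + "']"


def pointer_to_normalized_path(pointer: str, document: Any) -> str:
    """Rewrite an RFC 6901 JSON Pointer as an RFC 9535 Normalized Path."""
    if pointer == "":
        return "$"
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer must be empty or start with '/'")
    path, node = "$", document
    for raw in pointer[1:].split("/"):
        token = raw.replace("~1", "/").replace("~0", "~")  # RFC 6901 unescaping
        if isinstance(node, list):
            if not _ARRAY_INDEX.fullmatch(token) or int(token) >= len(node):
                raise KeyError(f"no array element {token!r}")
            path, node = path + f"[{int(token)}]", node[int(token)]
        elif isinstance(node, dict) and token in node:
            path, node = path + _name_selector(token), node[token]
        else:
            raise KeyError(f"pointer does not resolve at {token!r}")
    return path


doc = {"businesses": [{"id": "b1"}, {"a/b": {"it's": 1}}]}
print(pointer_to_normalized_path("/businesses/0/id", doc))  # $['businesses'][0]['id']
```

The reverse direction does not hold in general: non-singular JSONPath queries such as $[*].id have no JSON Pointer equivalent.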
Replacing JSON Pointer with JSONPath might be the easiest path in the next major release, as it extends the capabilities significantly while still retaining all prior JSON Pointer capabilities.
Fair points. I did forget we have some JSONPath stuff in there. I honestly forget why we have both now, so again I defer to the mighty @frankkilcommins, as he is the foremost expert in all things Arazzo :).
NOTE: The proposal below has been superseded. Please refer to the latest proposal, and come back here if you are interested in the media-type-dependent support, which has been deferred for now.
Having discussed this with @char0n on Slack, this proposal is a starting point for how we could include additional structured selectors (e.g. JSONPath and XPath) within Arazzo's Runtime Expressions. The goal would be to include support as part of the v1.1.0 release and to honour backwards compatibility.
Proposal for Selector Evaluation Extensions within Arazzo
This proposal aims to introduce an extended syntax and optional grouping model to support evaluation of structured selectors (e.g., JSONPath, XPath) following a valid Arazzo expression. It maintains backward compatibility with existing expression rules and clarifies how selector semantics are applied using known standards. The ambition is to also give authors a means to specify expressions which cater for structured data values which are influenced by content negotiation (e.g. could either be JSON or XML).
Selector Boundary Semantics
Arazzo expressions may resolve to structured data values (such as JSON or XML documents). When this occurs, a selector MAY be applied to navigate within the resolved structure. The boundary between Arazzo expression and selector expression is indicated by the # character. All content preceding # MUST conform to the Arazzo ABNF grammar. The content after # is interpreted as a selector expression using an external selector specification.
Extended Selector Syntax
Arazzo SHALL support the following extended selector syntax format:
<arazzo_expression>#<selector_syntax>[@<version>]:<selector_expression>
Where:
- arazzo_expression is a valid Arazzo runtime expression (e.g., $steps.findPets.outputs.pets)
- selector_syntax is one of the supported selector types:
  - jsonpath
  - xpath
  - jsonpointer (included for completeness)
- @<version> is an optional version identifier
- : (colon) separates selector metadata from the actual selector expression
If @<version> is omitted, the following defaults SHALL apply:
| Selector Syntax | Default Version |
|---|---|
| jsonpath | rfc9535 |
| xpath | 3.1 |
| jsonpointer | rfc6901 |
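A minimal parsing sketch for the syntax and default versions above might look like the following. The names DEFAULT_VERSIONS, SelectorExpression and parse_selector_expression are hypothetical. As a simplification, it splits at the first # instead of validating the left-hand side against the Arazzo ABNF, and it treats a fragment with no selector metadata as a legacy JSON Pointer.

```python
import re
from dataclasses import dataclass

# Default versions from the table above (hypothetical constant name).
DEFAULT_VERSIONS = {"jsonpath": "rfc9535", "xpath": "3.1", "jsonpointer": "rfc6901"}

# <selector_syntax>[@<version>]:<selector_expression>
_PREFIX = re.compile(
    r"(?P<syntax>[a-z]+)(?:@(?P<version>[^:]+))?:(?P<selector>.*)", re.DOTALL
)


@dataclass(frozen=True)
class SelectorExpression:
    arazzo_expression: str
    syntax: str
    version: str
    selector: str


def parse_selector_expression(expression: str) -> SelectorExpression:
    """Split '<arazzo_expression>#<selector_syntax>[@<version>]:<selector_expression>'."""
    # Simplification: split at the first '#'; a full parser would validate the
    # left-hand side against the Arazzo ABNF first.
    base, sep, fragment = expression.partition("#")
    if not sep:
        raise ValueError("expression has no '#' selector boundary")
    match = _PREFIX.fullmatch(fragment)
    if match:
        syntax = match["syntax"]
        if syntax not in DEFAULT_VERSIONS:
            raise ValueError(f"unsupported selector syntax {syntax!r}")
        version = match["version"] or DEFAULT_VERSIONS[syntax]  # apply default
        return SelectorExpression(base, syntax, version, match["selector"])
    if fragment == "" or fragment.startswith("/"):
        # Legacy form without selector metadata: a plain JSON Pointer.
        return SelectorExpression(base, "jsonpointer", DEFAULT_VERSIONS["jsonpointer"], fragment)
    raise ValueError(f"cannot interpret selector {fragment!r}")


print(parse_selector_expression("$steps.findPets.outputs.pets#jsonpath:$[0].name"))
```

Because selector_syntax is a bare lowercase word and a legacy JSON Pointer always starts with / (or is empty), the two forms cannot be confused.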
Multiple Selector Expressions
To support media-type–dependent selector behaviour (e.g., content negotiation or mixed-structure responses), Arazzo SHALL support expressions to be set per response payload media-type or media-type range.
Example of current support:
outputs:
  tokenExpires: $response.header.X-Expires-After
  rateLimit: $response.header.X-Rate-Limit
  sessionToken: $response.body#/pointer
By extending how the Output Object is defined, we can add media-type awareness in a backwards-compatible manner. The outputs field type would change from Map[string, {expression}] to Map[string, {expression} | Output Object], where the Output Object is essentially a mapping of response media type / media-type range to runtime expression.
It would also be recommended to rename the Criterion Expression Type Object to Expression Type Object, and to spell out the full default behaviours when it is omitted.
That would allow an author to define outputs with media-type awareness as follows:
outputs:
  tokenExpires: $response.header.X-Expires-After
  rateLimit: $response.header.X-Rate-Limit
  sessionToken:
    application/json:
      expression: $response.body#/pointer
      type: [string | Expression Type Object]
    application/xml:
      expression: $response.body#/pointer
      type: [string | Expression Type Object]
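This media-type-dependent part was later deferred, but a sketch helps show what tooling would have to do with such an outputs entry. It assumes the entry is either a plain expression string or a mapping of media type / range to an object with an expression key, as in the YAML above. The "most specific media range wins" rule is an assumption borrowed from HTTP Accept precedence, not something the proposal states; the function names are hypothetical.

```python
from typing import Any, Dict, Union


def _essence(content_type: str) -> str:
    """Drop media-type parameters such as '; charset=utf-8' and lower-case."""
    return content_type.split(";", 1)[0].strip().lower()


def _specificity(media_range: str, media_type: str) -> int:
    """3 = exact match, 2 = 'type/*' match, 1 = '*/*', 0 = no match."""
    media_range = _essence(media_range)
    if media_range == media_type:
        return 3
    if media_range == "*/*":
        return 1
    main, _, sub = media_range.partition("/")
    if sub == "*" and main == media_type.partition("/")[0]:
        return 2
    return 0


def select_output_expression(output: Union[str, Dict[str, Any]], content_type: str) -> str:
    """Pick the runtime expression for one outputs entry given the response Content-Type."""
    if isinstance(output, str):
        return output  # plain {expression}: today's media-type-independent behaviour
    media_type = _essence(content_type)
    best, best_score = None, 0
    for media_range, entry in output.items():
        score = _specificity(media_range, media_type)
        if score > best_score:  # keep the most specific matching range
            best, best_score = entry, score
    if best is None:
        raise LookupError(f"no expression declared for {media_type!r}")
    return best["expression"]


# Hypothetical sessionToken entry with distinct expressions per media type.
session_token = {
    "application/json": {"expression": "$response.body#/token"},
    "application/xml": {"expression": "$response.body#xpath:/Session/Token"},
    "text/*": {"expression": "$response.body"},
}
print(select_output_expression(session_token, "application/json; charset=utf-8"))
```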
Compatibility Notes
To guarantee backwards compatibility, this proposal recommends that for v1.1.0 the Arazzo Specification state that implementations MUST support jsonpointer@rfc6901, jsonpath@rfc9535 and xpath@3.1, and MAY support other versions if they choose.
| Feature | Status |
|---|---|
| # as JSON Pointer | Fully supported (legacy behavior) |
| #<selector>:... | Newly supported |
| @version | Newly supported and optional |
| No selector after # | Treated as JSON Pointer (e.g., $response.body#/id) |
Legacy expressions with JSON Pointer (e.g., $response.body#/id) continue to be valid. Tools MAY infer selector syntax by inspecting the leading character:
- /: JSON Pointer
- $: JSONPath
- /... (in XML context): XPath (optional fallback)
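The leading-character inference above fits in a few lines. This is a hypothetical helper; how a tool decides that it is "in an XML context" is left open by the proposal, so it is passed in here as a flag.

```python
from typing import Optional


def infer_selector_syntax(fragment: str, xml_context: bool = False) -> Optional[str]:
    """Guess the syntax of a fragment after '#' that carries no '<syntax>:' prefix."""
    if fragment.startswith("$"):
        return "jsonpath"
    if fragment.startswith("/"):
        # Optional fallback: in an XML context '/...' reads as an XPath location path.
        return "xpath" if xml_context else "jsonpointer"
    if fragment == "":
        return "jsonpointer"  # the empty JSON Pointer selects the whole document
    return None  # unknown: leave it to the tool to reject


print(infer_selector_syntax("/id"))  # jsonpointer
```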
Parser Guidance
Tooling MUST:
- Parse expressions before # as per the existing ABNF
- Parse selector metadata (syntax, version) from the #...: segment
- Delegate evaluation of selector expressions to an appropriate engine based on:
  - Declared syntax (jsonpath, xpath)
  - Version (if specified)
  - Resolved content-type (if needed, via the new Output Object)
Tooling MAY:
- Reject unsupported selector syntaxes
Criterion Expression Type Object
Ouch, I just noticed for the first time that the Arazzo spec already has it. I wonder if this should be dropped altogether. There is no chance tooling would support it, as it would mean installing multiple versions of each engine. What was the reason to add it?
This would simplify the syntax by removing the @<version> part.
To support media-type–dependent selector behaviour
What is the use-case for that? It adds a lot of extra complexity with no clear reason. I understand that it improves reusability, but the complexity price is not worth it in my opinion.
Hi @RomanHotsiy,
Yes, I'm of the same opinion, but it's already there now, and if we want a backward-compatible change, we'll have to add a clarification using BCP 14 (RFC 2119, RFC 8174) stating that implementations MUST implement jsonpath@rfc9535, xpath@3.1 and jsonpointer@rfc6901, but MAY implement other (arbitrary?/enumerated?) versions. This would get us where we want to be IMHO. We could then possibly drop it in the next major, backward-incompatible release.
More in: https://github.com/OAI/Arazzo-Specification/pull/374#issuecomment-3414525168.
Hey @RomanHotsiy,
To support media-type–dependent selector behaviour
What is the use-case for that? It adds a lot of extra complexity with no clear reason. I understand that it improves reusability, but the complexity price is not worth it in my opinion.
Many APIs support multiple response Content-Types, and clients can use content negotiation (typically via the Accept header) to request a specific response format.
For Arazzo to be compatible with APIs that follow standard HTTP content negotiation, it should provide a mechanism for defining expressions that are valid for a particular Content-Type, based either on the format selected by the client or on the one returned by the server.
Can the server return a random content-type, or am I missing something? If we describe a step with Accept: application/json, the server is supposed to return the data in JSON format, and if it doesn't, there is no point in continuing execution.
Totally fair. If the step sets Accept: application/json, the server is expected to comply, and it’s valid to fail otherwise.
But the point of media-type–dependent expressions is design-time flexibility: letting a single workflow support both application/json and application/xml responses when APIs support content negotiation. That way, the workflow can adapt based on what the client prefers or what the server provides, without duplicating logic or baking in a single format. It's something that was slightly overlooked in the current version.
Yes, so as I mentioned above, I agree that it "improves reusability". But is it really such a common use case/pain point? I would suggest evolving the spec based on real use cases and real user feedback, or the spec risks becoming too complex for any tooling to support all the features.
XML remains common in certain industries, and with AI engaging with APIs across industries, this becomes increasingly important for technologies like Arazzo.
JSON-oriented products might not see XML in their most common use cases, but that doesn't mean that it's not used or that an XML-based API won't show up in a workflow.
@RomanHotsiy I do agree that we should design based on use cases, but the weird partitioning of the JSON and XML spaces makes each somewhat blind to the other's use cases. We've seen some excitement around the improved XML support in OAS v3.2. Certainly not as much as some other features! But there are people who have been asking for better XML support for many years and are happy we finally listened.
There are real XML use cases out there, and in informal conversations with AI folks at a recent gathering, it came up that AI is having to deal with entrenched XML APIs. This is particularly relevant to workflow description, where you might need to talk to APIs with different representation formats.
Regarding real use cases, I'll also point to https://github.com/OAI/OpenAPI-Specification/issues/2146, which requests a selector syntax for multipart responses. I have been pleasantly surprised at the excitement around expanded multipart support in 3.2. Multipart (other than multipart/form-data) turns out to be used a lot, for example to combine a JSON metadata blob with a binary payload, and that usage does not seem to be declining. If we're considering selector options, something simple to indicate which named or numbered part to use (before applying the selector to that part according to its media type) would be worth considering, even if it is not implemented in 1.1. It would be good to allow for future expansion.
A similar and extremely important use case is selectors for streaming payloads, which seem to be ubiquitous in the API payload world. We had someone show up with a multipart/mixed + application/json implementation of streaming JSON, plus the more common application/jsonl, application/json-seq, etc. For an example outside of AI, geospatial systems make heavy use of application/geo+json-seq (combination of application/json-seq and application/geo+json).
That said, I can't quite tell (still on 1st cup of tea) whether this proposal restricts selector syntax to appropriate media types. Tools should not be expected to figure out how to apply XML selectors to JSON or vice versa. Only a selector syntax designed to work with both should be required to be supported for both.
JSON-oriented products might not see XML in their most common use cases, but that doesn't mean that it's not used or that an XML-based API won't show up in a workflow.
I fully agree, and I never said XML is not important. What I'm saying is that XML is already supported by Arazzo without adding any content-negotiation-aware logic. The worst case is that someone would need to duplicate a workflow, but I believe there are very, very few use cases like this, so we would be optimizing the spec to cover 0.5% of use cases while making life harder for 100% of tooling authors. I may be wrong, but I haven't seen any evidence.
Regarding real use cases, I'll also point to https://github.com/OAI/OpenAPI-Specification/issues/2146 requesting a selector syntax for multipart responses. I have been pleasantly surprised at the excitement around expanded multipart support in 3.2.
I think there is a misunderstanding. I am not opposing any new selector syntax. I think it makes sense. What I think we should not do just yet is media-type-aware outputs:
That would allow an author to define outputs with media-type awareness as follows
@RomanHotsiy thanks for taking the time to elaborate! I'll spend a little more time with this proposal before replying further.
@RomanHotsiy we discussed the feedback in this week's Arazzo call, and the originally proposed support for media-type–dependent selector behaviour (e.g., selecting expressions based on negotiated Content-Type) will be deferred. Authors who wish to support different response formats (e.g., JSON vs. XML) should express this using separate workflows, each configured with a specific Accept header and content-type expectation.
I will provide a new version of the proposal and then move onto preparing a PR.
Proposal for Selector Evaluation Extensions within Arazzo
This proposal introduces an extended syntax to support evaluation of structured selectors (e.g., JSONPath, XPath) within valid Arazzo expressions. The goal is to improve expressiveness and enable precise traversal of structured data values such as JSON and XML, while remaining fully backward-compatible with Arazzo 1.0.x.
Selector Boundary Semantics
Arazzo expressions may resolve to structured data values such as JSON or XML. When this occurs, a selector MAY be applied to further navigate the structure. The boundary between the Arazzo expression and the external selector is indicated by the # character.
- The portion before # MUST conform to the Arazzo ABNF expression syntax.
- The portion after # is interpreted as a selector expression, using a known external standard.
Extended Selector Syntax
Arazzo SHALL support the following extended selector syntax format:
<arazzo_expression>#<selector_syntax>[@<version>]:<selector_expression>
| Selector Segment | Description |
|---|---|
| arazzo_expression | Any valid Arazzo runtime expression (e.g., $response.body) |
| selector_syntax | The selector type: jsonpath, xpath, or jsonpointer |
| @<version> (optional) | Version identifier of the selector specification |
| : | Separator between selector metadata and the actual selector expression |
| selector_expression | The structured selector (e.g., $.foo[0], /Invoice/Amount) |
Examples
# JSONPath using RFC 9535 (default)
$response.body#jsonpath:$.items[0].price
# XPath using version 3.1 (default)
$response.body#xpath:/Envelope/Body/Item[1]/Total
# JSONPath using explicit version
$response.body#jsonpath@draft-goessner-dispatch-jsonpath-00:$.legacyField
# JSON Pointer (default)
$response.body#/order/id
Selector Version Support
To ensure portability and consistency, Arazzo defines allowed selector versions as follows:
| Selector Syntax | Default Version | Allowed Versions |
|---|---|---|
| jsonpath | rfc9535 | rfc9535, draft-goessner-dispatch-jsonpath-00 |
| xpath | 3.1 | 3.1, 3.0, 2.0, 1.0 |
| jsonpointer | rfc6901 | rfc6901 |
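Checking a declared version against this table, and applying the default when @<version> is omitted, could be sketched as follows. SELECTOR_VERSIONS and resolve_version are hypothetical names; the data simply mirrors the table.

```python
from typing import Dict, Optional, Tuple

# (default version, allowed versions) per selector syntax, mirroring the table above.
SELECTOR_VERSIONS: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "jsonpath": ("rfc9535", ("rfc9535", "draft-goessner-dispatch-jsonpath-00")),
    "xpath": ("3.1", ("3.1", "3.0", "2.0", "1.0")),
    "jsonpointer": ("rfc6901", ("rfc6901",)),
}


def resolve_version(syntax: str, version: Optional[str] = None) -> str:
    """Return the effective version, applying the default when @<version> is omitted."""
    if syntax not in SELECTOR_VERSIONS:
        raise ValueError(f"unknown selector syntax {syntax!r}")
    default, allowed = SELECTOR_VERSIONS[syntax]
    if version is None:
        return default
    if version not in allowed:
        raise ValueError(f"{syntax} version {version!r} is not one of {', '.join(allowed)}")
    return version


print(resolve_version("xpath"))  # 3.1
```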
Compatibility Notes
To guarantee backwards compatibility, this proposal recommends that for v1.1.0 the Arazzo Specification state that implementations MUST support jsonpointer@rfc6901, jsonpath@rfc9535 and xpath@3.1, and MAY support other versions if they choose.
| Feature | Status |
|---|---|
| # as JSON Pointer | Fully supported (legacy behavior) |
| #<selector>:... | Newly supported |
| @version | Newly supported and optional |
| No selector after # | Treated as JSON Pointer (e.g., $response.body#/id) |
Legacy expressions with JSON Pointer (e.g., $response.body#/id) continue to be valid. Tooling MAY infer selector syntax by inspecting the leading character:
- /: JSON Pointer
- $: JSONPath
- /... (in XML context): XPath (optional fallback)
Tooling Guidance
Implementations of Arazzo 1.1.0 MUST support selectors conforming to:
- JSONPath as per RFC 9535
- XPath 3.1 per W3C XPath 3.1
- JSON Pointer per RFC 6901
Tooling MUST:
- Parse any content before # using the Arazzo ABNF expression grammar
- Parse and validate selector metadata (syntax and version) from #<syntax>[@version]:...
- Route selector expressions to an appropriate evaluation engine based on declared syntax and version
Tooling MAY:
- Support additional (non-normative) versions for internal or compatibility purposes
- Reject unknown selector types or unsupported versions
- Defer evaluation errors (e.g., invalid selector for the actual resolved value) to runtime
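One way a tool could "route selector expressions to an appropriate evaluation engine" is a registry keyed by (syntax, version). The sketch below is hypothetical (SelectorRegistry, json_pointer_engine and the engine signature are illustrative, not from any library). The bundled JSON Pointer engine is deliberately minimal and assumes a well-formed pointer.

```python
from typing import Any, Callable, Dict, Tuple

# An engine takes the resolved value and the selector expression.
Engine = Callable[[Any, str], Any]


class SelectorRegistry:
    """Hypothetical registry that routes (syntax, version) pairs to engines."""

    def __init__(self) -> None:
        self._engines: Dict[Tuple[str, str], Engine] = {}

    def register(self, syntax: str, version: str, engine: Engine) -> None:
        self._engines[(syntax, version)] = engine

    def evaluate(self, syntax: str, version: str, value: Any, selector: str) -> Any:
        engine = self._engines.get((syntax, version))
        if engine is None:
            # Tooling MAY reject unknown selector types or unsupported versions.
            raise NotImplementedError(f"no engine registered for {syntax}@{version}")
        # Errors raised by the engine (e.g. a selector that does not fit the
        # resolved value) surface here, at runtime.
        return engine(value, selector)


def json_pointer_engine(value: Any, pointer: str) -> Any:
    """Minimal RFC 6901 evaluator; assumes a well-formed pointer, no '-' token."""
    for raw in pointer.split("/")[1:]:
        token = raw.replace("~1", "/").replace("~0", "~")
        value = value[int(token)] if isinstance(value, list) else value[token]
    return value


registry = SelectorRegistry()
registry.register("jsonpointer", "rfc6901", json_pointer_engine)
print(registry.evaluate("jsonpointer", "rfc6901", {"order": {"id": 7}}, "/order/id"))  # 7
```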
@frankkilcommins Could you please clarify what would be the benefit of using this new syntax over the previous variant?
Suggested:
$response.body#xpath:/Envelope/Body/Item[1]/Total
Current:
- context: $response.body
  condition: /Envelope/Body/Item[1]/Total
  type: xpath
Do you have any example that can't be described with the current version? At first glance it is harder to read compared to the existing syntax. Also, having two ways to describe the same thing would be confusing.
Hi @DmitryAnansky
Thanks for raising this.
The initial intention behind the proposal was to enhance areas where runtime expressions are used as value selectors/setters (not condition checks), such as:
- outputs
- parameters
- requestBody
I do, however, agree that having similar but different methods across the different flavours is not ideal. I'm leaning towards my own original thought on this, which is that we can have a shared model for both contexts; that would also be easier to support and evolve in the future.
To do this, we could have:
Expression Type Object
This replaces the previously named Criterion Expression Type Object and will be reused for both:
- Criterion Object
- Selector Object
Allowed type/version values remain:
| Type | Allowed Versions | Default |
|---|---|---|
| jsonpath | rfc9535, draft-goessner-dispatch-jsonpath-00 | rfc9535 |
| xpath | 3.1, 3.0, 2.0, 1.0 | 3.1 |
| jsonpointer | rfc6901 (just adding for completeness) | rfc6901 |
Selector Object
Introduce a Selector Object that can be used wherever expressions against structured data are needed that are more complex than what the Arazzo ABNF grammar covers.
expression: string # A valid Arazzo expression (e.g., $response.body)
selector: string # A selector expression (e.g., $.items[0].id, /Envelope/Item)
type: string | Expression Type Object
Examples of usage:
outputs:
  userEmail:
    expression: $response.body
    selector: $.user.profile.email
    type: jsonpath
The example above would be equivalent to the following inline expression (if we took the earlier proposal):
userEmail: $response.body#jsonpath:$.user.profile.email
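The equivalence between a Selector Object and the earlier inline form is mechanical, which a small sketch can show. It assumes the Selector Object has already been loaded from YAML into a dict, that type is either a string or an Expression Type Object with type and optional version, and that default versions are omitted from the inline form (matching both inline examples in this thread). The function name is hypothetical.

```python
from typing import Any, Dict

DEFAULT_VERSIONS = {"jsonpath": "rfc9535", "xpath": "3.1", "jsonpointer": "rfc6901"}


def selector_object_to_inline(selector_object: Dict[str, Any]) -> str:
    """Render a Selector Object (as loaded from YAML) in the inline '#<syntax>[@<version>]:' form."""
    type_field = selector_object["type"]
    if isinstance(type_field, str):  # e.g. type: jsonpath
        syntax, version = type_field, None
    else:  # Expression Type Object, e.g. {type: xpath, version: 3.1}
        syntax, version = type_field["type"], type_field.get("version")
    if version is not None:
        version = str(version)  # YAML reads an unquoted 3.1 as a float
    suffix = "" if version in (None, DEFAULT_VERSIONS[syntax]) else f"@{version}"
    return f"{selector_object['expression']}#{syntax}{suffix}:{selector_object['selector']}"


user_email = {"expression": "$response.body", "selector": "$.user.profile.email", "type": "jsonpath"}
print(selector_object_to_inline(user_email))  # $response.body#jsonpath:$.user.profile.email
```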
RequestBody example:
requestBody:
  contentType: application/json
  payload:
    invoiceId:
      expression: $steps.fetchXml.outputs.invoiceXml
      selector: /Invoice/Header/InvoiceNumber
      type:
        type: xpath
        version: 3.1
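To show what the selector in this example is meant to pick out, here is a sketch using Python's xml.etree.ElementTree. ElementTree implements only a limited XPath subset (no XPath 3.1 functions, and this sketch ignores namespaces), so it illustrates the intended result rather than acting as a conforming XPath engine. The sample invoice document and the select_text name are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def select_text(xml_text: str, path: str) -> Optional[str]:
    """Return the text of the first element matched by a simple absolute path."""
    root = ET.fromstring(xml_text)
    first, _, rest = path.lstrip("/").partition("/")
    if first != root.tag:
        return None  # the path's first step must name the document element
    element = root.find(rest) if rest else root  # ElementTree's limited XPath subset
    return None if element is None else element.text


invoice_xml = "<Invoice><Header><InvoiceNumber>INV-42</InvoiceNumber></Header></Invoice>"
print(select_text(invoice_xml, "/Invoice/Header/InvoiceNumber"))  # INV-42
```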
The example above would be equivalent to the following inline expression (if we took the earlier proposal):
requestBody:
  contentType: application/xml
  payload: |
    <invoice>
      <id>{$steps.fetchXml.outputs.invoiceXml#xpath:/Invoice/Header/InvoiceNumber}</id>
    </invoice>
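The payload above embeds a runtime expression inside a string using curly braces. Assuming that convention, a tool would substitute each {$...} placeholder with its resolved value. The sketch below takes the resolver as a callback (hypothetical; a real tool would evaluate the expression against workflow state). Escaping values for XML is a design choice of this sketch, not something the proposal specifies, so that a value like A&B cannot break the payload.

```python
import re
from typing import Callable
from xml.sax.saxutils import escape

# Matches embedded runtime expressions such as {$steps.a.outputs.b#xpath:/X/Y}.
_EMBEDDED = re.compile(r"\{(\$[^{}]+)\}")


def interpolate(template: str, resolve: Callable[[str], object], xml_escape: bool = True) -> str:
    """Replace every {$...} placeholder with its resolved value."""

    def substitute(match) -> str:
        value = str(resolve(match.group(1)))
        # Assumption of this sketch: escape so values cannot break the XML payload.
        return escape(value) if xml_escape else value

    return _EMBEDDED.sub(substitute, template)


template = (
    "<invoice>\n"
    "  <id>{$steps.fetchXml.outputs.invoiceXml#xpath:/Invoice/Header/InvoiceNumber}</id>\n"
    "</invoice>\n"
)
# Hypothetical resolved value for the embedded expression.
values = {"$steps.fetchXml.outputs.invoiceXml#xpath:/Invoice/Header/InvoiceNumber": "INV-42"}
print(interpolate(template, values.__getitem__))
```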