
Proposal: Add support for JSONPath where JSON Pointer is supported

Open char0n opened this issue 7 months ago • 18 comments

Specifically in situations documented at https://spec.openapis.org/arazzo/v1.0.1.html#examples table.


By JSONPath, I specifically refer to RFC 9535. There is no semantic overlap between JSONPath and JSON Pointer, so adding support for JSONPath WILL NOT create ambiguity, as we can clearly distinguish the grammars of the two.

Here is a concrete example where having JSONPath support would allow describing workflows that we cannot currently describe.

Let's say we have a step called search-businesses. This step exposes the matched businesses on $steps.search-businesses.outputs.businesses.

Now I want to have another step called get-businesses-details which accepts a list of business IDs. If we supported JSONPath, we could do the following:

{
  "name": "business_ids",
  "in": "query",
  "value": "$steps.search-businesses.outputs.businesses#$[*].id"
}

With current capabilities it's not possible to chain steps like these, unless we introduce a surrogate transformation step into the workflow.
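To make the intent of `$[*].id` concrete, here is a minimal Python sketch of what a runner might compute for the parameter value. The `select` helper and the sample `businesses` data are hypothetical, and the helper handles only a tiny subset of RFC 9535 (member names, `[*]` and numeric indexes); a real implementation would delegate to a full JSONPath engine.

```python
import re

# One segment of a deliberately tiny JSONPath subset (an assumption of this sketch):
#   .name   member access
#   [*]     wildcard over array elements or object values
#   [N]     non-negative array index
_SEGMENT = re.compile(
    r"\.(?P<name>[A-Za-z_][A-Za-z0-9_]*)|\[(?P<wild>\*)\]|\[(?P<index>\d+)\]"
)


def select(value, path):
    """Return the list of nodes selected by `path` from `value`."""
    if not path.startswith("$"):
        raise ValueError("a JSONPath query must start with '$'")
    nodes = [value]
    pos = 1
    while pos < len(path):
        m = _SEGMENT.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at offset {pos}: {path[pos:]!r}")
        if m.group("name") is not None:
            name = m.group("name")
            nodes = [n[name] for n in nodes if isinstance(n, dict) and name in n]
        elif m.group("wild") is not None:
            expanded = []
            for n in nodes:
                if isinstance(n, list):
                    expanded.extend(n)
                elif isinstance(n, dict):
                    expanded.extend(n.values())
            nodes = expanded
        else:
            i = int(m.group("index"))
            nodes = [n[i] for n in nodes if isinstance(n, list) and i < len(n)]
        pos = m.end()
    return nodes


# Hypothetical output of the search-businesses step.
businesses = [{"id": "b-1", "name": "Cafe"}, {"id": "b-2", "name": "Bakery"}]
business_ids = select(businesses, "$[*].id")
```

With JSON Pointer alone, only a single element such as `/0/id` can be addressed, which is why the list of IDs cannot be produced without an extra transformation step.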


I can champion this proposal and issue a PR to the spec. We have two options for how to proceed.

Backward compatible

We can introduce a new separator for JSONPath:

$steps.search-businesses.outputs.businesses@$[*].id - this clearly says that @ is followed by JSONPath and existing implementations will ignore it as they will not recognize @ after the runtime expression.

Backward incompatible

$steps.search-businesses.outputs.businesses#$[*].id - JSONPath uses the same separator (#) as JSON Pointer, and because no semantic overlap between JSONPath and JSON Pointer exists, implementations can clearly distinguish the intention. It will break implementations that only expect a JSON Pointer after the # delimiter. A major release of the spec might be warranted.
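As a rough illustration of why the two grammars can be told apart after `#`: a JSON Pointer (RFC 6901) is either empty or starts with `/`, while a JSONPath query (RFC 9535) always starts with `$`. The Python sketch below relies only on that; the function name is hypothetical, and it assumes the first `#` in the string is the boundary.

```python
def split_runtime_expression(expr):
    """Split `expr` at the first '#' and classify what follows.

    Returns (base_expression, selector_kind, selector), where selector_kind
    is "jsonpointer", "jsonpath", or None when there is no '#'.
    """
    base, sep, fragment = expr.partition("#")
    if not sep:
        return base, None, None
    # RFC 6901: a JSON Pointer is the empty string or begins with "/".
    if fragment == "" or fragment.startswith("/"):
        return base, "jsonpointer", fragment
    # RFC 9535: every JSONPath query begins with the root identifier "$".
    if fragment.startswith("$"):
        return base, "jsonpath", fragment
    raise ValueError(f"cannot classify selector: {fragment!r}")
```

An implementation that only knows JSON Pointer would take the `ValueError`-style path (or fail in its own way) for `#$[*].id`, which is the breakage described above.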

char0n avatar Sep 05 '25 13:09 char0n

I will have to defer to @frankkilcommins' memory on why we went one way and not the other. I can say that my concern would be tooling. If we add support for JSONPath, that means every tool out there will need to support BOTH, or they could run into situations where a document uses one that they don't support. Now you end up with a situation where an Arazzo description may not work with some tooling that doesn't support JSONPath. That would be unfortunate.

kevinduffey avatar Sep 05 '25 17:09 kevinduffey

Hi @kevinduffey,

Thanks for your reaction.

that means every tool out there will need to support BOTH, or they could run into situations where a document uses one that they don't support

Yes, that's the case. Arazzo already requires implementations to evaluate JSONPath expressions during Criterion Object evaluation. It also requires them to evaluate JSON Pointer when plucking parts of step outputs (among others). So we're already in that situation - every compliant implementation needs to support both.

In JSONPath we have Normalized Paths, which fully replace JSON Pointer capabilities, as a Normalized Path identifies exactly one node. On top of that we have all the capabilities of non-singular queries inside JSONPath.
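A minimal Python sketch of that correspondence, assuming the target document is available: each JSON Pointer reference token becomes one bracketed segment of an RFC 9535 Normalized Path. The function name is hypothetical. The document is needed because a token such as `0` can be either an array index or an object key, and the sketch escapes only `\` and `'` (the RFC also requires escaping control characters).

```python
def pointer_to_normalized_path(document, pointer):
    """Translate an RFC 6901 JSON Pointer into an RFC 9535 Normalized Path."""
    if pointer == "":
        return "$"  # the empty pointer addresses the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    path = "$"
    current = document
    for raw in pointer[1:].split("/"):
        # RFC 6901 unescaping: "~1" first, then "~0".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not (token.isascii() and token.isdigit()):
                raise ValueError(f"invalid array index: {token!r}")
            index = int(token)
            current = current[index]
            path += f"[{index}]"
        elif isinstance(current, dict):
            current = current[token]
            escaped = token.replace("\\", "\\\\").replace("'", "\\'")
            path += f"['{escaped}']"
        else:
            raise KeyError(f"cannot descend into a scalar with {token!r}")
    return path
```

For example, `/a/b/1` applied to `{"a": {"b": [10, 20]}}` corresponds to `$['a']['b'][1]`.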

Replacing JSON Pointer support with JSONPath might be the easiest thing to do in the next major release, as it extends the capabilities significantly while still retaining all the prior JSON Pointer capabilities.

char0n avatar Sep 05 '25 19:09 char0n

Fair points. I did forget we have some JSONPath stuff in there. I honestly forget why we have both now, so again I refer to the mighty @frankkilcommins as he is the foremost expert in all things Arazzo. :).

kevinduffey avatar Sep 05 '25 19:09 kevinduffey

NOTE: The proposal below has been superseded. Please immediately refer to the latest proposal, and come back here if interested in the media-type-dependent support which for now has been deferred.


Having discussed this with @char0n in Slack, this proposal is a starting point for how we could include additional structured selectors (e.g. JSONPath and XPath) within Arazzo's Runtime Expressions. The goal would be to include support as part of the v1.1.0 release and to honour backwards compatibility.

Proposal for Selector Evaluation Extensions within Arazzo

This proposal aims to introduce an extended syntax and optional grouping model to support evaluation of structured selectors (e.g., JSONPath, XPath) following a valid Arazzo expression. It maintains backward compatibility with existing expression rules and clarifies how selector semantics are applied using known standards. The ambition is to also give authors a means to specify expressions which cater for structured data values which are influenced by content negotiation (e.g. could either be JSON or XML).

Selector Boundary Semantics

Arazzo expressions may resolve to structured data values (such as JSON or XML documents). When this occurs, a selector MAY be applied to navigate within the resolved structure. The boundary between Arazzo expression and selector expression is indicated by the # character. All content preceding # MUST conform to the Arazzo ABNF grammar. The content after # is interpreted as a selector expression using an external selector specification.

Extended Selector Syntax

Arazzo SHALL support the following extended selector syntax format: <arazzo_expression>#<selector_syntax>[@<version>]:<selector_expression>

Where:

  • arazzo_expression is a valid Arazzo runtime expression (e.g., $steps.findPets.outputs.pets)
  • selector_syntax is one of the supported selector types:
    • jsonpath
    • xpath
    • jsonpointer (included for completeness)
  • @<version> is an optional version identifier
  • : separates selector metadata from the actual selector expression

If @<version> is omitted, the following defaults SHALL apply:

| Selector Syntax | Default Version |
| --- | --- |
| jsonpath | rfc9535 |
| xpath | 3.1 |
| jsonpointer | rfc6901 |

Multiple Selector Expressions

To support media-type–dependent selector behaviour (e.g., content negotiation or mixed-structure responses), Arazzo SHALL allow expressions to be set per response payload media type or media-type range.

Example of current support:

outputs:
    tokenExpires: $response.header.X-Expires-After
    rateLimit: $response.header.X-Rate-Limit
    sessionToken: $response.body#/pointer

By extending how the Output Object is defined, we can add media-type awareness in a backwards compatible manner. The outputs field type would change from Map[string, {expression}] to Map[string, {expression} | Output Object], where Output Object is basically defined as a mapping of response media type / range to runtime expression.

It would be recommended to rename Criterion Expression Type Object to Expression Type Object, and to clarify the full default behaviours that apply when it is omitted.

That would allow an author to define outputs with media-type awareness as follows:

outputs:
    tokenExpires: $response.header.X-Expires-After
    rateLimit: $response.header.X-Rate-Limit
    sessionToken: 
       application/json: 
           expression: $response.body#/pointer
           type: [string | Expression Type Object]
       application/xml: 
           expression: $response.body#/pointer
           type: [string | Expression Type Object]

Compatibility Notes

This proposal recommends that for v1.1.0 the Arazzo Specification should state that implementations MUST support jsonpointer@rfc6901, jsonpath@rfc9535 and xpath@3.1, and MAY support other versions if they choose to guarantee backwards compatibility.

| Feature | Status |
| --- | --- |
| # as JSON Pointer | Fully supported (legacy behavior) |
| #<selector>:... | Newly supported |
| @version | Newly supported and optional |
| No selector after # | Treated as JSON Pointer (e.g., $response.body#/id) |

Legacy expressions with JSON Pointer (e.g., $response.body#/id) continue to be valid. Tools MAY infer selector syntax by inspecting the leading character:

  • /: JSON Pointer
  • $: JSONPath
  • /... (in XML context) : XPath (optional fallback)

Parser Guidance

Tooling MUST:

  • Parse expressions before # as per existing ABNF
  • Parse selector metadata (syntax, version) from the #...: segment
  • Delegate evaluation of selector expressions to an appropriate engine based on:
    • Declared syntax (jsonpath, xpath)
    • Version (if specified)
    • Resolved content-type (if needed via the new Output Object)

Tooling MAY:

  • Reject unsupported selector syntaxes

frankkilcommins avatar Oct 16 '25 16:10 frankkilcommins

Criterion Expression Type Object

Ouch, I just noticed for the first time that the Arazzo spec has it. I wonder if this should be dropped altogether. There is no chance tooling would support it, as it means multiple versions of engines would have to be installed. What was the reason to add it?

Dropping it would simplify the syntax by removing the @<version> part.

To support media-type–dependent selector behaviour

What is the use case for that? It adds a lot of extra complexity with no clear reason. I understand that it improves reusability, but the complexity price is not worth it in my opinion.

RomanHotsiy avatar Oct 22 '25 12:10 RomanHotsiy

Hi @RomanHotsiy,

Yes, I'm of the same opinion, but it's already there now, and if we want a backward compatible change, we'll have to add a clarification using [BCP 14][RFC2119][RFC8174] stating that implementations MUST implement jsonpath@rfc9535, xpath@3.1 and jsonpointer@rfc6901, but MAY implement other (arbitrary?/enumerated?) versions. This would get us where we want to be IMHO. And then we can possibly drop it in the next major, backward incompatible release.

More in: https://github.com/OAI/Arazzo-Specification/pull/374#issuecomment-3414525168.

char0n avatar Oct 22 '25 13:10 char0n

Hey @RomanHotsiy,

To support media-type–dependent selector behaviour

What is the use case for that? It adds a lot of extra complexity with no clear reason. I understand that it improves reusability, but the complexity price is not worth it in my opinion.

Many APIs support multiple response Content-Types, and clients can use content negotiation (typically via the Accept header) to request a specific response format.

For Arazzo to be compatible with APIs that follow standard HTTP content negotiation, it should provide a mechanism to express expressions that are valid for a particular Content-Type either based on the format selected by the client or returned by the server.

frankkilcommins avatar Oct 22 '25 14:10 frankkilcommins

For Arazzo to be compatible with APIs that follow standard HTTP content negotiation, it should provide a mechanism to express expressions that are valid for a particular Content-Type either based on the format selected by the client or returned by the server.

Can the server return a random content type, or am I missing something? If we describe a step with Accept: application/json, the server is supposed to return the data in JSON format, and if it doesn't, there is no point in continuing execution.

RomanHotsiy avatar Oct 22 '25 14:10 RomanHotsiy

Can the server return a random content type, or am I missing something? If we describe a step with Accept: application/json, the server is supposed to return the data in JSON format, and if it doesn't, there is no point in continuing execution.

Totally fair. If the step sets Accept: application/json, the server is expected to comply, and it’s valid to fail otherwise.

But the point of media-type–dependent expressions is design-time flexibility: letting a single workflow support both application/json and application/xml responses when APIs support content negotiation. That way, the workflow can adapt based on what the client prefers or what the server provides, without duplicating logic or baking in a single format. It's something that was slightly overlooked in the current version.

frankkilcommins avatar Oct 22 '25 15:10 frankkilcommins

But the point of media-type–dependent expressions is design-time flexibility: letting a single workflow support both application/json and application/xml responses when APIs support content negotiation. That way, the workflow can adapt based on what the client prefers or what the server provides, without duplicating logic or baking in a single format. It's something that was slightly overlooked in the current version.

Yes, so as I mentioned above, I agree that it "improves reusability". But is it really such a common use case/pain point? I would suggest evolving the spec based on real use cases and real user feedback, or the spec risks becoming too complex for any tooling to support all the features.

RomanHotsiy avatar Oct 22 '25 15:10 RomanHotsiy

XML remains common in certain industries, and with AI engaging with APIs across industries, this becomes increasingly important for technologies like Arazzo.

JSON-oriented products might not see XML in their most common use cases, but that doesn't mean that it's not used or that an XML-based API won't show up in a workflow.

handrews avatar Oct 22 '25 17:10 handrews

@RomanHotsiy I do agree that we should design based on use cases, but the weird partitioning of the JSON and XML spaces makes each somewhat blind to the other's use cases. We've seen some excitement around the improved XML support in OAS v3.2. Certainly not as much as some other features! But there are people who have been asking for better XML support for many years and are happy we finally listened.

There are real XML use cases out there, and in informal conversations with AI folks at a recent gathering, it came up that AI is having to deal with XML APIs that are entrenched, and this is particularly relevant to workflow description where you might need to talk to APIs with different representation formats.


Regarding real use cases, I'll also point to https://github.com/OAI/OpenAPI-Specification/issues/2146 requesting a selector syntax for multipart responses. I have been pleasantly surprised at the excitement around expanded multipart support in 3.2. multipart (other than multipart/form-data) turns out to be used a lot, for example to combine a JSON metadata blob with a binary payload, and that usage does not seem to be declining. If we're considering selector options, something simple to indicate which named or numbered part (prior to applying the selector to the part based on the part's media type) would be worth considering. Even if it is not implemented in 1.1, it would be good to allow for future expansion.

A similar and extremely important use case is selectors for streaming payloads, which seem to be ubiquitous in the API payload world. We had someone show up with a multipart/mixed + application/json implementation of streaming JSON, plus the more common application/jsonl, application/json-seq, etc. For an example outside of AI, geospatial systems make heavy use of application/geo+json-seq (combination of application/json-seq and application/geo+json).


That said, I can't quite tell (still on 1st cup of tea) whether this proposal restricts selector syntax to appropriate media types. Tools should not be expected to figure out how to apply XML selectors to JSON or vice versa. Only a selector syntax designed to work with both should be required to be supported for both.

handrews avatar Oct 22 '25 17:10 handrews

JSON-oriented products might not see XML in their most common use cases, but that doesn't mean that it's not used or that an XML-based API won't show up in a workflow.

I fully agree, and I never said XML is not important. What I'm saying is that XML is already supported by Arazzo without adding any content-aware negotiation logic. The worst case is that someone would need to duplicate a workflow, but I believe there are very, very few use cases like this, so we would be optimizing the spec to cover 0.5% of use cases while making life harder for 100% of tooling authors. I may be wrong, but I haven't seen any evidence.

Regarding real use cases, I'll also point to https://github.com/OAI/OpenAPI-Specification/issues/2146 requesting a selector syntax for multipart responses. I have been pleasantly surprised at the excitement around expanded multipart support in 3.2.

I think there is a misunderstanding. I am not opposing any new selector syntax. I think it makes sense. What I think we should not do just yet is media-type-aware outputs:

That would allow an author to define outputs with media-type awareness as follows

RomanHotsiy avatar Oct 23 '25 00:10 RomanHotsiy

@RomanHotsiy thanks for taking the time to elaborate! I'll spend a little more time with this proposal before replying further.

handrews avatar Oct 23 '25 04:10 handrews

@RomanHotsiy we discussed the feedback in this week's Arazzo call, and the originally proposed support for media-type–dependent selector behaviour (e.g., selecting expressions based on negotiated Content-Type) will be deferred. Authors who wish to support different response formats (e.g., JSON vs. XML) should express this using separate workflows, each configured with a specific Accept header and content-type expectation.

I will provide a new version of the proposal and then move onto preparing a PR.

frankkilcommins avatar Nov 14 '25 14:11 frankkilcommins

Proposal for Selector Evaluation Extensions within Arazzo

This proposal introduces an extended syntax to support evaluation of structured selectors (e.g., JSONPath, XPath) within valid Arazzo expressions. The goal is to improve expressiveness and enable precise traversal of structured data values such as JSON and XML, while remaining fully backward-compatible with Arazzo 1.0.x.

Selector Boundary Semantics

Arazzo expressions may resolve to structured data values such as JSON or XML. When this occurs, a selector MAY be applied to further navigate the structure. The boundary between the Arazzo expression and the external selector is indicated by the # character.

  • The portion before # MUST conform to the Arazzo ABNF expression syntax.
  • The portion after # is interpreted as a selector expression, using a known external standard.

Extended Selector Syntax

Arazzo SHALL support the following extended selector syntax format: <arazzo_expression>#<selector_syntax>[@<version>]:<selector_expression>

| Selector Segment | Description |
| --- | --- |
| arazzo_expression | Any valid Arazzo runtime expression (e.g., $response.body) |
| selector_syntax | The selector type: jsonpath, xpath, or jsonpointer |
| @<version> | (optional) Version identifier of the selector specification |
| : | Separator between selector metadata and the actual selector expression |
| selector_expression | The structured selector (e.g., $.foo[0], /Invoice/Amount) |

Examples

# JSONPath using RFC 9535 (default)
$response.body#jsonpath:$.items[0].price

# XPath using version 3.1 (default)
$response.body#xpath:/Envelope/Body/Item[1]/Total

# JSONPath using explicit version
$response.body#jsonpath@draft-goessner-dispatch-jsonpath-00:$.legacyField

# JSON Pointer (default)
$response.body#/order/id

Selector Version Support

To ensure portability and consistency, Arazzo defines allowed selector versions as follows:

| Selector Syntax | Default Version | Allowed Versions |
| --- | --- | --- |
| jsonpath | rfc9535 | rfc9535, draft-goessner-dispatch-jsonpath-00 |
| xpath | 3.1 | 3.1, 3.0, 2.0, 1.0 |
| jsonpointer | rfc6901 | rfc6901 |
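The syntax and version rules above can be sketched as a small parser. This is a hedged Python illustration, not spec text: `parse_selector` and `ALLOWED_VERSIONS` are hypothetical names, the first entry of each version tuple is treated as the default, and anything after `#` that lacks a `<selector_syntax>:` prefix is treated as a legacy JSON Pointer.

```python
import re

# Allowed versions per selector syntax; the first entry is the default.
ALLOWED_VERSIONS = {
    "jsonpath": ("rfc9535", "draft-goessner-dispatch-jsonpath-00"),
    "xpath": ("3.1", "3.0", "2.0", "1.0"),
    "jsonpointer": ("rfc6901",),
}

# <selector_syntax>[@<version>]:<selector_expression>
_SELECTOR = re.compile(
    r"(?P<syntax>jsonpath|xpath|jsonpointer)(?:@(?P<version>[^:]+))?:(?P<expr>.*)",
    re.DOTALL,
)


def parse_selector(expression):
    """Split an extended runtime expression into its parts."""
    base, sep, fragment = expression.partition("#")
    if not sep:
        return {"expression": base, "syntax": None, "version": None, "selector": None}
    m = _SELECTOR.fullmatch(fragment)
    if m is None:
        # No selector prefix after '#': legacy JSON Pointer behaviour.
        return {"expression": base, "syntax": "jsonpointer",
                "version": "rfc6901", "selector": fragment}
    syntax = m["syntax"]
    version = m["version"] or ALLOWED_VERSIONS[syntax][0]
    if version not in ALLOWED_VERSIONS[syntax]:
        raise ValueError(f"unsupported {syntax} version: {version!r}")
    return {"expression": base, "syntax": syntax,
            "version": version, "selector": m["expr"]}
```

Because the version may not contain `:`, the first `:` after the syntax name always ends the metadata, so XPath expressions that use `::` axes still parse.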

Compatibility Notes

This proposal recommends that for v1.1.0 the Arazzo Specification should state that implementations MUST support jsonpointer@rfc6901, jsonpath@rfc9535 and xpath@3.1, and MAY support other versions if they choose to guarantee backwards compatibility.

| Feature | Status |
| --- | --- |
| # as JSON Pointer | Fully supported (legacy behavior) |
| #<selector>:... | Newly supported |
| @version | Newly supported and optional |
| No selector after # | Treated as JSON Pointer (e.g., $response.body#/id) |

Legacy expressions with JSON Pointer (e.g., $response.body#/id) continue to be valid. Tooling MAY infer selector syntax by inspecting the leading character:

  • /: JSON Pointer
  • $: JSONPath
  • /... (in XML context) : XPath (optional fallback)

Tooling Guidance

Implementations of Arazzo 1.1.0 MUST support selectors conforming to:

  • JSONPath as per RFC 9535
  • XPath 3.1 per W3C XPath 3.1
  • JSON Pointer per RFC 6901

Tooling MUST:

  • Parse any content before # using the Arazzo ABNF expression grammar
  • Parse and validate selector metadata (syntax and version) from #<syntax>[@version]:...
  • Route selector expressions to an appropriate evaluation engine based on declared syntax and version

Tooling MAY:

  • Support additional (non-normative) versions for internal or compatibility purposes
  • Reject unknown selector types or unsupported versions
  • Defer evaluation errors (e.g., invalid selector for the actual resolved value) to runtime
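One way the routing requirement could look in practice is a registry keyed by (syntax, version). The Python sketch below is an assumption-laden illustration: `ENGINES`, `evaluate` and `UnsupportedSelector` are hypothetical names, and only a small JSON Pointer engine is registered; JSONPath and XPath engines would be plugged in from whatever libraries a tool chooses.

```python
class UnsupportedSelector(Exception):
    """Raised when no engine is registered for a syntax/version pair."""


def eval_json_pointer(document, pointer):
    """Minimal RFC 6901 evaluation over dicts and lists."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for raw in pointer[1:].split("/"):
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Registry of evaluation engines; a real tool would also register
# ("jsonpath", "rfc9535") and ("xpath", "3.1") here.
ENGINES = {
    ("jsonpointer", "rfc6901"): eval_json_pointer,
}


def evaluate(value, syntax, version, selector):
    """Route `selector` to the engine declared by `syntax` and `version`."""
    engine = ENGINES.get((syntax, version))
    if engine is None:
        # Tooling MAY reject unknown selector types or unsupported versions.
        raise UnsupportedSelector(f"{syntax}@{version}")
    return engine(value, selector)
```

Errors raised by an engine (for example a missing key) surface at call time, which matches the option to defer evaluation errors to runtime.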

frankkilcommins avatar Nov 14 '25 14:11 frankkilcommins

@frankkilcommins Could you please clarify what would be the benefit of using this new syntax over the previous variant?

Suggested:

$response.body#xpath:/Envelope/Body/Item[1]/Total

Current:

- context: $response.body
  condition: /Envelope/Body/Item[1]/Total
  type: xpath

Do you have any example that can't be described with the current version? At first glance it is harder to read compared to the existing syntax. Also, having two ways to describe the same thing would be confusing.

DmitryAnansky avatar Nov 17 '25 13:11 DmitryAnansky

Hi @DmitryAnansky

Thanks for raising this.

The initial intention behind the proposal was to enhance areas where runtime expressions act as value selectors/setters (not condition checks), such as in:

  • outputs
  • parameters
  • requestBody

I do however agree that having similar but different methods across the different flavours is not ideal. And I'm probably leaning towards my own original thoughts on this which is that we can have a similar shared model for both contexts and that would also be easier to support and evolve into the future.

To do this, we could have:

Expression Type Object

This replaces the previously named Criterion Expression Type Object and will be reused for both:

  • Criterion Object
  • Selector Object

Allowed type/version values remain:

| Type | Allowed Versions | Default |
| --- | --- | --- |
| jsonpath | rfc9535, draft-goessner-dispatch-jsonpath-00 | rfc9535 |
| xpath | 3.1, 3.0, 2.0, 1.0 | 3.1 |
| jsonpointer | rfc6901 (just adding for completeness) | rfc6901 |

Selector Object

Introduce a Selector Object that can be used wherever structured data requires expressions more complex than what the Arazzo ABNF grammar covers.

expression: string   # A valid Arazzo expression (e.g., $response.body)
selector: string     # A selector expression (e.g., $.items[0].id, /Envelope/Item)
type: string | Expression Type Object

Examples of usage:

outputs:
  userEmail:
    expression: $response.body
    selector: $.user.profile.email
    type: jsonpath

The example above would be equivalent to the following inline expression (if we took the earlier proposal): userEmail: $response.body#jsonpath:$.user.profile.email
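The mapping between the two forms is mechanical, as the following Python sketch suggests. The `to_inline` helper is hypothetical and assumes the Selector Object has already been loaded into a dict whose `type` is either a string or an Expression Type Object with `type` and an optional `version`.

```python
def to_inline(selector_object):
    """Render a Selector Object as the earlier inline '#syntax[@version]:' form."""
    type_field = selector_object["type"]
    if isinstance(type_field, str):
        syntax, version = type_field, None
    else:
        # Expression Type Object: {"type": ..., "version": ...}
        syntax, version = type_field["type"], type_field.get("version")
    suffix = f"@{version}" if version is not None else ""
    return f'{selector_object["expression"]}#{syntax}{suffix}:{selector_object["selector"]}'
```

Going the other way requires parsing the inline string, which is one argument for the object form: the pieces are already separated and need no extra grammar.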

RequestBody example:

requestBody:
  contentType: application/json
  payload:
    invoiceId:
      expression: $steps.fetchXml.outputs.invoiceXml
      selector: /Invoice/Header/InvoiceNumber
      type:
        type: xpath
        version: 3.1

The example above would be equivalent to the following inline expression (if we took the earlier proposal):

requestBody:
  contentType: application/xml
  payload: |
    <invoice>
      <id>{$steps.fetchXml.outputs.invoiceXml#xpath:/Invoice/Header/InvoiceNumber}</id>
    </invoice>

frankkilcommins avatar Nov 17 '25 14:11 frankkilcommins