json-stream

RFC7464 - Multiple JSON docs with record separators

Open daggaz opened this issue 5 months ago • 2 comments

There is an RFC, RFC 7464 (https://datatracker.ietf.org/doc/html/rfc7464), for sequences of JSON docs delimited by a record separator character (0x1E). Do we care to support this format with load_many()/visit_many()?
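For anyone unfamiliar with the format, here is a minimal sketch of what an RFC 7464 sequence looks like: each JSON text is preceded by RS (0x1E) and followed by LF (0x0A). This uses only the standard library `json` module on a fully buffered string, not json-stream, and the helper name `iter_json_text_sequence` is made up for illustration.

```python
import json

RS = "\x1e"  # ASCII record separator, written before each JSON text in RFC 7464

def iter_json_text_sequence(text):
    """Yield each JSON value from an in-memory RFC 7464 style string.

    Hypothetical helper for illustration only: it is not streaming and
    it is not part of json-stream.
    """
    for chunk in text.split(RS):
        # Drop the trailing LF and any other JSON whitespace around the text.
        chunk = chunk.strip(" \t\r\n")
        if not chunk:
            continue  # nothing before the first RS, or an empty record
        yield json.loads(chunk)

sample = '\x1e{"a": 1}\n\x1e[1, 2]\n\x1e"three"\n'
docs = list(iter_json_text_sequence(sample))
```

The RFC also asks parsers to recover from truncated or malformed records, which this sketch does not attempt: `json.loads` simply raises on the first bad record.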

daggaz avatar Nov 02 '25 18:11 daggaz

@smheidrich this would require support in the Rust tokenizer too. I'm honestly not sure that this RFC is actually used by anyone...

daggaz avatar Nov 02 '25 18:11 daggaz

@daggaz Since 0x0A ("\x0a") is just "\n", which is already always discarded when found as a "top-level" token, would this involve changing the tokenizers so they discard 0x1E ("\x1e") as well? Or would that constitute "overly-tolerant" parsing, similar to some of the issues from #55?

AFAICT the JSON spec does not consider 0x1E insignificant whitespace like it does 0x0A. So for full compliance with the JSON spec, I guess tokenizers would have to bubble this up to json-stream (probably by yielding it as a new operator, i.e. (TokenType.OPERATOR, "\x1e")) and let it decide what to do with it, wouldn't they? 🤔

Or should they get a constructor parameter that tells them whether the document being parsed should be considered an RFC 7464 document or not, so they can either raise an exception or yield it as an operator depending on that?
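To make the two options above concrete, here is a rough sketch of a toy top-level scanner that either yields 0x1E as an operator token or raises, depending on a flag. Everything in it is an assumption for illustration: the `TokenType` enum is a stand-in defined locally, `allow_record_separator` is a hypothetical parameter name, and the scanner treats every other non-whitespace character as a one-character token, which is nothing like the real tokenizers.

```python
from enum import Enum, auto

class TokenType(Enum):
    # Local stand-in enum for this sketch, not json-stream's own TokenType.
    OPERATOR = auto()
    OTHER = auto()

RS = "\x1e"
JSON_WHITESPACE = " \t\r\n"  # the only insignificant whitespace in JSON

def top_level_tokens(text, allow_record_separator=False):
    """Toy scanner: skip JSON whitespace, surface or reject 0x1E.

    `allow_record_separator` is a hypothetical flag marking the input
    as an RFC 7464 document.
    """
    for ch in text:
        if ch in JSON_WHITESPACE:
            continue  # discarded, as "\n" already is at the top level
        if ch == RS:
            if not allow_record_separator:
                raise ValueError("unexpected 0x1E in a plain JSON document")
            # Bubble the separator up so the caller decides what it means.
            yield (TokenType.OPERATOR, RS)
        else:
            yield (TokenType.OTHER, ch)

tokens = list(top_level_tokens("\x1e1\n\x1e2\n", allow_record_separator=True))
```

With the flag off, the same input raises `ValueError` at the first 0x1E, so strict JSON parsing stays strict by default.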

smheidrich avatar Nov 02 '25 20:11 smheidrich