zq JSON output doesn't pass 'parse'
When using zq to create multi-line/multi-record JSON files (-f json or -j), the resulting output won't parse as JSON without error.
This may have been an unintended side effect of collapsing the ndjson and json output types into one?
The simplest way to replicate and see the parse error is to open the resulting JSON file in Firefox, where you'll get: "SyntaxError: JSON.parse: unexpected non-whitespace character after JSON data at line 2 column 1 of the JSON data"
Example output data which doesn't parse:
{"Date":"1994-12-31T00:00:00Z","AWE":630.1,"AAE":32765.2,"S":38000}
{"Date":"1995-06-30T00:00:00Z","AWE":648.2,"AAE":33706.4,"S":40000}
{"Date":"1995-12-31T00:00:00Z","AWE":661.2,"AAE":34382.4,"S":40000}
The browser is expecting, at minimum, array boundaries and separators:
[{"Date":"1994-12-31T00:00:00Z","AWE":630.1,"AAE":32765.2,"S":38000},
{"Date":"1995-06-30T00:00:00Z","AWE":648.2,"AAE":33706.4,"S":40000},
{"Date":"1995-12-31T00:00:00Z","AWE":661.2,"AAE":34382.4,"S":40000}]
I think our logic here was that the json output format would format each value as JSON, which for multi-value results would be line-separated (and thus compatible with NDJSON readers). If you wanted to create a single-value JSON result from a multi-valued query, you could always do that with your Zed query, e.g.,
... | collect(this) | yield collect
If this is too inconvenient, we could add back a command-line flag to do this automatically or we could add a more convenient short-hand for this operator sequence.
On a related note, this pattern of running an aggregate function into a single-field record followed by extracting the value from the record is common enough that we've talked about a shortcut for this, e.g., reduce collect(this).
@nwt any thoughts?
Great thanks!
I can absolutely see that, having read the docs some more, as it's really the only way to do it from the standpoint of serialization.
| collect(this) | yield collect is easy and elegant
So I think it's just a little tension between serialization and simpler standalone zq command line usage (for the casual user, which I am), e.g. you're using zq to transform some CSV 'records' to JSON 'records' and the next tool in your chain is expecting parsable JSON (browser, other command line 'json tools', JSON readers in Python, JS or whatever). That's when a casual user like me is going to use -f json expecting 'valid' JSON output and be surprised to get ndjson (or more precisely, be surprised when the next tool coughs up an 'invalid json' error).
Reading through output from 'zq -h', as a 'newbie', I'd be inclined to put ndjson back as a type under -f, have -f json output valid json for any input, which still leaves you -j (assuming it equals -f ndjson) as the shorthand to use when scripting serialization tasks.. I don't think having ndjson as the default 'throughout the stack' is a problem as long as it's explicit.
But I'm not able to 'see' the bigger picture for your 'ecosystem' because I simply don't know zed/zq well enough, so this may break something fundamental, so I'll have to leave it to you folks to decide.
Personally I'm happy using the collect(this) method, and happy to have this issue closed if you folks deem no action is required.
Since the original community user seems satisfied for now, what I plan to do is add a "tip" to the "But JSON" section of the zq tutorial to explain that the JSON output is actually NDJSON-compatible in a way that lends itself to working with tools like jq but could trip up tools that expect an array of JSON values. It can point out the | collect(this) | yield collect approach for the users that need it. If this comes up a lot, we may also add some kind of shorthand to achieve this as well.
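For tools that expect an array in cases where changing the query isn't an option, the wrapping can also be done downstream. Here is a minimal sketch in Python, assuming the output has one JSON value per line as in the example data in this issue; ndjson_to_array is a hypothetical helper, not something zq provides:

```python
import json

def ndjson_to_array(text):
    """Parse line-delimited JSON and return the values as a list.

    Assumes each non-empty line holds exactly one complete JSON value.
    """
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Inline sample in the same shape as the records in this issue.
sample = (
    '{"Date":"1995-06-30T00:00:00Z","S":40000}\n'
    '{"Date":"1995-12-31T00:00:00Z","S":40000}\n'
)

# Re-serialize as one JSON document (an array) for array-expecting tools.
array_text = json.dumps(ndjson_to_array(sample))
```

This is only a stopgap for the consumer side; doing the collect in the Zed query keeps the whole thing in one place.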
The docs at https://zed.brimdata.io/docs/next/tutorials/zq now include some additional text on this topic.
