Auto-detect all input formats for files
Zed doesn't try to auto-detect CSV, Parquet, JSON, or ZST inputs, but it should for file inputs.
It doesn't because zio/detector.NewReaderWithOpts wraps its io.Reader parameter with Track and Recorder, which don't implement io.ReadSeeker and so aren't compatible with zio/parquetio.NewReader or zio/zstio.NewReader. But if the io.Reader parameter does implement io.ReadSeeker, NewReaderWithOpts can try Parquet, JSON, and ZST first, using Seek to rewind for the next format after each try.
Note to self: For now I've added a comment in the "Custom Brimcap Configuration" article linking to this open issue, so the reader understands the current limitation is temporary. If/when we address this issue, I should update that wiki article to remove the workaround and the comment. The same is true for the "Importing CSV, JSON, Parquet, and ZST (v0.25.0+)" article in the Brim wiki.
I'm pleased to report that JSON auto-detect support was added via #3124.
Another update: CSV auto-detect support was added via #3277.
Verified in Zed commit e569877.
Here's an example of reading each of the supported formats (with the exception of line) using auto-detect.
```
$ zq -version
Version: v1.3.0-42-ge5698777
$ for format in arrows zng vng json zeek zjson csv parquet zson; do echo '{"hello": "world", "pi": 3.14}' | zq -f $format -o sample.$format - ; done
$ for file in *; do echo -n "$file: "; zq 'count()' $file; done
sample.arrows: {count:1(uint64)}
sample.csv: {count:1(uint64)}
sample.json: {count:1(uint64)}
sample.parquet: {count:1(uint64)}
sample.vng: {count:1(uint64)}
sample.zeek: {count:1(uint64)}
sample.zjson: {count:1(uint64)}
sample.zng: {count:1(uint64)}
sample.zson: {count:1(uint64)}
```
The line format still requires an explicit `-i line`, since any text input is also valid line input and there's no way to automatically determine the intent to treat input this way.
```
$ zq -i line sample.zson
"{"
" hello: \"world\","
" pi: 3.14"
"}"
$ zq -i line sample.csv
"hello,pi"
"world,3.14"
```
And, as mentioned in #4270, formats that require seekable input, like Parquet and VNG, aren't readable if compressed, whether via auto-detection or an explicit `-i`.
```
$ gzip *
$ for file in *; do echo -n "$file: "; zq 'count()' $file; done
sample.arrows.gz: {count:1(uint64)}
sample.csv.gz: {count:1(uint64)}
sample.json.gz: {count:1(uint64)}
sample.parquet.gz: sample.parquet.gz: format detection error
arrows: schema message length exceeds 1 MiB
zeek: line 1: bad types/fields definition in zeek header
zjson: line 1: invalid character 'P' looking for beginning of value
zson: ZSON syntax error
zng: malformed zng record
csv: record on line 2: wrong number of fields
json: invalid character 'P' looking for beginning of value
parquet: auto-detection requires seekable input
vng: auto-detection requires seekable input
line: auto-detection not supported
sample.vng.gz: sample.vng.gz: format detection error
arrows: schema message length exceeds 1 MiB
zeek: line 1: bad types/fields definition in zeek header
zjson: line 1: invalid character '\x06' looking for beginning of value
zson: ZSON syntax error
zng: truncated input
csv: line 1: no comma found
json: invalid character '\x06' looking for beginning of value
parquet: auto-detection requires seekable input
vng: auto-detection requires seekable input
line: auto-detection not supported
sample.zeek.gz: {count:1(uint64)}
sample.zjson.gz: {count:1(uint64)}
sample.zng.gz: {count:1(uint64)}
sample.zson.gz: {count:1(uint64)}
$ zq -i parquet sample.parquet.gz
sample.parquet.gz: reader cannot seek
$ zq -i vng sample.vng.gz
sample.vng.gz: VNG must be used with a seekable input
```
Thanks @nwt!