Ingest limited columns in basic SQL queries
There may not be an existing SQLite parser we can use from Go, but for simple queries we can use a PostgreSQL parser; see here for a good one and an example of use.
The way this would work: attempt to parse the query. If the query parses and consists only of syntax that we support, return every field the query references. Then we pass this list of fields to the SQLiteWriter. If the list is set in the SQLiteWriter, then when we write fields to SQLite we write only the ones in the list.
For a first pass I'd suggest supporting:
- SELECT x FROM {} WHERE y = 1, where this returns ['x', 'y']
Additional ones that won't be too bad:
- SELECT COUNT(x) FROM {} WHERE y = 2 returns ['x', 'y']
- SELECT x FROM {} GROUP BY z returns ['x', 'z']
Harder but reasonable examples:
- SELECT a.x FROM {0} a JOIN {1} b ON a.id = b.json_id returns {'a': ['x', 'id'], 'b': ['json_id']}
Examples this must fail on (this is not a comprehensive list):
- SELECT x, * FROM {} (because of the star operator)
- SELECT x FROM {0} JOIN {1} ON id=json_id (ambiguous where x, id, and json_id come from; also requires supporting different columns for different tables)
This could also be extended to support LIMIT x without an ORDER BY clause to have it ingest only x rows.
Also, this mode must be disabled when -C/--cache is on.
I wouldn't mind having a go at this - I've been doing some initial investigation with the pg_query_go library and I think I can get something together to cover some of the above cases.
Hey @mc-borscht, there's a PR open for this (https://github.com/multiprocessio/dsq/pull/76), but I got stuck because pg_query_go doesn't build on Windows.
If you want, you can pick up that PR and get it working. Although before merging it I wanted to have some benchmarks that show it's actually an improvement.
To deal with pg_query_go not building on Windows, we could either fix pg_query_go's build process or use build constraints in Go so that this feature is compiled out on Windows.