
Ingest limited columns in basic SQL queries

Open eatonphil opened this issue 3 years ago • 4 comments

There may not be an existing SQLite parser we can use from Go, but for simple queries we can use a PostgreSQL parser; see here for a good one and an example of use.

The way this would work is that it would attempt to parse the query. If the query parses and consists only of syntax that we support, return all fields referenced in the query. We then pass this list of fields to the SQLiteWriter. If the list is set in the SQLiteWriter, then when we write fields to SQLite we write only the ones in the list.
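To make the write-side half of this concrete, here is a minimal sketch of filtering a row against an allowed field list. The function name `filterFields` and the `map[string]any` row shape are assumptions for illustration only; they are not taken from dsq's actual SQLiteWriter.

```go
package main

import "fmt"

// filterFields keeps only the keys of row that appear in allowed.
// A nil allowed list means "no list was set", so the row is returned
// unchanged and every field is written as before.
// (Hypothetical helper; not dsq's real SQLiteWriter API.)
func filterFields(row map[string]any, allowed []string) map[string]any {
	if allowed == nil {
		return row
	}
	out := make(map[string]any, len(allowed))
	for _, name := range allowed {
		if v, ok := row[name]; ok {
			out[name] = v
		}
	}
	return out
}

func main() {
	row := map[string]any{"x": 1, "y": 2, "z": 3}
	// Only x and y were referenced by the query, so z is dropped.
	fmt.Println(filterFields(row, []string{"x", "y"})) // map[x:1 y:2]
}
```

Treating a nil list as "write everything" keeps the existing behavior as the fallback whenever the query cannot be parsed.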

For a first pass I'd suggest supporting:

  • SELECT x FROM {} WHERE y = 1 where this returns ['x', 'y']

Additional ones that won't be too bad:

  • SELECT COUNT(x) FROM {} WHERE y = 2 returns ['x', 'y']
  • SELECT x FROM {} GROUP BY z returns ['x', 'z']

Harder but reasonable examples:

  • SELECT a.x FROM {0} a JOIN {1} b ON a.id = b.json_id returns {'a': ['x', 'id'], 'b': ['json_id']}

Examples this must fail on (this is not a comprehensive list):

  • SELECT x, * FROM {} (because of the star operator)
  • SELECT x FROM {0} JOIN {1} ON id=json_id (ambiguous where x, id, and json_id come from; also requires supporting different columns for different tables)
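As a rough illustration of the "accept only what we understand, fail on everything else" behavior in the lists above, here is a sketch covering just the first-pass shape. It deliberately uses a regular expression as a stand-in rather than the PostgreSQL parser the issue proposes, so it is not how a real implementation would work; the name `columnsForSimpleQuery` and the restriction to an integer comparison are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// simpleSelect matches only: SELECT <col> FROM {} [WHERE <col> = <integer>]
// Anything else (stars, joins, multiple columns) is intentionally rejected.
var simpleSelect = regexp.MustCompile(
	`(?i)^\s*SELECT\s+([a-z_][a-z0-9_]*)\s+FROM\s+\{\}(?:\s+WHERE\s+([a-z_][a-z0-9_]*)\s*=\s*\d+)?\s*;?\s*$`)

// columnsForSimpleQuery returns the referenced columns and true when the
// query fits the supported shape, or nil and false when it does not.
// (Hypothetical helper; a regex stand-in for a real SQL parser.)
func columnsForSimpleQuery(query string) ([]string, bool) {
	m := simpleSelect.FindStringSubmatch(query)
	if m == nil {
		return nil, false
	}
	cols := []string{m[1]}
	// Add the WHERE column if present and not already listed.
	if m[2] != "" && m[2] != m[1] {
		cols = append(cols, m[2])
	}
	return cols, true
}

func main() {
	for _, q := range []string{
		"SELECT x FROM {} WHERE y = 1",
		"SELECT x, * FROM {}",
		"SELECT x FROM {0} JOIN {1} ON id=json_id",
	} {
		cols, ok := columnsForSimpleQuery(q)
		fmt.Println(q, "=>", cols, ok)
	}
}
```

The important property is the failure mode: when the query is not recognized, the caller gets `false` and should fall back to ingesting every column.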

eatonphil avatar Jun 20 '22 16:06 eatonphil

This could also be extended to support LIMIT x without an ORDER BY clause to have it ingest only x rows.
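A sketch of what row-limited ingestion could look like, assuming rows arrive through some iterator-style source. The `next`/`write` callbacks and the name `ingestRows` are hypothetical, not dsq's real interfaces, and the sketch only covers the case named above: a LIMIT with no ORDER BY.

```go
package main

import "fmt"

// ingestRows pulls rows from next and hands them to write, stopping once
// limit rows have been written. A negative limit means "no limit".
// It returns the number of rows written.
// (Hypothetical helper; callbacks stand in for a real reader and writer.)
func ingestRows(next func() (map[string]any, bool), write func(map[string]any), limit int) int {
	n := 0
	for limit < 0 || n < limit {
		row, ok := next()
		if !ok {
			break // source exhausted before reaching the limit
		}
		write(row)
		n++
	}
	return n
}

func main() {
	data := []map[string]any{{"x": 1}, {"x": 2}, {"x": 3}}
	i := 0
	next := func() (map[string]any, bool) {
		if i >= len(data) {
			return nil, false
		}
		row := data[i]
		i++
		return row, true
	}
	// Equivalent of LIMIT 2 with no ORDER BY: the third row is never read.
	written := ingestRows(next, func(row map[string]any) { fmt.Println(row) }, 2)
	fmt.Println("rows ingested:", written)
}
```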

eatonphil avatar Jun 20 '22 16:06 eatonphil

Also, this mode must be disabled when -C/--cache is on.

eatonphil avatar Jun 20 '22 16:06 eatonphil

I wouldn't mind having a go at this - I've been doing some initial investigations with the pg_query_go library and I think I can get something together to cover some of the above cases.

mc-borscht avatar Sep 17 '22 10:09 mc-borscht

Hey @mc-borscht, there's a PR open for this https://github.com/multiprocessio/dsq/pull/76 but I got stuck because pg_query_go doesn't build on Windows.

If you want, you can pick up that PR and get it working. Although before merging it I wanted to have some benchmarks that show it's actually an improvement.

To deal with pg_query_go not building on Windows we could either fix pg_query_go's build process or we could use build constraints in Go to compile this feature out on Windows.
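For the second option, a minimal sketch of how Go build constraints could gate the feature. The file layout and the function name `columnPruningEnabled` are assumptions for illustration; the real change would also keep the pg_query_go import inside the non-Windows file only.

```go
//go:build !windows

// Hypothetical file compiled on every platform except Windows.
package main

import "fmt"

// columnPruningEnabled reports whether the query-parsing optimization is
// compiled into this binary. A sibling file guarded by the opposite
// constraint would define the same function for Windows and would not
// import the parser at all:
//
//	//go:build windows
//
//	package main
//
//	func columnPruningEnabled() bool { return false }
func columnPruningEnabled() bool { return true }

func main() {
	if columnPruningEnabled() {
		fmt.Println("parsing query to limit ingested columns")
	} else {
		fmt.Println("ingesting all columns")
	}
}
```

Because both files define the same function, callers need no platform checks of their own; on Windows the feature is simply skipped.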

eatonphil avatar Sep 18 '22 00:09 eatonphil