Parallelize expensive projects that are not under an exchange
SELECT uast_extract(
uast(blob_content, 'csharp', "(//csharp:BinaryExpression_AddExpression/Left/uast:String | //csharp:InterpolatedStringExpression//csharp:InterpolatedStringTextToken[1])[starts-with(normalize-space(@Value), 'SELECT') or starts-with(normalize-space(@Value), 'select') or starts-with(normalize-space(@Value), 'UPDATE') or starts-with(normalize-space(@Value), 'update') or starts-with(normalize-space(@Value), 'DELETE') or starts-with(normalize-space(@Value), 'delete') or starts-with(normalize-space(@Value), 'INSERT') or starts-with(normalize-space(@Value), 'insert') or starts-with(normalize-space(@Value), 'CREATE') or starts-with(normalize-space(@Value), 'create') or starts-with(normalize-space(@Value), 'ALTER') or starts-with(normalize-space(@Value), 'alter') or starts-with(normalize-space(@Value), 'DROP') or starts-with(normalize-space(@Value), 'drop')]"),
'@pos') AS positions,
repository_id,
file_path
FROM (
SELECT f.repository_id,
f.file_path,
b.blob_content
FROM (
SELECT *
FROM refs r
NATURAL JOIN commit_blobs cb
NATURAL JOIN blobs
WHERE r.ref_name = 'HEAD'
AND NOT IS_BINARY(blob_content)
) b
INNER JOIN (
SELECT repository_id, file_path, blob_hash
FROM refs r
NATURAL JOIN commit_files cf
WHERE r.ref_name = 'HEAD'
) f
ON b.blob_hash = f.blob_hash
AND b.repository_id = f.repository_id
WHERE language(f.file_path, b.blob_content) = 'C#'
) t
WHERE positions IS NOT NULL
This could be parallelized adding an exchange over the topmost projection. Instead, we do this serially, causing extremely low performance on queries using uast functions on the topmost projects under certain conditions.
This cannot be done right now (see https://github.com/src-d/go-mysql-server/pull/621).
What we can do (and would solve the problem for this query) is execute Projections in parallel when they contain UDFs if there is not an exchange higher up the tree.
WDYT @ajnavarro?
This is actually slower than serially computing project, even with UDFs:
goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-mysql-server.v0/sql/plan
BenchmarkProject/no_parallelism-4 10000 168082 ns/op 67199 B/op 958 allocs/op
BenchmarkProject/parallelism-4 5000 275574 ns/op 67821 B/op 967 allocs/op
PASS
ok gopkg.in/src-d/go-mysql-server.v0/sql/plan 3.131s
I think the only case that might benefit from this is actually bblfsh parsing. So if we enable this, it would need to be a gitbase rule, not a go-mysql-server one, because for builtin UDFs it's twice as slow.