gitbase icon indicating copy to clipboard operation
gitbase copied to clipboard

Parallelize expensive projects that are not under an exchange

Open erizocosmico opened this issue 6 years ago • 2 comments

SELECT uast_extract(
    uast(blob_content, 'csharp', "(//csharp:BinaryExpression_AddExpression/Left/uast:String | //csharp:InterpolatedStringExpression//csharp:InterpolatedStringTextToken[1])[starts-with(normalize-space(@Value), 'SELECT') or starts-with(normalize-space(@Value), 'select') or starts-with(normalize-space(@Value), 'UPDATE') or starts-with(normalize-space(@Value), 'update') or starts-with(normalize-space(@Value), 'DELETE') or starts-with(normalize-space(@Value), 'delete') or starts-with(normalize-space(@Value), 'INSERT') or starts-with(normalize-space(@Value), 'insert') or starts-with(normalize-space(@Value), 'CREATE') or starts-with(normalize-space(@Value), 'create') or starts-with(normalize-space(@Value), 'ALTER') or starts-with(normalize-space(@Value), 'alter') or starts-with(normalize-space(@Value), 'DROP') or starts-with(normalize-space(@Value), 'drop')]"),
    '@pos') AS positions,
    repository_id,
    file_path
FROM (
    SELECT f.repository_id,
        f.file_path,
        b.blob_content
    FROM (
        SELECT *
        FROM refs r
        NATURAL JOIN commit_blobs cb
        NATURAL JOIN blobs
        WHERE r.ref_name = 'HEAD'
            AND NOT IS_BINARY(blob_content)
    ) b
    INNER JOIN (
        SELECT repository_id, file_path, blob_hash
        FROM refs r
        NATURAL JOIN commit_files cf
        WHERE r.ref_name = 'HEAD'
    ) f
    ON b.blob_hash = f.blob_hash
        AND b.repository_id = f.repository_id
    WHERE language(f.file_path, b.blob_content) = 'C#'
) t
WHERE positions IS NOT NULL

This could be parallelized adding an exchange over the topmost projection. Instead, we do this serially, causing extremely low performance on queries using uast functions on the topmost projects under certain conditions.

erizocosmico avatar Feb 22 '19 13:02 erizocosmico

This cannot be done right now (see https://github.com/src-d/go-mysql-server/pull/621).

What we can do (and would solve the problem for this query) is execute Projections in parallel when they contain UDFs if there is not an exchange higher up the tree.

WDYT @ajnavarro?

erizocosmico avatar Mar 19 '19 15:03 erizocosmico

This is actually slower than serially computing project, even with UDFs:

goos: darwin
goarch: amd64
pkg: gopkg.in/src-d/go-mysql-server.v0/sql/plan
BenchmarkProject/no_parallelism-4         	   10000	    168082 ns/op	   67199 B/op	     958 allocs/op
BenchmarkProject/parallelism-4            	    5000	    275574 ns/op	   67821 B/op	     967 allocs/op
PASS
ok  	gopkg.in/src-d/go-mysql-server.v0/sql/plan	3.131s

I think the only case that might benefit from this is actually bblfsh parsing. So if we enable this, it would need to be a gitbase rule, not a go-mysql-server one, because for builtin UDFs it's twice as slow.

erizocosmico avatar Mar 27 '19 16:03 erizocosmico