parse_sql vs parse_plpgsql usage
Hello! Nice projection! Trying to determine if this library could be utilized to parse many SQL statements to form an AST of a schema - or at least have the per-statement AST parsed in useful ways to formulate things about the schema. I am uncertain of the difference between the parse_sql and parse_plpgsql functions and how to utilize them correctly. An example...
CREATE OR REPLACE FUNCTION test.test_parse (
p_time_start timestamptz,
p_time_end timestamptz,
p_time_interval interval default NULL
) RETURNS TABLE (
ts timestamptz,
arbitrary_return bigint
) AS $$
BEGIN
-- some comment
-- some other comment
IF p_time_interval IS NULL
THEN p_time_interval := interval_from_start_end(p_time_start, p_time_end);
END IF;
RETURN QUERY
SELECT
bucket_function(p_time_interval, timestamp) AS ts,
arbitrary_return
FROM test.some_table
WHERE
start >= p_time_start
AND end < p_time_end
GROUP BY 1;
END; $$ LANGUAGE plpgsql SECURITY DEFINER PARALLEL UNSAFE;
Using parse_plpgsql fails:
>>> parse_plpgsql(raw_sql)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/x/.venv/lib/python3.9/site-packages/pglast/__init__.py", line 31, in parse_plpgsql
return loads(parse_plpgsql_json(statement))
File "pglast/parser.pyx", line 346, in pglast.parser.parse_plpgsql_json
pglast.parser.ParseError: "p_time_interval" is not a known variable
Using parse_sql here works, but yields a big blob unparsed:
>>> stmt_ast = parse_sql(raw_sql)
>>> stmt_ast[0].stmt.options[0].arg[0].val
'\nBEGIN\n -- some comment\n -- some other comment\n\n IF p_time_interval IS NULL\n THEN p_time_interval := interval_from_start_end(p_time_start, p_time_end);\n END IF;\n RETURN QUERY\n SELECT\n bucket_function(p_time_interval, timestamp) AS ts,\n arbitrary_return\n FROM test.some_table\n WHERE\n start >= p_time_start\n AND end < p_time_end\n GROUP BY 1;\nEND; '
Trying to parse that inner blob as plpgsql similarly does not work:
>>> parse_plpgsql(stmt_ast[0].stmt.options[0].arg[0].val)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/x/.venv/lib/python3.9/site-packages/pglast/__init__.py", line 31, in parse_plpgsql
return loads(parse_plpgsql_json(statement))
File "pglast/parser.pyx", line 346, in pglast.parser.parse_plpgsql_json
pglast.parser.ParseError: syntax error at or near "IF", at index 58
Any tips?
Further - any heuristic to not need to explicitly declare "this function should be parsed as SQL" versus "this function should be parsed as PL/pgSQL", or do I need to decide that per statement?
Hi, unfortunately the plpgsql parser is not yet fully implemented, so I'm afraid it is of little utility as of now. I'm enjoying a short vacation so I cannot try your example: I will do once I get back.
As the related issue is now closed, I will try to get at this soon.
The parse error is now fixed in just released v3.6, but I'm leaving this open to remind me a possible clarification in the doc about other issues:
- difference between the two main parse functions
- make it more evident that the support for
PLpgSQLis still vaporware