pglast icon indicating copy to clipboard operation
pglast copied to clipboard

parse_sql vs parse_plpgsql usage

Open remingtonc opened this issue 4 years ago • 3 comments

Hello! Nice projection! Trying to determine if this library could be utilized to parse many SQL statements to form an AST of a schema - or at least have the per-statement AST parsed in useful ways to formulate things about the schema. I am uncertain of the difference between the parse_sql and parse_plpgsql functions and how to utilize them correctly. An example...

CREATE OR REPLACE FUNCTION test.test_parse (
    p_time_start timestamptz,
    p_time_end timestamptz,
    p_time_interval interval default NULL
) RETURNS TABLE (
    ts timestamptz,
    arbitrary_return bigint
) AS $$
BEGIN
    -- some comment
    -- some other comment

    IF p_time_interval IS NULL
        THEN p_time_interval := interval_from_start_end(p_time_start, p_time_end);
    END IF;
    RETURN QUERY
    SELECT
        bucket_function(p_time_interval, timestamp) AS ts,
        arbitrary_return
    FROM test.some_table
    WHERE
        start >= p_time_start
        AND end < p_time_end
    GROUP BY 1;
END; $$ LANGUAGE plpgsql SECURITY DEFINER PARALLEL UNSAFE;

Using parse_plpgsql fails:

>>> parse_plpgsql(raw_sql)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/x/.venv/lib/python3.9/site-packages/pglast/__init__.py", line 31, in parse_plpgsql
    return loads(parse_plpgsql_json(statement))
  File "pglast/parser.pyx", line 346, in pglast.parser.parse_plpgsql_json
pglast.parser.ParseError: "p_time_interval" is not a known variable

Using parse_sql here works, but yields a big blob unparsed:

>>> stmt_ast = parse_sql(raw_sql)
>>> stmt_ast[0].stmt.options[0].arg[0].val
'\nBEGIN\n    -- some comment\n    -- some other comment\n\n    IF p_time_interval IS NULL\n        THEN p_time_interval := interval_from_start_end(p_time_start, p_time_end);\n    END IF;\n    RETURN QUERY\n    SELECT\n        bucket_function(p_time_interval, timestamp) AS ts,\n        arbitrary_return\n    FROM test.some_table\n    WHERE\n        start >= p_time_start\n        AND end < p_time_end\n    GROUP BY 1;\nEND; '

Trying to parse that inner blob as plpgsql similarly does not work:

>>> parse_plpgsql(stmt_ast[0].stmt.options[0].arg[0].val)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/x/.venv/lib/python3.9/site-packages/pglast/__init__.py", line 31, in parse_plpgsql
    return loads(parse_plpgsql_json(statement))
  File "pglast/parser.pyx", line 346, in pglast.parser.parse_plpgsql_json
pglast.parser.ParseError: syntax error at or near "IF", at index 58

Any tips?

Further - any heuristic to not need to explicitly declare "this function should be parsed as SQL" versus "this function should be parsed as PL/pgSQL", or do I need to decide that per statement?

remingtonc avatar Aug 25 '21 21:08 remingtonc

Hi, unfortunately the plpgsql parser is not yet fully implemented, so I'm afraid it is of little utility as of now. I'm enjoying a short vacation so I cannot try your example: I will do once I get back.

lelit avatar Aug 26 '21 08:08 lelit

As the related issue is now closed, I will try to get at this soon.

lelit avatar Oct 09 '21 06:10 lelit

The parse error is now fixed in just released v3.6, but I'm leaving this open to remind me a possible clarification in the doc about other issues:

  • difference between the two main parse functions
  • make it more evident that the support for PLpgSQL is still vaporware

lelit avatar Oct 09 '21 08:10 lelit