Order of Interval Addition Should Affect Final Output
Describe the bug
In various engines, the order in which intervals are added to dates can affect the final value. This is especially noticeable with leap years.
Datafusion appears to constant fold these intervals, which throws away the operation order.
> EXPLAIN SELECT
DATE '2019-02-28' + INTERVAL '1 YEAR' + INTERVAL '1 DAY' AS FEB,
DATE '2019-02-28' + INTERVAL '1 DAY' + INTERVAL '1 YEAR' AS MAR;
+---------------+----------------------------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------------------------+
| logical_plan | Projection: Date32("2020-02-29") AS feb, Date32("2020-02-29") AS mar |
| | EmptyRelation |
| physical_plan | ProjectionExec: expr=[2020-02-29 as feb, 2020-02-29 as mar] |
| | PlaceholderRowExec |
| | |
+---------------+----------------------------------------------------------------------+
To Reproduce
Testing via datafusion-cli
> SELECT
DATE '2019-02-28' + INTERVAL '1 YEAR' + INTERVAL '1 DAY' AS FEB,
DATE '2019-02-28' + INTERVAL '1 DAY' + INTERVAL '1 YEAR' AS MAR;
+------------+------------+
| feb | mar |
+------------+------------+
| 2020-02-29 | 2020-02-29 |
+------------+------------+
Expected behavior
Due to leap year shenanigans, adding the year before the day results in a different date than adding the day before the year.
Trino emits
trino> SELECT
-> DATE '2019-02-28' + INTERVAL '1' YEAR + INTERVAL '1' DAY AS FEB,
-> DATE '2019-02-28' + INTERVAL '1' DAY + INTERVAL '1' YEAR AS MAR;
FEB | MAR
------------+------------
2020-02-29 | 2020-03-01
(1 row)
as does Postgres and Snowflake (based on their documentation for interval types where this example came from)
Additional context
No response
🤔 maybe we could turn of constant folding for adding intervals
take
This is caused by the adding ordering since the sqlparser would always generate a AST like
by only disabling constant folding for Interval would not help here since when we evaluate the PhysicalExpr, it would still do the calculation of the later two components first. I think a change in sqlparser would help? (like we generate the BinaryOP AST with first two components as left leaf and the last one as right leaf) not sure the best way to do it.
I agree with @Lordworms the root cause is related to the precidence rules in sqlparser which control if expressions like
DATE '2019-02-28' + INTERVAL '1 YEAR' + INTERVAL '1 DAY' AS FEB,
are parsed like this
((DATE '2019-02-28' + INTERVAL '1 YEAR') + INTERVAL '1 DAY' AS FEB),
Or like
(DATE '2019-02-28' + (INTERVAL '1 YEAR' + INTERVAL '1 DAY' AS FEB)),
Users who want to control the order of expression evaluation can use ( and ) to explicitly control the evaluation order.
I agree with @Lordworms the root cause is related to the precidence rules in sqlparser which control if expressions like
DATE '2019-02-28' + INTERVAL '1 YEAR' + INTERVAL '1 DAY' AS FEB,are parsed like this
((DATE '2019-02-28' + INTERVAL '1 YEAR') + INTERVAL '1 DAY' AS FEB),Or like
(DATE '2019-02-28' + (INTERVAL '1 YEAR' + INTERVAL '1 DAY' AS FEB)),Users who want to control the order of expression evaluation can use
(and)to explicitly control the evaluation order.
Thanks so much. Should I fix it in sqlparser?
Thanks so much. Should I fix it in sqlparser?
I think that is probably the best way -- though it is likely pretty tricky (likely related to operator precidence). I think the correct first step would be to do some research and see how such arithmetic is parsed by other databases