Incorrect behavior of arithmetic operations between time values
Describe the bug
Consider the following query:
DataFusion CLI v41.0.0
> select ('2024-08-27T08:21:27Z'::timestamp + interval '1 day' - interval '1 second' - interval '1 day');
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Utf8("2024-08-27T08:21:27Z") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 1000000000 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2024-08-29T08:21:26 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.005 seconds.
While the expected result should be 2024-08-27 08:21:26.000000 .
When I add parentheses to the query, the results are what I expect:
> select ('2024-08-27T08:21:27Z'::timestamp + interval '1 day' - interval '1 second') - interval '1 day';
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Utf8("2024-08-27T08:21:27Z") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 1000000000 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2024-08-27T08:21:26 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.
The issue seems to happening when you have multiple (more than 2?) operands in arithmetic. e.g.:
> select ('2024-08-27T08:21:27Z'::timestamp + interval '1 day' - interval '1 day' - interval '2 day');
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Utf8("2024-08-27T08:21:27Z") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2024-08-29T08:21:27 -- Should be 2024-08-25 08:21:27.000000 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
When you have two operands, the results seem to match what you'd expect:
> select ('2024-08-27T08:21:27Z'::timestamp + interval '1 day' - interval '2 day');
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Utf8("2024-08-27T08:21:27Z") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2024-08-26T08:21:27 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.
To Reproduce
You can use the same queries mentioned above.
Expected behavior
Mentioned above as well
Additional context
No response
I don't know the root cause, but I'd just add it seems to to where there's subtraction. E.g.
> select interval '1 day' + interval '2 day' + interval '1 day' as i;
+-------------------------------------------------------+
| i |
+-------------------------------------------------------+
| 0 years 0 mons 4 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------+
works, but
> select interval '1 day' - interval '2 day' - interval '1 day' as i;
+-------------------------------------------------------+
| i |
+-------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.
doesn't. I wasn't able to find a purely add case that doesn't produce the correct answer (though doesn't necessarily rule it out).
Just an educated guess, but could it be that the arithmetic operators are not passed correctly from the parsed AST, or that the AST is not constructed correctly? Not entirely sure, but it seems that the first arithmetic operator is always the one that is applied on the entire expression. For example:
> select interval '2 day' - interval '1 day' + interval '1 day';
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Minus is applied on all of them. I think this is the result of select interval '2 day' - interval '1 day' - interval '1 day';.
Another example:
> select interval '2 day' - interval '1 day' + interval '1 day' + interval '5 day' - interval '3 day';
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 3, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons -2 days 0 hours 0 mins 0.000000000 secs |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.
which seems to be of similar behavior. The minus is applied on all of them, even transforming the latest minus into a plus. I think that's the result of select interval '2 day' - interval '1 day' - interval '1 day' - interval '5 day' + interval '3 day';
Another weird example with a plus:
> select interval '5 day' + interval '1 day' - interval '2 day' - interval '4 day';
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 4, nanoseconds: 0 }") |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 8 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
The results seem to correspond with select interval '5 day' + interval '1 day' - interval '2 day' + interval '4 day';
Just found something that could be interesting. I think it's a bug in the AST and the ~~precedence~~ order of operators.
I tried explicitly defining the precedence/order with parentheses for the same queries above (trying to mimick how the precedence between the operators is preserved): Precedence from right to left (which is not correct)
select interval '2 day' - (interval '1 day' + (interval '1 day' + (interval '5 day' - ( interval '3 day'))));
select interval '2 day' - (interval '1 day' + (interval '1 day'));
select interval '5 day' + (interval '1 day' - (interval '2 day' - (interval '4 day')));
This corresponds with the wrong results that I get from Datafusion:
DataFusion CLI v41.0.0
> select interval '2 day' - (interval '1 day' + (interval '1 day' + (interval '5 day' - ( interval '3 day'))));
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 3, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons -2 days 0 hours 0 mins 0.000000000 secs |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.003 seconds.
> select interval '2 day' - (interval '1 day' + (interval '1 day'));
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
> select interval '5 day' + (interval '1 day' - (interval '2 day' - (interval '4 day')));
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 4, nanoseconds: 0 }") |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 8 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
Precdence from left to right (which is what I think is what should be happening):
select (((interval '2 day' - interval '1 day') + interval '1 day') + interval '5 day') - interval '3 day';
select (interval '2 day' - interval '1 day') + interval '1 day';
select ((interval '5 day' + interval '1 day') - interval '2 day') - interval '4 day';
The results of these queries match the expected results:
> select (((interval '2 day' - interval '1 day') + interval '1 day') + interval '5 day') - interval '3 day';
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 3, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 4 days 0 hours 0 mins 0.000000000 secs |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.003 seconds.
> select (interval '2 day' - interval '1 day') + interval '1 day';
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 2 days 0 hours 0 mins 0.000000000 secs |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
> select ((interval '5 day' + interval '1 day') - interval '2 day') - interval '4 day';
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 4, nanoseconds: 0 }") |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
I think the examples have select (interval '2 day' - interval '1 day') + interval '1 day'; backwards?
> select interval '2 day' - (interval '1 day' + (interval '1 day'));
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.003 seconds.
> select (interval '2 day' - interval '1 day') + interval '1 day';
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 2 days 0 hours 0 mins 0.000000000 secs |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.
Otherwise, there's a potentially related example: https://github.com/sqlparser-rs/sqlparser-rs/issues/1345
Just found something that could be interesting. I think it's a bug in the AST and the ~precedence~ order of operators.
Indeed, this is parsing problem.
I modified datafusion/sql/tests/cases/plan_to_sql.rs to include this:
--- datafusion/sql/tests/cases/plan_to_sql.rs
+++ datafusion/sql/tests/cases/plan_to_sql.rs
@@ -54,6 +54,12 @@ fn roundtrip_expr() {
"sum((age * 2))",
r#"sum((age * 2))"#,
),
+ (
+ TableReference::bare("person"),
+ "'2024-01-10 01:23:45'::timestamp - interval '1 day' - interval '2 day'",
+ // FIXME this is incorrect!
+ "(CAST('2024-01-10 01:23:45' AS TIMESTAMP) - (INTERVAL '0 YEARS 0 MONS 1 DAYS 0 HOURS 0 MINS 0.000000000 SECS' - INTERVAL '0 YEARS 0 MONS 2 DAYS 0 HOURS 0 MINS 0.000000000 SECS'))",
+ ),
];
let roundtrip = |table, sql: &str| -> Result<String> {
DFParser just wraps sqlparser::parser::Parser, just we will need to fix this in https://github.com/sqlparser-rs/sqlparser-rs
I think https://github.com/sqlparser-rs/sqlparser-rs/pull/1398 fixed this issue.
thank you @samuelcolvin and @alamb
Nice! I expect the next release of sqlparser to be released sometime this week and thus the change will be available in DataFusion as well: https://github.com/sqlparser-rs/sqlparser-rs/issues/1384
sqlparser-rs 0.51.0 is now available on crates.io: https://crates.io/crates/sqlparser/0.51.0 🎉
I believe this will be fixed by https://github.com/apache/datafusion/pull/12222 from @samuelcolvin