datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Incorrect behavior of arithmetic operations between time values

Open Abdullahsab3 opened this issue 1 year ago • 2 comments

Describe the bug

Consider the following query:

DataFusion CLI v41.0.0
> select ('2024-08-27T08:21:27Z'::timestamp + interval '1 day' -  interval '1 second' - interval '1 day');
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Utf8("2024-08-27T08:21:27Z") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 1000000000 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2024-08-29T08:21:26                                                                                                                                                                                                                                                                                     |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.005 seconds.

While the expected result should be 2024-08-27 08:21:26.000000 .

When I add parentheses to the query, the results are what I expect:

> select ('2024-08-27T08:21:27Z'::timestamp + interval '1 day' -  interval '1 second') - interval '1 day';
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Utf8("2024-08-27T08:21:27Z") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 1000000000 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2024-08-27T08:21:26                                                                                                                                                                                                                                                                                     |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.

The issue seems to happening when you have multiple (more than 2?) operands in arithmetic. e.g.:

> select ('2024-08-27T08:21:27Z'::timestamp + interval '1 day' -  interval '1 day' - interval '2 day');
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Utf8("2024-08-27T08:21:27Z") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2024-08-29T08:21:27                          -- Should be     2024-08-25 08:21:27.000000                                                                                                                                                                                                                              |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

When you have two operands, the results seem to match what you'd expect:

> select ('2024-08-27T08:21:27Z'::timestamp + interval '1 day' - interval '2 day');
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Utf8("2024-08-27T08:21:27Z") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2024-08-26T08:21:27                                                                                                                                                                                      |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.

To Reproduce

You can use the same queries mentioned above.

Expected behavior

Mentioned above as well

Additional context

No response

Abdullahsab3 avatar Aug 27 '24 08:08 Abdullahsab3

I don't know the root cause, but I'd just add it seems to to where there's subtraction. E.g.

> select interval '1 day' + interval '2 day' + interval '1 day' as i;
+-------------------------------------------------------+
| i                                                     |
+-------------------------------------------------------+
| 0 years 0 mons 4 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------+

works, but

> select interval '1 day' - interval '2 day' - interval '1 day' as i;
+-------------------------------------------------------+
| i                                                     |
+-------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.007 seconds.

doesn't. I wasn't able to find a purely add case that doesn't produce the correct answer (though doesn't necessarily rule it out).

tshauck avatar Aug 29 '24 20:08 tshauck

Just an educated guess, but could it be that the arithmetic operators are not passed correctly from the parsed AST, or that the AST is not constructed correctly? Not entirely sure, but it seems that the first arithmetic operator is always the one that is applied on the entire expression. For example:

> select interval '2 day' - interval '1 day' + interval '1 day';
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Minus is applied on all of them. I think this is the result of select interval '2 day' - interval '1 day' - interval '1 day';.

Another example:

> select interval '2 day' - interval '1 day' + interval '1 day' + interval '5 day' - interval '3 day';
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 3, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons -2 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                                                                                                                                                                                                      |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.

which seems to be of similar behavior. The minus is applied on all of them, even transforming the latest minus into a plus. I think that's the result of select interval '2 day' - interval '1 day' - interval '1 day' - interval '5 day' + interval '3 day';

Another weird example with a plus:

> select interval '5 day' + interval '1 day' - interval '2 day' - interval '4 day';
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 4, nanoseconds: 0 }") |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 8 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                                                                                                                 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

The results seem to correspond with select interval '5 day' + interval '1 day' - interval '2 day' + interval '4 day';

Abdullahsab3 avatar Aug 30 '24 06:08 Abdullahsab3

Just found something that could be interesting. I think it's a bug in the AST and the ~~precedence~~ order of operators.

I tried explicitly defining the precedence/order with parentheses for the same queries above (trying to mimick how the precedence between the operators is preserved): Precedence from right to left (which is not correct)

select interval '2 day' - (interval '1 day' + (interval '1 day' + (interval '5 day' - ( interval '3 day'))));
select interval '2 day' - (interval '1 day' + (interval '1 day'));
select interval '5 day' + (interval '1 day' - (interval '2 day' - (interval '4 day')));

This corresponds with the wrong results that I get from Datafusion:

DataFusion CLI v41.0.0
> select interval '2 day' - (interval '1 day' + (interval '1 day' + (interval '5 day' - ( interval '3 day'))));
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 3, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons -2 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                                                                                                                                                                                                      |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.003 seconds.

> select interval '2 day' - (interval '1 day' + (interval '1 day'));
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

> select interval '5 day' + (interval '1 day' - (interval '2 day' - (interval '4 day')));
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 4, nanoseconds: 0 }") |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 8 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                                                                                                                 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

Precdence from left to right (which is what I think is what should be happening):

select (((interval '2 day' - interval '1 day') + interval '1 day') + interval '5 day') -  interval '3 day';
select (interval '2 day' - interval '1 day') + interval '1 day';
select ((interval '5 day' + interval '1 day') - interval '2 day') - interval '4 day';

The results of these queries match the expected results:

> select (((interval '2 day' - interval '1 day') + interval '1 day') + interval '5 day') -  interval '3 day';
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 3, nanoseconds: 0 }") |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 4 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                                                                                                                                                                                                       |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.003 seconds.

> select (interval '2 day' - interval '1 day') + interval '1 day';
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 2 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

> select ((interval '5 day' + interval '1 day') - interval '2 day') - interval '4 day';
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 5, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 4, nanoseconds: 0 }") |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                                                                                                                 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

Abdullahsab3 avatar Aug 30 '24 06:08 Abdullahsab3

I think the examples have select (interval '2 day' - interval '1 day') + interval '1 day'; backwards?

> select interval '2 day' - (interval '1 day' + (interval '1 day'));
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.003 seconds.

> select (interval '2 day' - interval '1 day') + interval '1 day';
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 2, nanoseconds: 0 }") - IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") + IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 1, nanoseconds: 0 }") |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 2 days 0 hours 0 mins 0.000000000 secs                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.007 seconds.

Otherwise, there's a potentially related example: https://github.com/sqlparser-rs/sqlparser-rs/issues/1345

tshauck avatar Aug 30 '24 19:08 tshauck

Just found something that could be interesting. I think it's a bug in the AST and the ~precedence~ order of operators.

Indeed, this is parsing problem.

I modified datafusion/sql/tests/cases/plan_to_sql.rs to include this:

--- datafusion/sql/tests/cases/plan_to_sql.rs
+++ datafusion/sql/tests/cases/plan_to_sql.rs
@@ -54,6 +54,12 @@ fn roundtrip_expr() {
             "sum((age * 2))",
             r#"sum((age * 2))"#,
         ),
+        (
+            TableReference::bare("person"),
+            "'2024-01-10 01:23:45'::timestamp - interval '1 day' - interval '2 day'",
+            // FIXME this is incorrect!
+            "(CAST('2024-01-10 01:23:45' AS TIMESTAMP) - (INTERVAL '0 YEARS 0 MONS 1 DAYS 0 HOURS 0 MINS 0.000000000 SECS' - INTERVAL '0 YEARS 0 MONS 2 DAYS 0 HOURS 0 MINS 0.000000000 SECS'))",
+        ),
     ];

     let roundtrip = |table, sql: &str| -> Result<String> {

DFParser just wraps sqlparser::parser::Parser, just we will need to fix this in https://github.com/sqlparser-rs/sqlparser-rs

findepi avatar Sep 05 '24 12:09 findepi

I think https://github.com/sqlparser-rs/sqlparser-rs/pull/1398 fixed this issue.

Abdullahsab3 avatar Sep 06 '24 17:09 Abdullahsab3

thank you @samuelcolvin and @alamb

findepi avatar Sep 06 '24 18:09 findepi

Nice! I expect the next release of sqlparser to be released sometime this week and thus the change will be available in DataFusion as well: https://github.com/sqlparser-rs/sqlparser-rs/issues/1384

alamb avatar Sep 09 '24 13:09 alamb

sqlparser-rs 0.51.0 is now available on crates.io: https://crates.io/crates/sqlparser/0.51.0 🎉

alamb avatar Sep 11 '24 18:09 alamb

I believe this will be fixed by https://github.com/apache/datafusion/pull/12222 from @samuelcolvin

alamb avatar Sep 12 '24 19:09 alamb