datafusion feat: allow math functions to parse input from Utf8

Is your feature request related to a problem or challenge?

log10('-Infinity'); log2('-Infinity'); power('Infinity',2)

As above, using the + infinity function in some math functions will report an error in arrow-datafusion, but mysql and posgtres will not report an error. I think the + infinity parameter is meaningful in mathematics.

--postgres
postgres=# select power('Infinity', -2);
 power 
-------
     0
(1 row)postgres=# select log('Infinity', 2);
 log 
-----
   0
(1 row)postgres=# select log10('Infinity');
  log10   
----------
 Infinity
(1 row)postgres=# select power('Infinity', 2);
  power   
----------
 Infinity
(1 row)

--mysql
mysql> select log10('+Infinity');
+--------------------+
| log10('+Infinity') |
+--------------------+
| NULL |
+--------------------+
1 row in set, 2 warnings (0.00 sec)mysql> select power('Infinity', 2);
+----------------------+
| power('Infinity', 2) |
+----------------------+
| 0 |
+----------------------+
1 row in set, 1 warning (0.00 sec)mysql> select power('-Infinity', 2);
+-----------------------+
| power('-Infinity', 2) |
+-----------------------+
| 0 |
+-----------------------+
1 row in set, 1 warning (0.00 sec)mysql> select log10('-Infinity');
+--------------------+
| log10('-Infinity') |
+--------------------+
| NULL |
+--------------------+
1 row in set, 2 warnings (0.00 sec)

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Feb 21 '24 08:02 caicancai

@alamb Maybe I need some opinions from you. In my opinion, this is also a complicated issue.

Feb 22 '24 01:02 caicancai

Datafusion does actually support this it seems:

❯ select log(arrow_cast('+Infinity', 'Float32'));
+------------------------+
| log(Utf8("+Infinity")) |
+------------------------+
| inf                    |
+------------------------+
1 row in set. Query took 0.002 seconds.

❯ select log(arrow_cast('-Infinity', 'Float32'));
+------------------------+
| log(Utf8("-Infinity")) |
+------------------------+
| NaN                    |
+------------------------+
1 row in set. Query took 0.002 seconds.

❯

Could you elaborate on which math functions are throwing error in cases of infinity input?

Feb 22 '24 11:02 Jefffrey

Datafusion does actually support this it seems:

❯ select log(arrow_cast('+Infinity', 'Float32'));
+------------------------+
| log(Utf8("+Infinity")) |
+------------------------+
| inf                    |
+------------------------+
1 row in set. Query took 0.002 seconds.

❯ select log(arrow_cast('-Infinity', 'Float32'));
+------------------------+
| log(Utf8("-Infinity")) |
+------------------------+
| NaN                    |
+------------------------+
1 row in set. Query took 0.002 seconds.

❯

Could you elaborate on which math functions are throwing error in cases of infinity input?

Thank you for your reply. It seems that you need to do some cast conversion. Can't you log2('-Infinity') directly?

Feb 22 '24 13:02 caicancai

Yeah that would seem reasonable. I guess it would involve changing the signatures of the math functions to also accept Utf8 data type, then try parse into float (or something like that)

Feb 22 '24 19:02 Jefffrey

I've updated the issue title to better reflect this :+1:

(Since less of an issue with infinity, and more just a general data type issue with math functions)

Feb 22 '24 20:02 Jefffrey

I've updated the issue title to better reflect this 👍

(Since less of an issue with infinity, and more just a general data type issue with math functions)

Very happy to see the way arrow-datafusion handles this, it's a way I would like to see, I think doing a cast is a very rigorous behavior

Feb 23 '24 00:02 caicancai

@Jefffrey But I think this cast should be an implicit conversion. If not, maybe we should indicate this in the tests and documentation.

Feb 23 '24 05:02 caicancai

@Jefffrey But I think this cast should be an implicit conversion. If not, maybe we should indicate this in the tests and documentation.

Yes, I think this is what I was trying to convey, that we should allow log('100.0') to be implicitly converted to log(100.0) since we can already cast from Utf8 to Float. I just might be wrong on the exact implementation details as I'm not as familiar with the internals of the function related code :slightly_smiling_face:

Feb 23 '24 08:02 Jefffrey

@Jefffrey But I think this cast should be an implicit conversion. If not, maybe we should indicate this in the tests and documentation.

Yes, I think this is what I was trying to convey, that we should allow log('100.0') to be implicitly converted to log(100.0) since we can already cast from Utf8 to Float. I just might be wrong on the exact implementation details as I'm not as familiar with the internals of the function related code 🙂

Maybe I want to try it, but I'm not sure I can pull it off

Feb 23 '24 08:02 caicancai

@Jefffrey I will try to add some tests first and then find the problem. At present, the math function should have some negative tests that need to be improved.

Feb 23 '24 08:02 caicancai

@Jefffrey But I think this cast should be an implicit conversion. If not, maybe we should indicate this in the tests and documentation.

Yes, I think this is what I was trying to convey, that we should allow log('100.0') to be implicitly converted to log(100.0) since we can already cast from Utf8 to Float. I just might be wrong on the exact implementation details as I'm not as familiar with the internals of the function related code 🙂

I think this process is called "coercion" in the datafusion codebase.

So we would need to add a coercion from DataType::Utf8 to float perhaps

Feb 26 '24 18:02 alamb

I think this process is called "coercion" in the datafusion codebase.

So we would need to add a coercion from DataType::Utf8 to float perhaps

double or float?

Feb 27 '24 07:02 caicancai

So we would need to add a coercion from DataType::Utf8 to float perhaps

double or float?

I think we would probably need the coercion for both Float32 and Float64

Feb 27 '24 13:02 alamb

@alamb I have some ideas about type conversion. I am trying out my ideas on calcite. If it makes sense, I will transplant it to arrow-datafusion. This may take some time.

Mar 01 '24 14:03 caicancai

@alamb I have some ideas about type conversion. I am trying out my ideas on calcite. If it makes sense, I will transplant it to arrow-datafusion. This may take some time.

Thanks @caicancai -- BTW I find following the example of existing systems is helpful in cases like this (there is no need to reinvent different coercion rules)

For example, postgres appears to happily coerce UTF8 -> float:

postgres=# select sqrt('1.4');
        sqrt
--------------------
 1.1832159566199232
(1 row)

Mar 01 '24 17:03 alamb