feat: allow math functions to parse input from Utf8
Is your feature request related to a problem or challenge?
log10('-Infinity'); log2('-Infinity'); power('Infinity',2)
As above, using the + infinity function in some math functions will report an error in arrow-datafusion, but mysql and posgtres will not report an error. I think the + infinity parameter is meaningful in mathematics.
--postgres
postgres=# select power('Infinity', -2);
power
-------
0
(1 row)postgres=# select log('Infinity', 2);
log
-----
0
(1 row)postgres=# select log10('Infinity');
log10
----------
Infinity
(1 row)postgres=# select power('Infinity', 2);
power
----------
Infinity
(1 row)
--mysql
mysql> select log10('+Infinity');
+--------------------+
| log10('+Infinity') |
+--------------------+
| NULL |
+--------------------+
1 row in set, 2 warnings (0.00 sec)mysql> select power('Infinity', 2);
+----------------------+
| power('Infinity', 2) |
+----------------------+
| 0 |
+----------------------+
1 row in set, 1 warning (0.00 sec)mysql> select power('-Infinity', 2);
+-----------------------+
| power('-Infinity', 2) |
+-----------------------+
| 0 |
+-----------------------+
1 row in set, 1 warning (0.00 sec)mysql> select log10('-Infinity');
+--------------------+
| log10('-Infinity') |
+--------------------+
| NULL |
+--------------------+
1 row in set, 2 warnings (0.00 sec)
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
@alamb Maybe I need some opinions from you. In my opinion, this is also a complicated issue.
Datafusion does actually support this it seems:
❯ select log(arrow_cast('+Infinity', 'Float32'));
+------------------------+
| log(Utf8("+Infinity")) |
+------------------------+
| inf |
+------------------------+
1 row in set. Query took 0.002 seconds.
❯ select log(arrow_cast('-Infinity', 'Float32'));
+------------------------+
| log(Utf8("-Infinity")) |
+------------------------+
| NaN |
+------------------------+
1 row in set. Query took 0.002 seconds.
❯
Could you elaborate on which math functions are throwing error in cases of infinity input?
Datafusion does actually support this it seems:
❯ select log(arrow_cast('+Infinity', 'Float32')); +------------------------+ | log(Utf8("+Infinity")) | +------------------------+ | inf | +------------------------+ 1 row in set. Query took 0.002 seconds. ❯ select log(arrow_cast('-Infinity', 'Float32')); +------------------------+ | log(Utf8("-Infinity")) | +------------------------+ | NaN | +------------------------+ 1 row in set. Query took 0.002 seconds. ❯Could you elaborate on which math functions are throwing error in cases of infinity input?
Thank you for your reply. It seems that you need to do some cast conversion. Can't you log2('-Infinity') directly?
Yeah that would seem reasonable. I guess it would involve changing the signatures of the math functions to also accept Utf8 data type, then try parse into float (or something like that)
I've updated the issue title to better reflect this :+1:
(Since less of an issue with infinity, and more just a general data type issue with math functions)
I've updated the issue title to better reflect this 👍
(Since less of an issue with infinity, and more just a general data type issue with math functions)
Very happy to see the way arrow-datafusion handles this, it's a way I would like to see, I think doing a cast is a very rigorous behavior
@Jefffrey But I think this cast should be an implicit conversion. If not, maybe we should indicate this in the tests and documentation.
@Jefffrey But I think this cast should be an implicit conversion. If not, maybe we should indicate this in the tests and documentation.
Yes, I think this is what I was trying to convey, that we should allow log('100.0') to be implicitly converted to log(100.0) since we can already cast from Utf8 to Float. I just might be wrong on the exact implementation details as I'm not as familiar with the internals of the function related code :slightly_smiling_face:
@Jefffrey But I think this cast should be an implicit conversion. If not, maybe we should indicate this in the tests and documentation.
Yes, I think this is what I was trying to convey, that we should allow
log('100.0')to be implicitly converted tolog(100.0)since we can already cast from Utf8 to Float. I just might be wrong on the exact implementation details as I'm not as familiar with the internals of the function related code 🙂
Maybe I want to try it, but I'm not sure I can pull it off
@Jefffrey I will try to add some tests first and then find the problem. At present, the math function should have some negative tests that need to be improved.
@Jefffrey But I think this cast should be an implicit conversion. If not, maybe we should indicate this in the tests and documentation.
Yes, I think this is what I was trying to convey, that we should allow
log('100.0')to be implicitly converted tolog(100.0)since we can already cast from Utf8 to Float. I just might be wrong on the exact implementation details as I'm not as familiar with the internals of the function related code 🙂
I think this process is called "coercion" in the datafusion codebase.
So we would need to add a coercion from DataType::Utf8 to float perhaps
I think this process is called "coercion" in the datafusion codebase.
So we would need to add a coercion from
DataType::Utf8to float perhaps
double or float?
So we would need to add a coercion from
DataType::Utf8to float perhapsdouble or float?
I think we would probably need the coercion for both Float32 and Float64
@alamb I have some ideas about type conversion. I am trying out my ideas on calcite. If it makes sense, I will transplant it to arrow-datafusion. This may take some time.
@alamb I have some ideas about type conversion. I am trying out my ideas on calcite. If it makes sense, I will transplant it to arrow-datafusion. This may take some time.
Thanks @caicancai -- BTW I find following the example of existing systems is helpful in cases like this (there is no need to reinvent different coercion rules)
For example, postgres appears to happily coerce UTF8 -> float:
postgres=# select sqrt('1.4');
sqrt
--------------------
1.1832159566199232
(1 row)