datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Issue using `arrow_cast` in ORDER BY expressions

Open universalmind303 opened this issue 1 year ago • 9 comments

Describe the bug

originally opened in https://github.com/GlareDB/glaredb/issues/2597

It seems that arrow_cast's special handling makes it so that it errors out when using it in an ORDER BY expr

To Reproduce

statement ok
create table ts (a int);

statement ok
insert into ts values (915148800);

# this errors
statement ok 
SELECT
  DATE_TRUNC('minute', arrow_cast(a, 'Timestamp(Second, None)')) AS minute
FROM
  ts
ORDER BY
  DATE_TRUNC('minute', arrow_cast(a, 'Timestamp(Second, None)'));

Expected behavior

It should be functionally equivalent in this example to using to_timestamp_seconds

SELECT
  DATE_TRUNC('minute', to_timestamp_seconds(a)) AS minute
FROM
  ts
ORDER BY
  DATE_TRUNC('minute', to_timestamp_seconds(a));

Additional context

No response

universalmind303 avatar Feb 06 '24 18:02 universalmind303

I would like to work on this one.

brayanjuls avatar Feb 06 '24 22:02 brayanjuls

I actually began something with this fix, not finish it yet. Let me know if you are really interested in this. I can hold the fix.

viirya avatar Feb 06 '24 22:02 viirya

I wanted to work on it as an exercise to also learn the internals, if there is a hurry to solve it then go for it, if not then I would be glad to work on this.

brayanjuls avatar Feb 06 '24 22:02 brayanjuls

Okay, I think this is not urgent. I will hold the fix. If you don't have time to work this anymore later, please let me know. I will continue on this. Thank you.

viirya avatar Feb 06 '24 22:02 viirya

Note another approach might be to help https://github.com/apache/arrow-datafusion/pull/8985 (which would allow us to remove the special case handling of arrow_cast and make it a normal UDF)

alamb avatar Feb 07 '24 13:02 alamb

@alamb suggested a solution in that PR but I am still trying to understand where the arrow_cast UDF should be implemented after that PR is merged.

brayanjuls avatar Feb 09 '24 02:02 brayanjuls

@alamb suggested a solution in that PR but I am still trying to understand where the arrow_cast UDF should be implemented after that PR is merged.

We are talking about the organization on #9100 -- I am sorry it is not yet fully finalized or written down @brayanjuls

I would propose putting it in datafusion-functions/src/core. However, there is not yet an example of doing so, but I can prioritize doing it so you have an example to follow (or you can take a shot at it on your own, but there is a bit of macro magic that is necessary)

alamb avatar Feb 09 '24 20:02 alamb

I would propose putting it in datafusion-functions/src/core. However, there is not yet an example of doing so, but I can prioritize doing it so you have an example to follow (or you can take a shot at it on your own, but there is a bit of macro magic that is necessary)

I would prefer waiting for the examples as I have little experience creating rust declarative macros.

brayanjuls avatar Feb 13 '24 11:02 brayanjuls

I would prefer waiting for the examples as I have little experience creating rust declarative macros.

Makes sense -- I will try and get a PR shortly

alamb avatar Feb 13 '24 11:02 alamb