[Epic] Unify `WindowFunction` Interface (remove built in list of `BuiltInWindowFunction`s)
Is your feature request related to a problem or challenge?
For many of the same reasons as listed on https://github.com/apache/arrow-datafusion/issues/8045, having two types of window functions ("built in" `BuiltInWindowFunction` and `WindowUDF`) is problematic for two reasons:
- There are some features that may not be available to User Defined Window Functions (such as reversing `FIRST_VALUE` and `LAST_VALUE`)
- Users cannot easily choose which window functions to include (which will likely be especially problematic as we work to add more functions)
Describe the solution you'd like
I propose moving DataFusion to only use `WindowUDF`s and removing `BuiltInWindowFunction`, for the same reasons as https://github.com/apache/arrow-datafusion/issues/8045
We will keep the existing WindowUDF interface as much as possible, while also potentially providing an easier way to define them.
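To make the idea concrete, here is a minimal, self-contained sketch of the general pattern: every window function sits behind one trait and is looked up in a registry, so there is no hard-coded built-in list and users register only what they need. All names in it (`WindowFn`, `WindowRegistry`, `evaluate_partition`) are hypothetical and chosen for illustration; this is not DataFusion's actual `WindowUDF` API, and it operates on plain `i64` slices rather than Arrow arrays.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a user-defined window function.
/// NOT DataFusion's `WindowUDF` API; it only illustrates the shape.
trait WindowFn: Send + Sync {
    /// Name used to look the function up.
    fn name(&self) -> &str;
    /// Evaluate over one partition, producing one output per input row.
    fn evaluate_partition(&self, values: &[i64]) -> Vec<i64>;
}

/// Numbers the rows of a partition starting at 1.
struct RowNumber;

impl WindowFn for RowNumber {
    fn name(&self) -> &str {
        "row_number"
    }
    fn evaluate_partition(&self, values: &[i64]) -> Vec<i64> {
        (1..=values.len() as i64).collect()
    }
}

/// Repeats the first value of the partition for every row.
struct FirstValue;

impl WindowFn for FirstValue {
    fn name(&self) -> &str {
        "first_value"
    }
    fn evaluate_partition(&self, values: &[i64]) -> Vec<i64> {
        match values.first() {
            Some(&v) => vec![v; values.len()],
            None => Vec::new(),
        }
    }
}

/// Hypothetical registry: the only source of window functions,
/// replacing a fixed enum of built-ins.
#[derive(Default)]
struct WindowRegistry {
    functions: HashMap<String, Arc<dyn WindowFn>>,
}

impl WindowRegistry {
    fn register(&mut self, f: Arc<dyn WindowFn>) {
        self.functions.insert(f.name().to_string(), f);
    }
    fn get(&self, name: &str) -> Option<Arc<dyn WindowFn>> {
        self.functions.get(name).cloned()
    }
}

fn main() {
    // A full registry includes everything ...
    let mut full = WindowRegistry::default();
    full.register(Arc::new(RowNumber));
    full.register(Arc::new(FirstValue));

    // ... while a minimal one opts in to a single function.
    let mut minimal = WindowRegistry::default();
    minimal.register(Arc::new(RowNumber));

    let row_number = minimal.get("row_number").expect("registered above");
    println!("{:?}", row_number.evaluate_partition(&[10, 20, 30]));
    assert!(minimal.get("first_value").is_none());
    assert!(full.get("first_value").is_some());
}
```

Because nothing here is special-cased, a function such as the hypothetical `FirstValue` has exactly the same capabilities as one supplied by a user, which is the property this proposal is after.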
Describe alternatives you've considered
Additional context
Proposed implementation steps:
- [x] https://github.com/apache/arrow-datafusion/issues/8711
- [x] https://github.com/apache/arrow-datafusion/issues/8734
- [x] https://github.com/apache/arrow-datafusion/issues/9527
- [x] https://github.com/apache/datafusion/issues/12029
- [x] https://github.com/apache/datafusion/pull/11287
- [x] https://github.com/apache/datafusion/issues/12373
- [x] https://github.com/apache/datafusion/pull/12030
- [x] https://github.com/apache/datafusion/issues/12648
- [ ] https://github.com/apache/datafusion/issues/12649
- [ ] https://github.com/apache/datafusion/issues/12694
- [ ] https://github.com/apache/datafusion/issues/12695
- [x] https://github.com/apache/datafusion/issues/12802
I'd like to try a small POC and migrate ROW_NUMBER to WindowUDF trait
> I'd like to try a small POC and migrate `ROW_NUMBER` to `WindowUDF` trait
That would be awesome. Thank you
I recommend trying to put it in its own crate if possible (`datafusion-window-functions` perhaps?) but that doesn't have to be part of the POC
I'm going to move back to working on datafusion-python for a bit, but would like to work on this afterwards.
Also I think we have some functions that are implemented in aggregate that should be moved to window, namely `first_value` and `last_value`, but it's worth taking a look through all the functions to see where they are defined.
Notes (for myself) on how postgresql divides them:
- https://www.postgresql.org/docs/current/functions-aggregate.html
- https://www.postgresql.org/docs/current/functions-window.html
I filed a few more good first issues because of interest from new contributors,
Thank you @jcsherin -- I added it to the list above
this has all tasks done 🚀 ok to close?
Sounds good!
There is one final small cleanup I think that could help. Filed it as
- https://github.com/apache/datafusion/issues/13473
Thanks for keeping this clean @findepi and @comphead