Blog post about DataFusion Jan - June 2024
Is your feature request related to a problem or challenge?
We have had good luck writing up quarterly updates for DataFusion, most recently: https://arrow.apache.org/blog/2024/01/19/datafusion-34.0.0/
(see https://github.com/apache/arrow-datafusion/issues/6780)
Describe the solution you'd like
Write a blog post
Describe alternatives you've considered
No response
Additional context
No response
I am starting to collect a list of things to highlight here
- New trait based APIs
- Pluggable handler for CREATE FUNCTION: https://github.com/apache/arrow-datafusion/pull/9333 (thanks @milenkovicm)
- DataFusion Comet blog published: https://arrow.apache.org/blog/2024/03/06/comet-donation/
- Move of DataFusion to a top level Apache project was approved by the community: https://github.com/apache/arrow-datafusion/discussions/6475
- Large scale "extract scalar functions from the core" work continues at a good pace: https://github.com/apache/arrow-datafusion/issues/9285
Performance improvements:
- Specialized group values for strings/binary https://github.com/apache/arrow-datafusion/pull/8827
Meetup
- Agenda for DataFusion meetup 2024 is looking good https://github.com/apache/arrow-datafusion/discussions/8522
- SIGMOD paper about DataFusion: https://github.com/apache/arrow-datafusion/issues/8373#issuecomment-1925913783
SQL to String features from @devinjdangelo / @backkem
- https://github.com/apache/arrow-datafusion/issues/9494
Maybe Recursive CTEs:
- Hardening Recursive CTEs https://github.com/apache/arrow-datafusion/issues/462 with @matthewgapp and @jonahgao
CTE support: https://github.com/apache/arrow-datafusion/pull/9619
Just a few things that come to mind:
Functions:
- New: nvl, nvl2
- Improvements: unnest, null handling improvements for lead/lag
WASM from @waynexia https://github.com/apache/arrow-datafusion/discussions/9834
I am now officially out of time and excuses -- I need to write this post soon
Started gathering ideas https://github.com/apache/datafusion-site/pull/6
I plan one more round of copyediting and then posting in 2 days. Please leave comments if you have any: https://github.com/apache/datafusion-site/pull/6
Blog post is live: https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/
Filed https://github.com/apache/datafusion/issues/11631 to track the next one