Add `pivot` & `unpivot` (melt) to DataFrame
Is your feature request related to a problem or challenge?
Pivoting and unpivoting are common operations for data scientists, but both are currently missing from the DataFrame API.
Describe the solution you'd like
Add two methods to the `DataFrame` struct:
- `pivot`
  - optionally accept the distinct pivot values in advance (this case could be optimized directly into a `groupby.agg()`)
  - otherwise, let DataFusion calculate the distinct values automatically
- `unpivot`
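
To make the requested semantics concrete, here is a minimal sketch in plain Python (standard library only). It is not DataFusion code and does not reflect any existing or proposed DataFusion API. The function names `pivot_sum` and `unpivot`, their parameters, and the choice of `sum` as the aggregate are all assumptions made for illustration. It shows why passing the distinct values up front reduces pivot to a single group-by with one conditional aggregate per value, while omitting them requires an extra pass over the data.

```python
def pivot_sum(rows, index, column, value, pivot_values=None):
    """Pivot a list of dict rows (hypothetical helper, not a DataFusion API).

    Produces one output row per distinct `index` value and one output
    column per pivot value; each cell is the sum of `value`, or None
    when no input row matched.
    """
    if pivot_values is None:
        # Values not supplied: an extra pass over the data is needed to
        # discover them (kept in first-seen order).
        pivot_values = list(dict.fromkeys(r[column] for r in rows))

    # With the values known, pivot is a group-by on `index` with one
    # conditional aggregate per pivot value.
    groups = {}
    for r in rows:
        cells = groups.setdefault(r[index], {v: None for v in pivot_values})
        key = r[column]
        if key in cells:
            cells[key] = r[value] if cells[key] is None else cells[key] + r[value]
    return [{index: k, **cells} for k, cells in groups.items()]


def unpivot(rows, id_columns, value_columns,
            variable_name="variable", value_name="value"):
    """Melt wide rows into long form: one output row per (row, value column)."""
    out = []
    for r in rows:
        ids = {k: r[k] for k in id_columns}
        for c in value_columns:
            out.append({**ids, variable_name: c, value_name: r[c]})
    return out


sales = [
    {"region": "east", "quarter": "Q1", "amount": 10},
    {"region": "east", "quarter": "Q2", "amount": 20},
    {"region": "west", "quarter": "Q1", "amount": 5},
    {"region": "east", "quarter": "Q1", "amount": 1},
]

# Distinct values given in advance: a single grouped aggregation.
wide = pivot_sum(sales, "region", "quarter", "amount", pivot_values=["Q1", "Q2"])

# Distinct values discovered automatically: same result, one more pass.
wide_auto = pivot_sum(sales, "region", "quarter", "amount")

# Unpivot turns the wide rows back into (region, variable, value) rows.
long_rows = unpivot(wide, ["region"], ["Q1", "Q2"])
```

In this sketch, `wide` has one row per region with `Q1` and `Q2` columns, and `long_rows` has one row per region and quarter. Note that unpivoting the pivoted result does not recover the original input, because the pivot has already aggregated duplicate rows.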
Describe alternatives you've considered
No response
Additional context
No response
referenced in https://github.com/apache/datafusion-python/issues/875
Out of curiosity, would it be possible to implement pivot using DataFusion's current extension mechanisms? I had hoped it would be possible as a UDTF, but it doesn't look like UDTFs can take a table as input.
@jonmmease, not sure if it's helpful, but Ibis has a pure Python implementation that works on DataFusion dataframes:
https://github.com/ibis-project/ibis/blob/20bec137463d0dba71e741247624119b7a7b3452/ibis/expr/types/relations.py#L4005
@chenkovsky has a PR to implement this:
- https://github.com/apache/datafusion/pull/17946
It looks perhaps similar to the PIVOT from @simonvandel in
- https://github.com/apache/datafusion/pull/17365
@chenkovsky, your PR seems better than my attempt, so let's go with that. I'll be happy to use the `LogicalPlanBuilder::pivot` from your PR in my pipe operator PR.