Add `pivot` & `unpivot` (melt) to DataFrame

Open ion-elgreco opened this issue 1 year ago • 1 comments

Is your feature request related to a problem or challenge?

Pivoting and unpivoting are common operations for data scientists, but they are currently missing from the DataFrame API.

Describe the solution you'd like

Add two methods to the `DataFrame` struct:

  • pivot
    • option to supply the distinct values in advance (the pivot could then be optimized directly into a groupby.agg())
    • otherwise, let DataFusion calculate the distinct values automatically
  • unpivot
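
To make the requested semantics concrete, here is a minimal sketch in plain Python (standard library only, not DataFusion code). The function names `pivot` and `unpivot`, their parameters, the choice of `sum` as the aggregate, and the sample data are all assumptions for illustration; the issue does not specify a signature. When `values` is supplied, the pivot is a single grouped pass with one conditional aggregate per output column; when it is omitted, an extra pass over the data is needed to discover the distinct values first.

```python
from collections import defaultdict


def pivot(rows, index, names_from, values_from, values=None):
    """Hypothetical pivot: long rows -> wide rows, summing each cell."""
    if values is None:
        # Extra pass over the data to discover the distinct pivot values.
        values = sorted({row[names_from] for row in rows})
    # One accumulator per output column, comparable to
    # group_by(index).agg(sum(values_from) filtered on names_from == v).
    groups = defaultdict(lambda: {v: None for v in values})
    for row in rows:
        cells = groups[row[index]]  # every index key yields an output row
        name = row[names_from]
        if name in cells:
            cells[name] = (cells[name] or 0) + row[values_from]
    return [{index: key, **cells} for key, cells in groups.items()]


def unpivot(rows, index, value_columns, name_col, value_col):
    """Hypothetical unpivot (melt): one output row per (input row, column)."""
    return [
        {index: row[index], name_col: col, value_col: row[col]}
        for row in rows
        for col in value_columns
    ]


# Made-up sample data in long format.
long_rows = [
    {"city": "Oslo", "quarter": "Q1", "sales": 10},
    {"city": "Oslo", "quarter": "Q2", "sales": 12},
    {"city": "Rome", "quarter": "Q1", "sales": 7},
    {"city": "Oslo", "quarter": "Q1", "sales": 5},
]

# Distinct values discovered automatically.
wide = pivot(long_rows, "city", "quarter", "sales")
# Distinct values given up front: no discovery pass, only Q1 is produced.
wide_q1 = pivot(long_rows, "city", "quarter", "sales", values=["Q1"])
# Back to long format; missing cells come back as None.
melted = unpivot(wide, "city", ["Q1", "Q2"], "quarter", "sales")
```

In this sketch, the up-front `values` list fixes the output schema before any data is read, which is what would allow a query engine to plan the pivot as an ordinary aggregation.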

Describe alternatives you've considered

No response

Additional context

No response

ion-elgreco avatar Oct 13 '24 11:10 ion-elgreco

referenced in https://github.com/apache/datafusion-python/issues/875

Omega359 avatar Oct 13 '24 13:10 Omega359

Out of curiosity, would it be possible to implement pivot using DataFusion's current extension mechanisms? I had hoped it would be possible as a UDTF, but it doesn't look like UDTFs can take a table as input.

jonmmease avatar Apr 07 '25 10:04 jonmmease

@jonmmease, not sure if it's helpful, but Ibis has a pure Python implementation that works on DataFusion dataframes:

https://github.com/ibis-project/ibis/blob/20bec137463d0dba71e741247624119b7a7b3452/ibis/expr/types/relations.py#L4005

riziles avatar Apr 07 '25 19:04 riziles

@chenkovsky has a PR to implement this:

  • https://github.com/apache/datafusion/pull/17946

It looks somewhat similar to the PIVOT from @simonvandel in

  • https://github.com/apache/datafusion/pull/17365

alamb avatar Oct 07 '25 19:10 alamb

@chenkovsky, your PR seems better than my attempt, so let's go with that. I'll be happy to use the `LogicalPlanBuilder::pivot` from your PR in my pipe operator PR.

simonvandel avatar Oct 07 '25 21:10 simonvandel