exposing the `fold` expressions from Polars

Open mhanberg opened this issue 1 year ago • 9 comments

Description

Is it possible to expose the folds API from Polars?

I have a problem that I think can be solved via that API (I'm not entirely sure, still a beginner with Explorer).

I can try to come up with a version of my problem that I can share publicly if needed.

Also, if this API is already exposed and I just missed it... please let me know 😅.

Thanks!

mhanberg avatar May 20 '24 18:05 mhanberg

It is not exposed (unless I missed it too!). I think it'd be a great addition. Though it looks like it'd be a good deal of work to add it so it might take a while.

> I have a problem that I think can be solved via that API (I'm not entirely sure, still a beginner with Explorer). I can try to think of a version of my problem that I can share publicly if needed.

If you want to ask on elixirforum.com, feel free to @- me and I can try to answer. My handle is the same as on GitHub.

billylanchantin avatar May 20 '24 18:05 billylanchantin

Oh, I didn't know we had fold. It seems it works with expressions, which means we can use the structure in Explorer.Query to fold over anything and it will be performant. I don't think it would be that complicated then! My suggestion is to call it reduce_with, to mirror map_with and friends!

josevalim avatar May 20 '24 19:05 josevalim

So it seems there's fold_exprs and reduce_exprs. The difference seems to be reduction col-wise vs. row-wise. I think we'd want to include both?

They also have a few exprs pairs like sum and sum_horizontal. Maybe we want to call them reduce_with and reduce_with_horizontal? reduce and fold are basically synonyms to me.

Also looking over the docs, I think there's a lot of potential in exposing many of their exprs:

  • https://docs.pola.rs/docs/rust/dev/polars_lazy/dsl/index.html

billylanchantin avatar May 20 '24 20:05 billylanchantin

Sorry, I got fold and reduce mixed up. If it is operating on the columns themselves, then we can probably add it to Explorer.Query directly. We already support column traversal via across/query.

I am more interested in the reduce version that works within a single column.

josevalim avatar May 20 '24 20:05 josevalim

> I am more interested in the reduce version that works within a single column.

Yeah agreed! It'd be super useful in summarise.

> We already support column traversal via across/query.

If I'm reading this correctly (I've not confirmed it yet), then the reduce_with_horizontal reduces across the columns:

df = DF.new(a: [1, 2, 3], b: [10, 20, 30], c: [100, 200, 300])

+--------------------------------------------+
| Explorer DataFrame: [rows: 3, columns: 3]  |
+--------------+--------------+--------------+
|      a       |      b       |      c       |
|    <s64>     |    <s64>     |    <s64>     |
+==============+==============+==============+
| 1            | 10           | 100          |
+--------------+--------------+--------------+
| 2            | 20           | 200          |
+--------------+--------------+--------------+
| 3            | 30           | 300          |
+--------------+--------------+--------------+

mutate(df, sum: reduce_horizontal(cols(), 0, fn col, acc ->
  col + acc
end))

+-------------------------------------------+
| Explorer DataFrame: [rows: 3, columns: 4] |
+----------+----------+----------+----------+
|    a     |    b     |    c     |   sum    |
|  <s64>   |  <s64>   |  <s64>   |  <s64>   |
+==========+==========+==========+==========+
| 1        | 10       | 100      | 111      |
+----------+----------+----------+----------+
| 2        | 20       | 200      | 222      |
+----------+----------+----------+----------+
| 3        | 30       | 300      | 333      |
+----------+----------+----------+----------+

Our comprehensions only make it more ergonomic to issue the same call to mutate/filter/etc. with different columns. This would let you actually compute multi-column things.
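To make the intended semantics concrete, here is a minimal plain-Elixir sketch of what a horizontal reduce would compute. It models columns as ordinary lists and does not use Explorer at all; `reduce_horizontal` below is a hypothetical anonymous function written for illustration, not an existing Explorer or Polars API.

```elixir
# Columns modelled as a keyword list of plain lists (illustrative data,
# same values as the data frame above).
columns = [a: [1, 2, 3], b: [10, 20, 30], c: [100, 200, 300]]

# Hypothetical helper: fold across columns, row by row.
# `init` is broadcast to one accumulator value per row, then each column
# is combined element-wise with the running accumulator.
reduce_horizontal = fn cols, init, fun ->
  [{_name, first} | _] = cols
  acc0 = List.duplicate(init, length(first))

  Enum.reduce(cols, acc0, fn {_name, col}, acc ->
    Enum.zip_with(col, acc, fun)
  end)
end

sum = reduce_horizontal.(columns, 0, fn col, acc -> col + acc end)
IO.inspect(sum)
# prints [111, 222, 333]
```

A real implementation would presumably build a single lazy expression and hand it to Polars rather than looping over lists, but the accumulator-per-row shape is the same.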

In fact, I wonder if we could make the `:reduce` option of `for` syntactic sugar for this?... 🤔

billylanchantin avatar May 20 '24 20:05 billylanchantin

> In fact, I wonder if we could make the `:reduce` option of `for` syntactic sugar for this?... 🤔

We certainly could but perhaps @cigrainger has ideas on the API for this. @cigrainger, can we "fold" across columns in dplyr?

josevalim avatar May 20 '24 20:05 josevalim

The equivalent in dplyr would be accomplished with something like this:

df
|> mutate(sum(c_across(starts_with("Bud"))))

It's kind of gross, but quite similar to mutate(df, sum: reduce_horizontal(...))

There used to be a rowwise() wrapper that also felt a bit off.

jsonbecker avatar Jun 28 '24 21:06 jsonbecker

We constantly keep bumping into this issue of horizontal operations/aggregations.

While we are thinking about implementing fold, maybe there is a way to reuse existing code, where we programmatically construct a Query out of a list of columns and an operator.

In the end, what we are really trying to simulate is the expression new_col: x + y + z, by passing a list [x, y, z] and an operator :+
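One way to approximate this without a fold API is to build the expression with `Enum.reduce/2` inside `mutate_with`. This is a minimal sketch, assuming a recent Explorer release in which the `mutate_with` callback receives a lazy data frame and `ldf["name"]` returns a lazy series. The column names, `new_col`, and the choice of `Series.add/2` to stand in for `:+` are illustrative assumptions, not something proposed in this thread.

```elixir
# Assumes Explorer can be fetched from Hex; pin a version in real code.
Mix.install([:explorer])

alias Explorer.DataFrame, as: DF
alias Explorer.Series

df = DF.new(x: [1, 2, 3], y: [10, 20, 30], z: [100, 200, 300])

# The "list of columns and an operator" from the comment above.
columns = ["x", "y", "z"]
operator = &Series.add/2

result =
  DF.mutate_with(df, fn ldf ->
    # Build the lazy expression x + y + z by reducing over the columns.
    expr =
      columns
      |> Enum.map(fn name -> ldf[name] end)
      |> Enum.reduce(fn col, acc -> operator.(acc, col) end)

    [new_col: expr]
  end)

IO.inspect(Series.to_list(result["new_col"]))
```

Because the reduction happens while the query is being built, the backend still sees one combined expression rather than a per-row callback.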

vanjabucic avatar Mar 07 '25 21:03 vanjabucic

@vanjabucic I think a fold implementation is the right call long term. But if you post a minimal example of what you're trying to accomplish, I think I can code up a workaround.

Also relevant:

  • https://github.com/elixir-explorer/explorer/issues/978 - another instance of someone wanting a dynamic mutation
  • https://github.com/elixir-explorer/explorer/pull/989 - new functionality that may help

billylanchantin avatar Mar 07 '25 22:03 billylanchantin