exposing the `fold` expressions from Polars
Description
Is it possible to expose the folds API from Polars?
I have a problem that I think can be solved via that API (I'm not entirely sure, still a beginner with Explorer).
I can try to think of a version of my problem that I can share publicly if needed.
Also, if this API is already exposed and I just missed it... please let me know 😅.
Thanks!
It is not exposed (unless I missed it too!). I think it'd be a great addition. Though it looks like it'd be a good deal of work to add it so it might take a while.
> I have a problem that I think can be solved via that API (I'm not entirely sure, still a beginner with Explorer). I can try to think of a version of my problem that I can share publicly if needed.
If you want to ask on elixirforum.com, feel free to @-mention me and I can try to answer. My handle is the same as on GitHub.
Oh, I didn't know we had fold. It seems it works with expressions, which means we can use the structure in Explorer.Query to fold over anything and it will be performant. I don't think it would be that complicated then! My suggestion is to call it reduce_with, to mirror map_with and friends!
So it seems there's fold_exprs and reduce_exprs. The difference seems to be reduction col-wise vs. row-wise. I think we'd want to include both?
They also have a few exprs pairs like sum and sum_horizontal. Maybe we want to call them reduce_with and reduce_with_horizontal? reduce and fold are basically synonyms to me.
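To make the two directions concrete, here is a minimal sketch in plain Elixir, with lists standing in for columns. It does not use Explorer or Polars. The module and function names (`ReduceSketch`, `reduce_vertical`, `reduce_horizontal`) are made up for illustration and only model the semantics being discussed, not any existing or planned API.

```elixir
defmodule ReduceSketch do
  # Hypothetical helper: reduce *within* one column, yielding a single value.
  def reduce_vertical(column, acc, fun), do: Enum.reduce(column, acc, fun)

  # Hypothetical helper: reduce *across* columns, yielding one value per row.
  def reduce_horizontal(columns, acc, fun) do
    columns
    |> Enum.zip()
    |> Enum.map(fn row -> row |> Tuple.to_list() |> Enum.reduce(acc, fun) end)
  end
end

a = [1, 2, 3]
b = [10, 20, 30]
c = [100, 200, 300]

vertical = ReduceSketch.reduce_vertical(a, 0, &+/2)
horizontal = ReduceSketch.reduce_horizontal([a, b, c], 0, &+/2)

IO.inspect(vertical)    # 6
IO.inspect(horizontal)  # [111, 222, 333]
```

The vertical form is what an aggregation inside summarise would need, while the horizontal form produces a new column the same length as its inputs.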
Also looking over the docs, I think there's a lot of potential in exposing many of their exprs:
- https://docs.pola.rs/docs/rust/dev/polars_lazy/dsl/index.html
Sorry, I got fold and reduce mixed up. If it is operating on the columns themselves, then we can probably add it to Explorer.Query directly. We already support column traversal via across/query.
I am more interested in the reduce version that works within a single column.
> I am more interested in the reduce version that works within a single column.
Yeah agreed! It'd be super useful in summarise.
> We already support column traversal via across/query.
If I'm reading this correctly (I've not confirmed it yet), then the reduce_with_horizontal reduces across the columns:
```elixir
df = DF.new(a: [1, 2, 3], b: [10, 20, 30], c: [100, 200, 300])
```

```
+--------------------------------------------+
| Explorer DataFrame: [rows: 3, columns: 3]  |
+--------------+--------------+--------------+
|      a       |      b       |      c       |
|    <s64>     |    <s64>     |    <s64>     |
+==============+==============+==============+
|      1       |      10      |     100      |
+--------------+--------------+--------------+
|      2       |      20      |     200      |
+--------------+--------------+--------------+
|      3       |      30      |     300      |
+--------------+--------------+--------------+
```
```elixir
mutate(df, sum: reduce_horizontal(cols(), 0, fn col, acc ->
  col + acc
end))
```

```
+-------------------------------------------+
| Explorer DataFrame: [rows: 3, columns: 4] |
+----------+----------+----------+----------+
|    a     |    b     |    c     |   sum    |
|  <s64>   |  <s64>   |  <s64>   |  <s64>   |
+==========+==========+==========+==========+
|    1     |    10    |   100    |   111    |
+----------+----------+----------+----------+
|    2     |    20    |   200    |   222    |
+----------+----------+----------+----------+
|    3     |    30    |   300    |   333    |
+----------+----------+----------+----------+
```
Our comprehensions only make the same call to mutate/filter/etc. with different columns more ergonomic. This would let you actually compute multi-column things.
In fact, I wonder if we could make the :reduce option of `for` syntactic sugar for this?... 🤔
> In fact, I wonder if we could make the :reduce option of `for` syntactic sugar for this?... 🤔
We certainly could but perhaps @cigrainger has ideas on the API for this. @cigrainger, can we "fold" across columns in dplyr?
The equivalent in dplyr would be accomplished with something like this:
```r
df |>
  mutate(sum(c_across(starts_with("Bud"))))
```
It's kind of gross, but quite similar to mutate(df, sum: reduce_horizontal(...))
There used to be a rowwise() wrapper that also felt a bit off.
We constantly keep bumping into this issue of horizontal operations/aggregations.
While we are thinking about implementing fold, maybe there is a way to use existing code where we programmatically construct a Query out of a list of columns and an operator.
In the end, what we are really trying to simulate is the expression new_col: x + y + z by passing a list [x, y, z] and an operator :+
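As a rough sketch of that idea, something like the following might work today without a dedicated fold: build the expression by reducing over a list of column names inside `mutate_with`. The module `HorizontalSketch` and its `horizontal/3` helper are hypothetical, not part of Explorer. The sketch also assumes a recent Explorer release in which `DF.mutate_with/2` hands the callback a lazy frame supporting `ldf["a"]` access, and in which `Series.add/2` accepts those lazy series.

```elixir
Mix.install([:explorer])

alias Explorer.DataFrame, as: DF
alias Explorer.Series

defmodule HorizontalSketch do
  # Hypothetical helper, not part of Explorer: combines the named columns
  # left to right with `op`, i.e. builds `x op y op z` as one expression.
  def horizontal(ldf, names, op) do
    names
    |> Enum.map(&ldf[&1])
    |> Enum.reduce(fn col, acc -> op.(acc, col) end)
  end
end

df = DF.new(a: [1, 2, 3], b: [10, 20, 30], c: [100, 200, 300])

# The column list and the operator are ordinary runtime values here,
# so they can be chosen dynamically.
result =
  DF.mutate_with(df, fn ldf ->
    [sum: HorizontalSketch.horizontal(ldf, ["a", "b", "c"], &Series.add/2)]
  end)

IO.inspect(Series.to_list(result["sum"]))
```

Because there is no initial accumulator, this behaves like a reduce and not a fold, so it needs at least one column name in the list.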
@vanjabucic I think a fold implementation is the right call long term. But if you post a minimal example of what you're trying to accomplish, I think I can code up a workaround.
Also relevant:
- https://github.com/elixir-explorer/explorer/issues/978 - another instance of someone wanting a dynamic mutation
- https://github.com/elixir-explorer/explorer/pull/989 - new functionality that may help