Transform is not working on 0.3.x
Mix.install([{:explorer, "0.3.1"}])
Explorer.DataFrame.new(a: ["a", "b"], b: [1, 2])
|> Explorer.DataFrame.mutate_with(&[c: Explorer.Series.transform(&1[:b], fn p -> p end)])
** (RuntimeError) cannot perform operation on an Explorer.Backend.LazySeries
(explorer 0.3.1) lib/explorer/backend/lazy_series.ex:451: Explorer.Backend.LazySeries.transform/2
(explorer 0.3.1) lib/explorer/series.ex:2093: Explorer.Series.transform/2
(stdlib 3.17) erl_eval.erl:685: :erl_eval.do_apply/6
(stdlib 3.17) erl_eval.erl:893: :erl_eval.expr_list/6
(stdlib 3.17) erl_eval.erl:237: :erl_eval.expr/5
(stdlib 3.17) erl_eval.erl:229: :erl_eval.expr/5
(explorer 0.3.1) lib/explorer/data_frame.ex:1398: Explorer.DataFrame.mutate_with/2
Correct. mutate_with now performs a lazy operation and we cannot perform a transform lazily. I think this may work:
df = Explorer.DataFrame.new(a: ["a", "b"], b: [1, 2])
c = Explorer.Series.transform(df[:b], fn p -> p end)
Explorer.DataFrame.mutate(df, c: c)
However, this will stop working on v0.4. We could allow it to work but it means people can accidentally write eager operations when they should be lazy. I think we need to introduce a specific API for replacing one or more columns in a dataframe. @cigrainger, do you have any suggestions? I can think of two:
- Explorer.DataFrame.replace(c: Explorer.Series.transform(df[:b], fn p -> p end)): it works pretty much as mutate today, but it is eager
- Explorer.DataFrame.put(df, :c, Explorer.Series.transform(df[:b], fn p -> p end)): since we already implement the Access protocol
@cigrainger / @philss / @kimjoaoun thoughts?
Similar problem.
How can I know which operations are available to be performed lazily?
Following https://hexdocs.pm/explorer/Explorer.DataFrame.html#mutate_with/2,
This function is similar to mutate/2, but allows complex operations to be performed, since it uses a virtual representation of the dataframe. The only requirement is that a series operation is returned.
I didn't understand what "The only requirement is that a series operation is returned" means.
df = Explorer.DataFrame.new(%{a: [1, 2], b: [3, 4]})
df
|> Explorer.DataFrame.mutate_with(&%{
ab: Explorer.Series.concat(&1[:a], &1[:b])
})
** (RuntimeError) cannot perform operation on an Explorer.Backend.LazySeries
(explorer 0.3.1) lib/explorer/backend/lazy_series.ex:451: Explorer.Backend.LazySeries.concat/2
(elixir 1.14.0) lib/enum.ex:2468: Enum."-reduce/3-lists^foldl/2-0-"/3
/Users/json/workspace/project/unus/carrier_umbrella/notebooks/data_transform_poc.livemd#cell:2fjiiwzf3zehfshdx7ui2o7pdb6bthrq:5: (file)
/Users/json/workspace/project/unus/carrier_umbrella/notebooks/data_transform_poc.livemd#cell:2fjiiwzf3zehfshdx7ui2o7pdb6bthrq:4: (file)
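For contrast, here is a minimal sketch of a mutate_with call that does return a series operation. Explorer.Series.add/2 only records the addition against the virtual dataframe, so it can run lazily. The version requirement and the variable names are assumptions for illustration.

```elixir
# Assumption: any Explorer release that ships mutate_with/2 (0.3 or later).
Mix.install([{:explorer, "~> 0.3"}])

df = Explorer.DataFrame.new(%{a: [1, 2], b: [3, 4]})

# Inside the callback, &1[:a] and &1[:b] are lazy column references, and
# add/2 on them returns another lazy series instead of computing eagerly.
summed = Explorer.DataFrame.mutate_with(df, &[c: Explorer.Series.add(&1[:a], &1[:b])])

# The new column holds real values once mutate_with has returned.
IO.inspect(Explorer.Series.to_list(summed[:c]))
```

The callback never touches real data, which is why functions that need concrete values (such as transform or concat in 0.3.1) raise inside it.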
Oh, it is just not implemented!
https://github.com/elixir-nx/explorer/blob/main/lib/explorer/backend/lazy_series.ex#L450
# The following functions are not implemented yet and should raise if used.
funs = [
{:concat, 2},
{:fetch!, 2},
{:mask, 2},
{:from_list, 2},
{:sample, 4},
{:size, 1},
{:slice, 2},
{:take_every, 2},
{:to_enum, 1},
{:to_list, 1},
{:transform, 2}
]
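A list like this can be turned into raising function clauses at compile time. The following is only a sketch of that pattern, not the code in lazy_series.ex: the module name NotImplementedSketch, the shortened list, and the error message are all hypothetical.

```elixir
defmodule NotImplementedSketch do
  # Hypothetical subset of the list quoted above.
  funs = [
    {:concat, 2},
    {:to_list, 1},
    {:transform, 2}
  ]

  for {fun, arity} <- funs do
    # Build one argument variable per position; the underscore prefix
    # avoids unused-variable warnings.
    args = for i <- 1..arity, do: Macro.var(:"_arg#{i}", __MODULE__)

    # Define fun/arity so that calling it always raises.
    def unquote(fun)(unquote_splicing(args)) do
      raise "cannot perform #{unquote(fun)}/#{unquote(arity)} on a lazy series"
    end
  end
end

# Usage: every listed function exists but raises when called.
message =
  try do
    NotImplementedSketch.concat(:left, :right)
  rescue
    e in RuntimeError -> e.message
  end

IO.puts(message)
```

Generating the clauses from one list keeps the "not supported lazily" set in a single place, so implementing a function lazily means removing its entry and writing a real clause.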
Yes, I improved the error message. It is not implemented yet, so a PR is welcome!
@josevalim I'm trying to implement Explorer.Series.concat/2 for lazy series, using Explorer.Series.coalesce/2 as a reference.
#366
I have one question.
Should a LazySeries only be able to operate with another LazySeries? (That is, is Series + LazySeries => eager Series not allowed?)
For example, Explorer.Series.coalesce/2 doesn't allow mixing a Series and a LazySeries.
df = Explorer.DataFrame.new(%{a: [1, nil, 3]})
df
|> Explorer.DataFrame.mutate_with(&%{b:
Explorer.Series.coalesce(Explorer.Series.from_list([1, nil, 3]), &1[:a])
})
** (ErlangError) Erlang error: :invalid_struct
(explorer 0.3.1) Explorer.PolarsBackend.Native.s_coalesce(shape: (3,)
Series: '' [i64]
[
1
null
3
], %Explorer.Backend.LazySeries{op: :column, args: ["a"], aggregation: false, window: false})
(explorer 0.3.1) lib/explorer/polars_backend/shared.ex:17: Explorer.PolarsBackend.Shared.apply_series/3
#cell:pfpfgga4zpyttonfkjhmgxyguftlfsbl:4: (file)
#cell:pfpfgga4zpyttonfkjhmgxyguftlfsbl:3: (file)
I think concat can work on non-lazy series too, similar to how addition works, but the result must always be a lazy series.
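To make the idea concrete, here is a toy model in plain Elixir. It is not Explorer's implementation: the LazySketch module, its Lazy struct, and the col/concat/eval functions are hypothetical names used only to show that mixing eager data with a lazy column reference still yields a lazy node, which is evaluated later.

```elixir
defmodule LazySketch do
  # Toy model only; none of these names exist in Explorer.
  defmodule Lazy do
    # A lazy value describes an operation instead of holding data.
    defstruct [:op, :args]
  end

  # Refer to a column by name without reading it.
  def col(name), do: %Lazy{op: :column, args: [name]}

  # Accepts eager lists or lazy nodes, but always returns a lazy node.
  def concat(left, right), do: %Lazy{op: :concat, args: [left, right]}

  # Evaluate a node against concrete columns (a map of name => list).
  def eval(%Lazy{op: :column, args: [name]}, columns), do: Map.fetch!(columns, name)

  def eval(%Lazy{op: :concat, args: [left, right]}, columns),
    do: eval(left, columns) ++ eval(right, columns)

  # Eager data evaluates to itself.
  def eval(list, _columns) when is_list(list), do: list
end

# Eager list on the left, lazy column on the right: the result is still lazy.
expr = LazySketch.concat([1, nil, 3], LazySketch.col("a"))

# Nothing is computed until the expression is evaluated against real data.
result = LazySketch.eval(expr, %{"a" => [4, 5]})
IO.inspect(result)
```

In this model the eager list is simply embedded as an argument of the lazy node, which is one way to read "the result must always be a lazy series".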
Oh, I'm starting to understand the concept of lazy series. Thanks!
How about closing it and following up on this in #381?
I think this is a separate problem. We can't really support transform for lazy series. :)
Just to answer the question, I think the second option looks better:
Explorer.DataFrame.put(df, :c, Explorer.Series.transform(df[:b], fn p -> p end))
With put you can add or replace a column.
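A minimal sketch of that approach, assuming an Explorer release that includes Explorer.DataFrame.put/3 (it is not available in 0.3.1, where this issue was reported). The transform runs eagerly on the real series, and put then adds or replaces the column. The multiplication by 10 is only there to make the new column visibly different.

```elixir
# Assumption: a release with DataFrame.put/3 (0.4 or later).
Mix.install([{:explorer, "~> 0.4"}])

df = Explorer.DataFrame.new(a: ["a", "b"], b: [1, 2])

# transform/2 needs real values, so call it on the eager series df[:b].
c = Explorer.Series.transform(df[:b], fn p -> p * 10 end)

# put/3 adds column :c; with an existing name it would replace that column.
with_c = Explorer.DataFrame.put(df, :c, c)

IO.inspect(Explorer.DataFrame.names(with_c))
```

Because the series is built outside of mutate_with, there is no lazy callback involved and the eager step stays explicit in the code.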
Closing in favor of #414.