Sorting an empty DataFrame results in a runtime Polars error
Attempting to sort a grouped dataframe that has no rows results in a runtime error:
require Explorer.DataFrame, as: DataFrame
dataframe = DataFrame.new(a: ["a", "b", "c"])
dataframe
|> DataFrame.group_by("a")
|> DataFrame.filter(a == "d")
|> DataFrame.sort_by(a)
Output:
** (RuntimeError) Polars Error: cannot group_by + apply on empty 'DataFrame'
(explorer 0.8.2) lib/explorer/polars_backend/shared.ex:79: Explorer.PolarsBackend.Shared.apply_dataframe/4
#cell:2m6ajrb7ypgepmrw:3: (file)
Thanks for the issue!
It appears this may have been an issue on the Polars side that they addressed:
- https://github.com/pola-rs/polars/issues/12194
But that fix was released as part of Polars 0.35 (PR 12269):
- https://github.com/pola-rs/polars/releases/tag/rs-0.35.0
We've got a later version of Polars, so I'll have to do some more digging later.
It could be related to the order of the chained expressions:
require Explorer.DataFrame, as: DF
df = DF.new(a: ["a", "b", "c"])
# ❗ Doesn't work, as mentioned in the issue.
df
|> DF.group_by("a")
|> DF.filter(a == "d")
|> DF.sort_by(a)
# ✔️ This one works
df |> DF.filter(a == "d") |> DF.sort_by(a) |> DF.group_by("a")
I usually cross-check against the Python API, and in the latest version this doesn't work either:
df.group_by("a").filter(pl.lit("a").eq("d")).sort("a")
So my conclusion is that the order of the expressions matters.
@ceyhunkerti I think this should still be permitted. For example:
import Explorer.DataFrame
require Explorer.DataFrame
df = new(a: ["a", "a", "b"])
# Broken
df |> group_by("a") |> filter(a == "d") |> sort_by(a)
# Works
df |> lazy() |> group_by("a") |> filter(a == "d") |> sort_by(a) |> compute()
# #Explorer.DataFrame<
# Polars[0 x 1]
# Groups: ["a"]
# a string []
# >
AFAICT Polars group_by works a little differently. I believe they require aggregating before continuing work in most cases:
df.group_by("a").filter(False)
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# AttributeError: 'GroupBy' object has no attribute 'filter'