Eduard Karacharov
Disregarding IEJoin -- the `time` output from the issue description seems to show that both DuckDB and DF spend roughly the same CPU time (user + system) and the only difference is parallelism...
> the old paralleism strategy should works, but the check in enforce_distribution.rs block the reparition

I don't think it's the proper way to go -- it'll give some benefits in terms...
My intention was to fix the NLJoin parallelism issue caused by the fixed build-side choice (since a right join instead of a left join had acceptable performance, as was claimed above), and in the...
Same here -- planning to take a closer look tomorrow; the idea in general looks good though. Thank you @my-vegetable-has-exploded
@jsc0218 should I take any action regarding this PR now (e.g. rerun CI)?
> `range` are the row indices of the `batch` in the `BufferedBatch` which have the same join key. Not related to match or not.

That matches my understanding of these...
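To illustrate the quoted description of `range`, here is a minimal sketch of finding the run of row indices that share one join key in a buffer sorted by key. The function name `same_key_range` and the plain `i64` keys are assumptions made for illustration only; this is not the actual SMJ code in DataFusion.

```rust
use std::ops::Range;

/// Hypothetical helper (not DataFusion's API): given join keys sorted in
/// ascending order, return the range of row indices beginning at `start`
/// whose key equals the key at `start`.
fn same_key_range(keys: &[i64], start: usize) -> Range<usize> {
    // Past the end of the buffer there is nothing to group: empty range.
    if start >= keys.len() {
        return start..start;
    }
    let key = keys[start];
    // Count consecutive rows that carry the same key. Whether any of them
    // later passes a join filter is a separate question.
    let len = keys[start..].iter().take_while(|k| **k == key).count();
    start..start + len
}
```

Note that the range only groups buffered rows by key equality; it says nothing about whether those rows end up matching a streamed row once a filter is applied, which is the distinction the quote draws.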
@comphead I've finally got it -- it's like in this case SMJ is trying to produce output for each join key pair (streamed-buffered) -- I guess it's how smj state...
> Love this idea. I had been working on something [similar](https://github.com/datafusion-contrib/datafusion-tui) in the past but unfortunately life got in the way so wasnt able to push it as far as...
There are already aggregation variants of first/last which seem to solve the issue ([example](https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/aggregate.slt#L170)), and, at first glance, they do not perform normal sorting, only compare incoming ordering column values...
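As a rough illustration of "compare incoming ordering column values" rather than sorting, the sketch below keeps only the best row seen so far. The type `FirstValueByOrder`, its methods, and the `(i64, &str)` row shape are hypothetical and chosen for brevity; the real accumulators linked above work on Arrow arrays and are more involved.

```rust
/// Hypothetical accumulator (not DataFusion's implementation): tracks the
/// value whose ordering key is the smallest seen so far, without sorting.
#[derive(Debug, Default)]
struct FirstValueByOrder {
    // (ordering key, value) of the best row seen so far, if any.
    best: Option<(i64, String)>,
}

impl FirstValueByOrder {
    /// Fold one batch of (ordering key, value) rows into the state.
    /// Each row costs a single comparison; no buffering or sorting happens.
    fn update_batch(&mut self, rows: &[(i64, &str)]) {
        for &(key, value) in rows {
            let replace = match &self.best {
                None => true,
                // Strict comparison: on ties the earlier row is kept.
                Some((best_key, _)) => key < *best_key,
            };
            if replace {
                self.best = Some((key, value.to_string()));
            }
        }
    }

    /// Return the current "first" value, or `None` if no rows were seen.
    fn evaluate(&self) -> Option<&str> {
        self.best.as_ref().map(|(_, v)| v.as_str())
    }
}
```

The state is a single row regardless of input size, so the work is one pass over the input instead of the O(n log n) of a full sort.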
More like https://github.com/apache/datafusion/blob/main/datafusion/functions-aggregate/src/first_last.rs (I suppose)