Eduard Karacharov

17 comments by Eduard Karacharov

Disregarding IEJoin, the `time` output from the issue description seems to show that DuckDB and DF spend roughly the same CPU time (user + system), and the only difference is parallelism...

> the old parallelism strategy should work, but the check in enforce_distribution.rs blocks the repartition

I don't think it's the proper way to go -- it'll give some benefits in terms...

My intention was to fix the NLJoin parallelism issue caused by the fixed build-side choice (since, as claimed above, a right join instead of a left one had acceptable performance), and in the...

Same here -- I'm planning to take a closer look tomorrow; the idea looks good in general, though. Thank you @my-vegetable-has-exploded

@jsc0218 should I take any action regarding this PR now (e.g. rerun CI)?

> `range` are the row indices of the `batch` in the `BufferedBatch` which have the same join key. Not related to match or not.

That matches my understanding of these...
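To make the quoted definition concrete, here is a minimal sketch, assuming a single already-sorted `i64` key column. The helper `equal_key_ranges` is hypothetical and is not DataFusion's `BufferedBatch` code; it only shows that a range groups rows by equal join key, independently of whether they match the streamed side.

```rust
use std::ops::Range;

/// Hypothetical helper: splits a sorted key column into contiguous row ranges
/// that share the same join key (whether or not they match anything).
fn equal_key_ranges(keys: &[i64]) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut start = 0;
    for i in 1..=keys.len() {
        // Close the current range at the end of input or when the key changes.
        if i == keys.len() || keys[i] != keys[start] {
            ranges.push(start..i);
            start = i;
        }
    }
    ranges
}

fn main() {
    let keys = [1, 1, 2, 3, 3, 3];
    for r in equal_key_ranges(&keys) {
        // Prints e.g. "key 1 -> rows 0..2"
        println!("key {} -> rows {:?}", keys[r.start], r);
    }
}
```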

@comphead I've finally got it -- in this case SMJ seems to be trying to produce output for each (streamed, buffered) join key pair -- I guess it's how the SMJ state...

> Love this idea. I had been working on something [similar](https://github.com/datafusion-contrib/datafusion-tui) in the past but unfortunately life got in the way so wasn't able to push it as far as...

There are already aggregate variants of first/last that seem to solve the issue ([example](https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/aggregate.slt#L170)), and, at first glance, they don't perform a full sort; they only compare incoming ordering column values...
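The "compare instead of sort" idea can be sketched as below. This is a simplified illustration under stated assumptions, not DataFusion's actual accumulator: `FirstValueAcc`, its `i64` ordering key, and its `String` values are all hypothetical. Each incoming row is compared against the best row seen so far, so the work is linear in the input and no buffering or sorting is needed.

```rust
/// Hypothetical accumulator: tracks the value whose ordering key is smallest,
/// comparing each incoming row instead of sorting the whole input.
#[derive(Default)]
struct FirstValueAcc {
    best: Option<(i64, String)>, // (ordering key, value)
}

impl FirstValueAcc {
    /// Feed one batch: `order_keys[i]` is the ordering column value for `values[i]`.
    fn update(&mut self, order_keys: &[i64], values: &[&str]) {
        for (&k, &v) in order_keys.iter().zip(values) {
            // Strict `<` keeps the earliest-seen row on ties.
            let replace = match &self.best {
                None => true,
                Some((best_k, _)) => k < *best_k,
            };
            if replace {
                self.best = Some((k, v.to_string()));
            }
        }
    }

    /// Returns the value with the smallest ordering key seen so far, if any.
    fn evaluate(&self) -> Option<&str> {
        self.best.as_ref().map(|(_, v)| v.as_str())
    }
}

fn main() {
    let mut acc = FirstValueAcc::default();
    acc.update(&[5, 3, 9], &["e", "c", "i"]);
    acc.update(&[4, 1], &["d", "a"]);
    println!("{:?}", acc.evaluate()); // Some("a")
}
```

A last-value variant would flip the comparison (e.g. `k >= *best_k` to prefer the latest row on ties).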

More like https://github.com/apache/datafusion/blob/main/datafusion/functions-aggregate/src/first_last.rs (I suppose)