Bloom filter Join Step I: create benchmark

Open Lordworms opened this issue 1 year ago • 7 comments

Which issue does this PR close?

part of #7955

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Lordworms avatar Aug 11 '24 02:08 Lordworms

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look.

alamb avatar Aug 20 '24 17:08 alamb

For TPC-H query 17, when we create 1,000,000 rows for the lineitem and part tables, the time spent on the join is 50% (the other 80% of the time is spent on creating Parquet files). (Screenshot 2024-08-25 at 1 37 15 PM)

Lordworms avatar Aug 25 '24 20:08 Lordworms

For the second case, 95% of the time is spent on the join. (image)

Lordworms avatar Aug 25 '24 21:08 Lordworms

I think it is worth a try to implement join predicate pushdown.

Lordworms avatar Aug 25 '24 21:08 Lordworms

I suggest we revive this PR as it seems to have gotten lost / not reviewed 😢

alamb avatar Oct 18 '24 20:10 alamb

> I suggest we revive this PR as it seems to have gotten lost / not reviewed 😢

I am still working on the implementation of the actual join_pushdown; I'll push a complete PR once it is done.

Lordworms avatar Oct 18 '24 23:10 Lordworms

> I suggest we revive this PR as it seems to have gotten lost / not reviewed 😢

I am still working on the implementation of the actual "hash_join build side statistic pushdown"; I'll push a complete PR once it is done.

Lordworms avatar Oct 18 '24 23:10 Lordworms

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Dec 18 '24 02:12 github-actions[bot]