Add rule to filter out null values at inner joins
Assume the following query:
SELECT * FROM t1 INNER JOIN t2 ON t1.a = t2.b
The t2.b column does not allow null values, but t1.a does. For an inner join with the equi-join condition t1.a = t2.b, rows in t1 where a is null can never match any row in t2. We can therefore add a filter when reading t1 that removes rows with null values before the join, which reduces the number of rows the join has to process.
    Inner-Join(t1.a = t2.b)
        /          \
     t1.a          t2.b
becomes:
        Inner-Join(t1.a = t2.b)
            /            \
Filter(a is not null)     \
          /                \
       t1.a               t2.b
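To make the rewrite concrete, here is a minimal sketch of the rule over a toy plan tree. The classes Scan, Filter and InnerJoin and the function filter_null_join_keys are hypothetical names chosen for illustration; they are not CrateDB's planner API, and a real rule would also have to handle nested plans and multi-column join conditions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scan:
    table: str
    nullable_columns: frozenset  # hypothetical: columns that may contain NULL


@dataclass(frozen=True)
class Filter:
    source: "Plan"
    not_null_column: str  # stands for `<column> IS NOT NULL`


@dataclass(frozen=True)
class InnerJoin:
    left: "Plan"
    right: "Plan"
    left_key: str   # equi-join condition: left_key = right_key
    right_key: str


Plan = Union[Scan, Filter, InnerJoin]


def filter_null_join_keys(plan):
    """Wrap each nullable side of an inner equi-join in an IS NOT NULL filter."""
    if not isinstance(plan, InnerJoin):
        return plan

    def guard(side, key):
        # Only add the filter when the join key can actually be NULL.
        if isinstance(side, Scan) and key in side.nullable_columns:
            return Filter(side, key)
        return side

    return InnerJoin(
        guard(plan.left, plan.left_key),
        guard(plan.right, plan.right_key),
        plan.left_key,
        plan.right_key,
    )


t1 = Scan("t1", frozenset({"a"}))  # t1.a allows NULL
t2 = Scan("t2", frozenset())       # t2.b is NOT NULL
optimized = filter_null_join_keys(InnerJoin(t1, t2, "a", "b"))
```

In this sketch only the t1 side is wrapped in a filter, because t2.b is declared as not nullable, which matches the second diagram above.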
Reference:
https://github.com/apache/arrow-datafusion/blob/main/datafusion/optimizer/src/filter_null_join_keys.rs
This would be a great optimization.
Please note that this applies even if the join columns on both sides allow nulls: rows with a NULL join key never match, because NULL = NULL does not evaluate to true. The filter can therefore be added on both sides.
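The NULL comparison behaviour can be checked with a quick experiment. The sketch below uses Python's built-in sqlite3 module as a stand-in engine (an assumption made for convenience; it is not CrateDB), with both t1.a and t2.b nullable, and compares the plain join against the same join with null keys filtered out first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER);
    CREATE TABLE t2 (b INTEGER);
    INSERT INTO t1 VALUES (1), (2), (NULL), (NULL);
    INSERT INTO t2 VALUES (1), (NULL);
""")

# Plain inner join: rows with NULL keys exist on both sides.
plain = conn.execute(
    "SELECT t1.a, t2.b FROM t1 INNER JOIN t2 ON t1.a = t2.b"
).fetchall()

# Same join, but null keys are filtered out before joining.
filtered = conn.execute("""
    SELECT l.a, r.b
    FROM (SELECT a FROM t1 WHERE a IS NOT NULL) AS l
    INNER JOIN (SELECT b FROM t2 WHERE b IS NOT NULL) AS r
    ON l.a = r.b
""").fetchall()

# NULL = NULL evaluates to NULL (returned as None), not to true.
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(plain, filtered, null_eq)
```

Both queries return the same single matching row, so filtering the null keys first does not change the result of the inner join.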
It would also be ideal if this were factored into memory accounting and into the cost calculations for plan optimization, using null_frac from pg_stats.
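As a rough illustration of the cost side, the sketch below estimates how many rows survive the not-null filter from a table's row count and the column's null_frac (the fraction of entries that are null, as exposed in pg_stats). The function name rows_after_null_filter is hypothetical, and how CrateDB's optimizer would actually consume this estimate is an assumption.

```python
def rows_after_null_filter(row_count: int, null_frac: float) -> float:
    """Estimate rows remaining after `column IS NOT NULL` is applied."""
    if not 0.0 <= null_frac <= 1.0:
        raise ValueError("null_frac must be between 0 and 1")
    return row_count * (1.0 - null_frac)


# Hypothetical numbers: 1,000,000 rows, a quarter of which have a null key.
estimate = rows_after_null_filter(1_000_000, 0.25)
print(estimate)  # 750000.0
```

The same estimate could feed both the join's cost and its expected memory use, since fewer rows reach the join.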
Hi
I am Chitrank, and I am currently pursuing an MS in CS at UT Austin.
As part of my course project, I have to work on addressing a specific issue or bug within a defined timeframe, with the goal of creating a Pull Request (PR) to contribute to the project's development. Given the significance of this project, I am seeking assistance and guidance from experienced contributors such as yourself.
I have identified a set of issues that align with my project's objectives, and I believe that your expertise and insights would be immensely valuable in helping me understand and resolve these challenges effectively.
Kind regards,
Chitrank
Hi @theartpiece, thank you for the interest in contributing to CrateDB. Please don't post your interest on a bunch of issues, but rather pick one that interests you, so that we can assign it to you and assist you. I can suggest for example a simple one like: https://github.com/crate/crate/issues/14714 and maybe you can take over something else after that.
Thank you!
@theartpiece An excellent first task to get started is https://github.com/crate/crate/issues/2036
@matriv, @mkleen This will be part of a course project where we have to pick 3 issues from a repo and solve them before the end of the semester, and therefore we had selected 3 issues (1 hard, and 2 easy) to get started. We can get started with #2036 or #14714 and we will hopefully get assigned this issue after completing them. Thanks for the quick response.
Hi @mkleen, I'm following up on @decoder746's message above. We were wondering if it would be possible to have a short meeting with the developers (we will keep it under 30 min) so that we can unblock ourselves and also get your feedback.
Kind regards
@theartpiece We have a public channel where you can chat directly with the team. We can continue from there. https://app.gitter.im/#/room/#crate_crate:gitter.im
@theartpiece and we can also do a meeting to get started. You can contact me at [email protected]
@mkleen Thanks so much for your constant help and support. My plan is to first clone and install the repository. I'll contact you as soon as I'm done with this step.