datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

feat: Add projection to HashJoinExec.

Open my-vegetable-has-exploded opened this issue 1 year ago • 12 comments

Which issue does this PR close?

ref #6768

Rationale for this change

Some projection can't be pushed down left input or right input of hash join because filter or on need may need some columns that won't be used in later. By embed those projection to hash join, we can reduce the cost of build_batch_from_indices in hash join (build_batch_from_indices need to can compute::take() for each column) and avoid unecessary output creation.

What changes are included in this PR?

Add a rule try_embed_to_hash_join in physical_optimizer/projection_pushdown.rs. More related changes are are noted in the comments.

Are these changes tested?

Are there any user-facing changes?

None

Comparing main and hashjoin-project-pushdown
--------------------
Benchmark tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ hashjoin-project-pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 466.19ms │                  467.21ms │     no change │
│ QQuery 2     │  83.28ms │                   78.72ms │ +1.06x faster │
│ QQuery 3     │ 194.39ms │                  189.47ms │     no change │
│ QQuery 4     │ 136.94ms │                  132.81ms │     no change │
│ QQuery 5     │ 342.11ms │                  337.57ms │     no change │
│ QQuery 6     │ 124.39ms │                  122.95ms │     no change │
│ QQuery 7     │ 489.19ms │                  475.26ms │     no change │
│ QQuery 8     │ 307.07ms │                  286.77ms │ +1.07x faster │
│ QQuery 9     │ 455.38ms │                  456.27ms │     no change │
│ QQuery 10    │ 354.06ms │                  370.70ms │     no change │
│ QQuery 11    │  90.61ms │                   84.50ms │ +1.07x faster │
│ QQuery 12    │ 168.44ms │                  164.79ms │     no change │
│ QQuery 13    │ 259.32ms │                  251.40ms │     no change │
│ QQuery 14    │ 166.62ms │                  162.89ms │     no change │
│ QQuery 15    │ 232.14ms │                  226.26ms │     no change │
│ QQuery 16    │  77.71ms │                   77.55ms │     no change │
│ QQuery 17    │ 497.30ms │                  497.10ms │     no change │
│ QQuery 18    │ 769.87ms │                  748.68ms │     no change │
│ QQuery 19    │ 267.55ms │                  274.82ms │     no change │
│ QQuery 20    │ 254.93ms │                  256.73ms │     no change │
│ QQuery 21    │ 552.90ms │                  547.46ms │     no change │
│ QQuery 22    │  75.80ms │                   74.67ms │     no change │
└──────────────┴──────────┴───────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)                        │ 6366.18ms │
│ Total Time (hashjoin-project-pushdown)   │ 6284.59ms │
│ Average Time (main)                      │  289.37ms │
│ Average Time (hashjoin-project-pushdown) │  285.66ms │
│ Queries Faster                           │         3 │
│ Queries Slower                           │         0 │
│ Queries with No Change                   │        19 │
└──────────────────────────────────────────┴───────────┘

The result in my pc is unstable, sometimes it get slower😅. This is last result that I get.

Comparing main and hashjoin-project-pushdown
--------------------
Benchmark tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ hashjoin-project-pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 466.19ms │                  467.21ms │     no change │
│ QQuery 2     │  83.28ms │                   78.72ms │ +1.06x faster │
│ QQuery 3     │ 194.39ms │                  189.47ms │     no change │
│ QQuery 4     │ 136.94ms │                  132.81ms │     no change │
│ QQuery 5     │ 342.11ms │                  337.57ms │     no change │
│ QQuery 6     │ 124.39ms │                  122.95ms │     no change │
│ QQuery 7     │ 489.19ms │                  475.26ms │     no change │
│ QQuery 8     │ 307.07ms │                  286.77ms │ +1.07x faster │
│ QQuery 9     │ 455.38ms │                  456.27ms │     no change │
│ QQuery 10    │ 354.06ms │                  370.70ms │     no change │
│ QQuery 11    │  90.61ms │                   84.50ms │ +1.07x faster │
│ QQuery 12    │ 168.44ms │                  164.79ms │     no change │
│ QQuery 13    │ 259.32ms │                  251.40ms │     no change │
│ QQuery 14    │ 166.62ms │                  162.89ms │     no change │
│ QQuery 15    │ 232.14ms │                  226.26ms │     no change │
│ QQuery 16    │  77.71ms │                   77.55ms │     no change │
│ QQuery 17    │ 497.30ms │                  497.10ms │     no change │
│ QQuery 18    │ 769.87ms │                  748.68ms │     no change │
│ QQuery 19    │ 267.55ms │                  274.82ms │     no change │
│ QQuery 20    │ 254.93ms │                  256.73ms │     no change │
│ QQuery 21    │ 552.90ms │                  547.46ms │     no change │
│ QQuery 22    │  75.80ms │                   74.67ms │     no change │
└──────────────┴──────────┴───────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)                        │ 6366.18ms │
│ Total Time (hashjoin-project-pushdown)   │ 6284.59ms │
│ Average Time (main)                      │  289.37ms │
│ Average Time (hashjoin-project-pushdown) │  285.66ms │
│ Queries Faster                           │         3 │
│ Queries Slower                           │         0 │
│ Queries with No Change                   │        19 │
└──────────────────────────────────────────┴───────────┘

The result in my pc is unstable, sometimes it get slower😅. This is last result that I get.

Nice, could you run/post the tcph_mem results?

Dandandan avatar Feb 17 '24 03:02 Dandandan

These are my results for tcph_mem, seems to be a small but reasonable speed up 🚀 :

Comparing main and hashjoin-project-pushdown
--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ hashjoin-project-pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  84.89ms │                    79.70ms │ +1.07x faster │
│ QQuery 2     │  19.46ms │                    19.53ms │     no change │
│ QQuery 3     │  30.86ms │                    30.46ms │     no change │
│ QQuery 4     │  27.40ms │                    27.76ms │     no change │
│ QQuery 5     │  46.82ms │                    45.26ms │     no change │
│ QQuery 6     │   6.24ms │                     6.46ms │     no change │
│ QQuery 7     │ 107.98ms │                   101.54ms │ +1.06x faster │
│ QQuery 8     │  36.99ms │                    34.92ms │ +1.06x faster │
│ QQuery 9     │  51.31ms │                    50.74ms │     no change │
│ QQuery 10    │  61.42ms │                    58.83ms │     no change │
│ QQuery 11    │  14.90ms │                    14.45ms │     no change │
│ QQuery 12    │  30.32ms │                    30.39ms │     no change │
│ QQuery 13    │  30.31ms │                    30.53ms │     no change │
│ QQuery 14    │   8.85ms │                     8.84ms │     no change │
│ QQuery 15    │  23.29ms │                    22.15ms │     no change │
│ QQuery 16    │  19.47ms │                    20.17ms │     no change │
│ QQuery 17    │  50.95ms │                    50.01ms │     no change │
│ QQuery 18    │ 130.56ms │                   128.13ms │     no change │
│ QQuery 19    │  27.94ms │                    27.38ms │     no change │
│ QQuery 20    │  42.96ms │                    39.38ms │ +1.09x faster │
│ QQuery 21    │ 121.82ms │                   114.06ms │ +1.07x faster │
│ QQuery 22    │  13.53ms │                    13.47ms │     no change │
└──────────────┴──────────┴────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                         ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (main)                         │ 988.29ms │
│ Total Time (hashjoin-project-pushdown)   │ 954.15ms │
│ Average Time (main)                       │  44.92ms │
│ Average Time (hashjoin-project-pushdown) │  43.37ms │
│ Queries Faster                            │        5 │
│ Queries Slower                            │        0 │
│ Queries with No Change                    │       17 │
└───────────────────────────────────────────┴──────────┘

Dandandan avatar Feb 17 '24 15:02 Dandandan

These are my results for tcph_mem, seems to be a small but reasonable speed up 🚀 :

Comparing main and hashjoin-project-pushdown
--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ hashjoin-project-pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  84.89ms │                    79.70ms │ +1.07x faster │
│ QQuery 2     │  19.46ms │                    19.53ms │     no change │
│ QQuery 3     │  30.86ms │                    30.46ms │     no change │
│ QQuery 4     │  27.40ms │                    27.76ms │     no change │
│ QQuery 5     │  46.82ms │                    45.26ms │     no change │
│ QQuery 6     │   6.24ms │                     6.46ms │     no change │
│ QQuery 7     │ 107.98ms │                   101.54ms │ +1.06x faster │
│ QQuery 8     │  36.99ms │                    34.92ms │ +1.06x faster │
│ QQuery 9     │  51.31ms │                    50.74ms │     no change │
│ QQuery 10    │  61.42ms │                    58.83ms │     no change │
│ QQuery 11    │  14.90ms │                    14.45ms │     no change │
│ QQuery 12    │  30.32ms │                    30.39ms │     no change │
│ QQuery 13    │  30.31ms │                    30.53ms │     no change │
│ QQuery 14    │   8.85ms │                     8.84ms │     no change │
│ QQuery 15    │  23.29ms │                    22.15ms │     no change │
│ QQuery 16    │  19.47ms │                    20.17ms │     no change │
│ QQuery 17    │  50.95ms │                    50.01ms │     no change │
│ QQuery 18    │ 130.56ms │                   128.13ms │     no change │
│ QQuery 19    │  27.94ms │                    27.38ms │     no change │
│ QQuery 20    │  42.96ms │                    39.38ms │ +1.09x faster │
│ QQuery 21    │ 121.82ms │                   114.06ms │ +1.07x faster │
│ QQuery 22    │  13.53ms │                    13.47ms │     no change │
└──────────────┴──────────┴────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                         ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (main)                         │ 988.29ms │
│ Total Time (hashjoin-project-pushdown)   │ 954.15ms │
│ Average Time (main)                       │  44.92ms │
│ Average Time (hashjoin-project-pushdown) │  43.37ms │
│ Queries Faster                            │        5 │
│ Queries Slower                            │        0 │
│ Queries with No Change                    │       17 │
└───────────────────────────────────────────┴──────────┘

Thanks, @Dandandan. Currently, I don't project equivalence_properties and output_ordering. So some optimizer don't work after embed projection to HashJoinExec. I am trying to handle it.

These are my results for tcph_mem, seems to be a small but reasonable speed up 🚀 :

Comparing main and hashjoin-project-pushdown
--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ hashjoin-project-pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  84.89ms │                    79.70ms │ +1.07x faster │
│ QQuery 2     │  19.46ms │                    19.53ms │     no change │
│ QQuery 3     │  30.86ms │                    30.46ms │     no change │
│ QQuery 4     │  27.40ms │                    27.76ms │     no change │
│ QQuery 5     │  46.82ms │                    45.26ms │     no change │
│ QQuery 6     │   6.24ms │                     6.46ms │     no change │
│ QQuery 7     │ 107.98ms │                   101.54ms │ +1.06x faster │
│ QQuery 8     │  36.99ms │                    34.92ms │ +1.06x faster │
│ QQuery 9     │  51.31ms │                    50.74ms │     no change │
│ QQuery 10    │  61.42ms │                    58.83ms │     no change │
│ QQuery 11    │  14.90ms │                    14.45ms │     no change │
│ QQuery 12    │  30.32ms │                    30.39ms │     no change │
│ QQuery 13    │  30.31ms │                    30.53ms │     no change │
│ QQuery 14    │   8.85ms │                     8.84ms │     no change │
│ QQuery 15    │  23.29ms │                    22.15ms │     no change │
│ QQuery 16    │  19.47ms │                    20.17ms │     no change │
│ QQuery 17    │  50.95ms │                    50.01ms │     no change │
│ QQuery 18    │ 130.56ms │                   128.13ms │     no change │
│ QQuery 19    │  27.94ms │                    27.38ms │     no change │
│ QQuery 20    │  42.96ms │                    39.38ms │ +1.09x faster │
│ QQuery 21    │ 121.82ms │                   114.06ms │ +1.07x faster │
│ QQuery 22    │  13.53ms │                    13.47ms │     no change │
└──────────────┴──────────┴────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                         ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (main)                         │ 988.29ms │
│ Total Time (hashjoin-project-pushdown)   │ 954.15ms │
│ Average Time (main)                       │  44.92ms │
│ Average Time (hashjoin-project-pushdown) │  43.37ms │
│ Queries Faster                            │        5 │
│ Queries Slower                            │        0 │
│ Queries with No Change                    │       17 │
└───────────────────────────────────────────┴──────────┘

Thanks, @Dandandan. Currently, I don't project equivalence_properties and output_ordering. So some optimizer don't work after embed projection to HashJoinExec. I am trying to handle it.

Done! I will add more docs tomorrow.

@metesynnada PTAL

ozankabak avatar Feb 17 '24 21:02 ozankabak

I think this pr is ready for review. And I will add more detailed tests and check changes in sqllogicaltests more carefully these days. It will be great to post your benchmark result (On my machine, it tends to perform faster slightly most of the time, but somtimes it get slower😅)

If no one reviews, I will do it on Thursday, I am not available for 2 days.

metesynnada avatar Feb 20 '24 14:02 metesynnada

Change is looking good. As for performance, the difference on the TPC-H benchmarks is small as most of the time is spent filtering / repartitioning / hashing / ... I think to show it, it would make sense to create a benchmark for it. For example joining in memory datasets with a filter on multiple boolean columns might show a larger benefit.

Dandandan avatar Feb 21 '24 16:02 Dandandan

You should add numerical tests, existing tests do not cover the PR. Also, please increase the code quality in projection_pushdown, it is hard to understand. Beside these, LGTM.

metesynnada avatar Feb 23 '24 07:02 metesynnada

Thank you for review @Dandandan @metesynnada.

Also, please increase the code quality in projection_pushdown, it is hard to understand.

Sorry for it. I will add more comments and test for codes in this file. And could you give me more hints about how to improve the code quality?

No worries :)

For try_embed_to_hash_join

  1. Comments and Documentation: The initial comments are helpful, but they could be more detailed, especially in explaining the purpose and workings of the function. Include information about the function's parameters, return type, and a more detailed explanation of the logic within the function. Documenting each step within the function would also make it more readable and maintainable.
  2. Projection Length Check: The check projection_index.len() >= hash_join.schema().fields().len() is slightly unclear. A comment explaining the logic behind this check would be beneficial. For instance, if the intent is to verify that the projection doesn't have more fields than the hash join, explaining why this condition leads to an early return would be helpful.
  3. Improve Projection Removability Logic: The final decision to use new_hash_join or new_projection is based on whether the projection is removable. It would be helpful to add comments or documentation on how is_projection_removable works and under what conditions it considers a projection removable.

metesynnada avatar Feb 23 '24 14:02 metesynnada

Moving to draft as it is expected more actions before PR can be finally reviewed

comphead avatar Feb 29 '24 22:02 comphead

Sorry for late response. I added some tests and more comments.

2. **Projection Length Check**: The check **`projection_index.len() >= hash_join.schema().fields().len()`** is slightly unclear. A comment explaining the logic behind this check would be beneficial. For instance, if the intent is to verify that the projection doesn't have more fields than the hash join, explaining why this condition leads to an early return would be helpful.

This check follows other function in projection_pushdown. But it seems that we don't need it here.

Thanks!

If not urgent to merge, I can take a look later. Even it gets merged before I can take a look, I still can do a post-review.

viirya avatar Mar 05 '24 21:03 viirya

Same here -- planning to take a closer look during tomorrow, the idea in general looks good though.

Thank you @my-vegetable-has-exploded

korowa avatar Mar 06 '24 19:03 korowa

Thank you all for review. @Dandandan @metesynnada @korowa @viirya I think this PR is ready to go. @alamb

Thank you @my-vegetable-has-exploded !

Dandandan avatar Mar 10 '24 08:03 Dandandan