datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Reduce cloning in LogicalPlanBuilder

Open joroKr21 opened this issue 5 months ago • 13 comments

Which issue does this PR close?

  • Related to #4628 but goes in the other direction, using Arc<LogicalPlan> in LogicalPlanBuilder

Rationale for this change

Since currently we do have Arc<LogicalPlan> in the tree it's a bit weird that the plan builder forces us to unwrap and/or clone these plans in many cases and prevents node sharing when using the builder APIs.

What changes are included in this PR?

  • Migrate function arguments from LogicalPlan to impl Into<Arc<LogicalPlan>>
  • ~Migrate function arguments from &Schema to impl Into<SchemaRef>~
  • Add a new LogicalPlanBuilder::build_arc function
  • Update usages (mostly in tests)

Are these changes tested?

Relying on existing tests, this is mostly a compile-time change.

Are there any user-facing changes?

Yes, a lot functions in LogicalPlanBuilder have changed signatures.

  • functions that accepted an owned LogicalPlan will continue to work as-is but now also accept Arc<LogicalPlan>
  • ~functions that accepted &Schema and cloned internally will break user code, so this is up for debate~
  • add a new LogicalPlanBuilder::build_arc function

joroKr21 avatar Sep 19 '25 17:09 joroKr21

I will try and review this carefully over the next day or two

alamb avatar Sep 26 '25 16:09 alamb

I think the changes to avoid Schema clones will help , though I just double checked and a Schema is mostly an empty HashMap and an Arc (Fields) so clone'ing isn't that expensive.

That's a good point, I didn't look into it. I'm open to reverting the changes in the logical plan builder from SchemaRef back to &Schema to reduce breaking changes but I wouldn't change logical2physical.

Also, I didn't find many places in the code by inspection where we saved a deep clone of a LogicalPlan -- did I miss any that you know of?

I don't think there's any deep cloning - after all the LogicalPlan children are also Arcs. But I also don't see much of a downside if we use impl Into<Arc<LogicalPlan>> since it allows using either one.

If we are going to change the APIs around and disrupt downstream users, I think they should get something from it, in this case better planning performance.

I guess that really depends on how much we use the logical plan builder internally.

joroKr21 avatar Sep 28 '25 09:09 joroKr21

I think the changes to avoid Schema clones will help , though I just double checked and a Schema is mostly an empty HashMap and an Arc (Fields) so clone'ing isn't that expensive.

That's a good point, I didn't look into it. I'm open to reverting the changes in the logical plan builder from SchemaRef back to &Schema to reduce breaking changes but I wouldn't change logical2physical.

Also, I didn't find many places in the code by inspection where we saved a deep clone of a LogicalPlan -- did I miss any that you know of?

I don't think there's any deep cloning - after all the LogicalPlan children are also Arcs. But I also don't see much of a downside if we use impl Into<Arc<LogicalPlan>> since it allows using either one.

I agree this change seems non disruptive for downstream users and is a good plan

If we are going to change the APIs around and disrupt downstream users, I think they should get something from it, in this case better planning performance.

I guess that really depends on how much we use the logical plan builder internally.

I think it is used frequently for planning and Dataframe APIs: https://github.com/apache/datafusion/blob/62e6d5e259d32f590b6d61bad6b08ece7b3416b9/datafusion/core/src/dataframe/mod.rs#L392-L391

alamb avatar Sep 28 '25 12:09 alamb

Can we get a run of the planning benchmark compared to main to see if this has any affect on performance?

Omega359 avatar Oct 28 '25 19:10 Omega359

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing arc-builder (0d0a9b47f25239aa1a1a0d8e62dbede9d35b36e9) to 6cc73fa24d0d4d4006375e3bc20d6c267fc55c44 diff BENCH_NAME=sql_planner BENCH_COMMAND=cargo bench --bench sql_planner BENCH_FILTER= BENCH_BRANCH_NAME=arc-builder Results will be posted here when complete

alamb avatar Oct 29 '25 16:10 alamb

Can we get a run of the planning benchmark compared to main to see if this has any affect on performance?

I kicked off a run

alamb avatar Oct 29 '25 16:10 alamb

🤖: Benchmark completed

Details

group                                                 arc-builder                            main
-----                                                 -----------                            ----
logical_aggregate_with_join                           1.00    627.4±3.98µs        ? ?/sec    1.02    638.5±4.35µs        ? ?/sec
logical_plan_optimize                                 1.00    181.6±20.94s        ? ?/sec    1.06    192.6±22.94s        ? ?/sec
logical_select_all_from_1000                          1.01     11.2±0.05ms        ? ?/sec    1.00     11.2±0.05ms        ? ?/sec
logical_select_one_from_700                           1.00    414.1±1.89µs        ? ?/sec    1.01    419.6±2.34µs        ? ?/sec
logical_trivial_join_high_numbered_columns            1.00    371.3±3.92µs        ? ?/sec    1.02    377.9±1.53µs        ? ?/sec
logical_trivial_join_low_numbered_columns             1.00    356.2±1.66µs        ? ?/sec    1.02    364.0±1.53µs        ? ?/sec
physical_intersection                                 1.00    832.3±8.24µs        ? ?/sec    1.01   844.6±11.70µs        ? ?/sec
physical_join_consider_sort                           1.00  1388.1±15.29µs        ? ?/sec    1.01   1408.7±6.02µs        ? ?/sec
physical_join_distinct                                1.00    347.3±2.00µs        ? ?/sec    1.03    356.5±2.78µs        ? ?/sec
physical_many_self_joins                              1.00      9.5±0.04ms        ? ?/sec    1.03      9.8±0.06ms        ? ?/sec
physical_plan_clickbench_all                          1.23    222.6±9.73ms        ? ?/sec    1.00    181.2±4.77ms        ? ?/sec
physical_plan_clickbench_q1                           1.24      2.9±0.17ms        ? ?/sec    1.00      2.4±0.02ms        ? ?/sec
physical_plan_clickbench_q10                          1.00      3.5±0.19ms        ? ?/sec    1.00      3.5±0.11ms        ? ?/sec
physical_plan_clickbench_q11                          1.02      3.8±0.21ms        ? ?/sec    1.00      3.7±0.11ms        ? ?/sec
physical_plan_clickbench_q12                          1.04      4.1±0.26ms        ? ?/sec    1.00      4.0±0.18ms        ? ?/sec
physical_plan_clickbench_q13                          1.09      3.8±0.24ms        ? ?/sec    1.00      3.5±0.10ms        ? ?/sec
physical_plan_clickbench_q14                          1.04      3.9±0.30ms        ? ?/sec    1.00      3.7±0.10ms        ? ?/sec
physical_plan_clickbench_q15                          1.09      4.0±0.24ms        ? ?/sec    1.00      3.7±0.11ms        ? ?/sec
physical_plan_clickbench_q16                          1.18      4.0±0.11ms        ? ?/sec    1.00      3.4±0.09ms        ? ?/sec
physical_plan_clickbench_q17                          1.06      3.9±0.20ms        ? ?/sec    1.00      3.7±0.17ms        ? ?/sec
physical_plan_clickbench_q18                          1.10      3.6±0.11ms        ? ?/sec    1.00      3.2±0.10ms        ? ?/sec
physical_plan_clickbench_q19                          1.05      4.4±0.15ms        ? ?/sec    1.00      4.2±0.10ms        ? ?/sec
physical_plan_clickbench_q2                           1.20      3.4±0.16ms        ? ?/sec    1.00      2.9±0.04ms        ? ?/sec
physical_plan_clickbench_q20                          1.09      3.1±0.10ms        ? ?/sec    1.00      2.8±0.09ms        ? ?/sec
physical_plan_clickbench_q21                          1.08      3.5±0.13ms        ? ?/sec    1.00      3.3±0.11ms        ? ?/sec
physical_plan_clickbench_q22                          1.11      4.3±0.14ms        ? ?/sec    1.00      3.9±0.14ms        ? ?/sec
physical_plan_clickbench_q23                          1.15      4.7±0.10ms        ? ?/sec    1.00      4.0±0.14ms        ? ?/sec
physical_plan_clickbench_q24                          1.09      5.1±0.17ms        ? ?/sec    1.00      4.7±0.14ms        ? ?/sec
physical_plan_clickbench_q25                          1.14      3.8±0.10ms        ? ?/sec    1.00      3.3±0.10ms        ? ?/sec
physical_plan_clickbench_q26                          1.08      3.3±0.17ms        ? ?/sec    1.00      3.1±0.09ms        ? ?/sec
physical_plan_clickbench_q27                          1.14      3.8±0.07ms        ? ?/sec    1.00      3.4±0.13ms        ? ?/sec
physical_plan_clickbench_q28                          1.03      4.5±0.16ms        ? ?/sec    1.00      4.4±0.12ms        ? ?/sec
physical_plan_clickbench_q29                          1.12      5.1±0.08ms        ? ?/sec    1.00      4.6±0.15ms        ? ?/sec
physical_plan_clickbench_q3                           1.23      3.3±0.17ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
physical_plan_clickbench_q30                          1.01     13.6±0.25ms        ? ?/sec    1.00     13.5±0.11ms        ? ?/sec
physical_plan_clickbench_q31                          1.10      4.7±0.14ms        ? ?/sec    1.00      4.3±0.16ms        ? ?/sec
physical_plan_clickbench_q32                          1.13      4.6±0.26ms        ? ?/sec    1.00      4.1±0.11ms        ? ?/sec
physical_plan_clickbench_q33                          1.13      4.1±0.16ms        ? ?/sec    1.00      3.6±0.16ms        ? ?/sec
physical_plan_clickbench_q34                          1.18      3.7±0.09ms        ? ?/sec    1.00      3.2±0.08ms        ? ?/sec
physical_plan_clickbench_q35                          1.14      3.7±0.15ms        ? ?/sec    1.00      3.2±0.10ms        ? ?/sec
physical_plan_clickbench_q36                          1.16      4.6±0.08ms        ? ?/sec    1.00      4.0±0.10ms        ? ?/sec
physical_plan_clickbench_q37                          1.11      4.6±0.16ms        ? ?/sec    1.00      4.2±0.10ms        ? ?/sec
physical_plan_clickbench_q38                          1.17      4.9±0.07ms        ? ?/sec    1.00      4.2±0.11ms        ? ?/sec
physical_plan_clickbench_q39                          1.09      4.4±0.26ms        ? ?/sec    1.00      4.0±0.11ms        ? ?/sec
physical_plan_clickbench_q4                           1.14      2.8±0.12ms        ? ?/sec    1.00      2.5±0.06ms        ? ?/sec
physical_plan_clickbench_q40                          1.28      5.6±0.27ms        ? ?/sec    1.00      4.4±0.05ms        ? ?/sec
physical_plan_clickbench_q41                          1.17      4.7±0.27ms        ? ?/sec    1.00      4.0±0.07ms        ? ?/sec
physical_plan_clickbench_q42                          1.28      5.0±0.08ms        ? ?/sec    1.00      3.9±0.05ms        ? ?/sec
physical_plan_clickbench_q43                          1.20      5.1±0.23ms        ? ?/sec    1.00      4.3±0.11ms        ? ?/sec
physical_plan_clickbench_q44                          1.33      3.4±0.11ms        ? ?/sec    1.00      2.6±0.02ms        ? ?/sec
physical_plan_clickbench_q45                          1.23      3.2±0.18ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
physical_plan_clickbench_q46                          1.26      3.8±0.19ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
physical_plan_clickbench_q47                          1.27      4.5±0.08ms        ? ?/sec    1.00      3.5±0.03ms        ? ?/sec
physical_plan_clickbench_q48                          1.21      5.2±0.18ms        ? ?/sec    1.00      4.3±0.02ms        ? ?/sec
physical_plan_clickbench_q49                          1.24      5.6±0.10ms        ? ?/sec    1.00      4.6±0.03ms        ? ?/sec
physical_plan_clickbench_q5                           1.08      3.1±0.15ms        ? ?/sec    1.00      2.9±0.07ms        ? ?/sec
physical_plan_clickbench_q50                          1.23      5.1±0.22ms        ? ?/sec    1.00      4.1±0.03ms        ? ?/sec
physical_plan_clickbench_q51                          1.30      4.0±0.09ms        ? ?/sec    1.00      3.1±0.05ms        ? ?/sec
physical_plan_clickbench_q6                           1.10      3.2±0.16ms        ? ?/sec    1.00      2.9±0.07ms        ? ?/sec
physical_plan_clickbench_q7                           1.01      2.6±0.13ms        ? ?/sec    1.00      2.5±0.06ms        ? ?/sec
physical_plan_clickbench_q8                           1.02      3.6±0.20ms        ? ?/sec    1.00      3.5±0.10ms        ? ?/sec
physical_plan_clickbench_q9                           1.06      3.6±0.21ms        ? ?/sec    1.00      3.4±0.09ms        ? ?/sec
physical_plan_tpcds_all                               1.08  1117.7±11.76ms        ? ?/sec    1.00   1030.2±2.76ms        ? ?/sec
physical_plan_tpch_all                                1.12     72.7±2.28ms        ? ?/sec    1.00     65.1±0.52ms        ? ?/sec
physical_plan_tpch_q1                                 1.04      2.2±0.09ms        ? ?/sec    1.00      2.1±0.02ms        ? ?/sec
physical_plan_tpch_q10                                1.09      4.5±0.07ms        ? ?/sec    1.00      4.2±0.05ms        ? ?/sec
physical_plan_tpch_q11                                1.05      3.9±0.14ms        ? ?/sec    1.00      3.7±0.07ms        ? ?/sec
physical_plan_tpch_q12                                1.07      2.0±0.09ms        ? ?/sec    1.00  1878.3±19.65µs        ? ?/sec
physical_plan_tpch_q13                                1.00  1519.5±22.32µs        ? ?/sec    1.00  1515.9±22.07µs        ? ?/sec
physical_plan_tpch_q14                                1.13      2.3±0.03ms        ? ?/sec    1.00      2.0±0.02ms        ? ?/sec
physical_plan_tpch_q16                                1.02      2.7±0.10ms        ? ?/sec    1.00      2.6±0.06ms        ? ?/sec
physical_plan_tpch_q17                                1.08      3.0±0.07ms        ? ?/sec    1.00      2.8±0.08ms        ? ?/sec
physical_plan_tpch_q18                                1.06      2.9±0.11ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
physical_plan_tpch_q19                                1.11      3.6±0.11ms        ? ?/sec    1.00      3.3±0.01ms        ? ?/sec
physical_plan_tpch_q2                                 1.06      6.7±0.04ms        ? ?/sec    1.00      6.3±0.08ms        ? ?/sec
physical_plan_tpch_q20                                1.12      3.6±0.11ms        ? ?/sec    1.00      3.2±0.02ms        ? ?/sec
physical_plan_tpch_q21                                1.11      4.8±0.10ms        ? ?/sec    1.00      4.3±0.01ms        ? ?/sec
physical_plan_tpch_q22                                1.14      3.2±0.06ms        ? ?/sec    1.00      2.8±0.01ms        ? ?/sec
physical_plan_tpch_q3                                 1.01      2.8±0.09ms        ? ?/sec    1.00      2.8±0.06ms        ? ?/sec
physical_plan_tpch_q4                                 1.04  1697.1±36.89µs        ? ?/sec    1.00  1626.2±22.99µs        ? ?/sec
physical_plan_tpch_q5                                 1.00      3.4±0.12ms        ? ?/sec    1.01      3.5±0.05ms        ? ?/sec
physical_plan_tpch_q6                                 1.02   903.5±12.86µs        ? ?/sec    1.00    887.2±6.96µs        ? ?/sec
physical_plan_tpch_q7                                 1.00      4.5±0.20ms        ? ?/sec    1.00      4.5±0.08ms        ? ?/sec
physical_plan_tpch_q8                                 1.07      6.3±0.04ms        ? ?/sec    1.00      5.9±0.09ms        ? ?/sec
physical_plan_tpch_q9                                 1.00      4.3±0.16ms        ? ?/sec    1.00      4.3±0.07ms        ? ?/sec
physical_select_aggregates_from_200                   1.00     16.9±0.07ms        ? ?/sec    1.01     17.0±0.07ms        ? ?/sec
physical_select_all_from_1000                         1.00     24.3±0.10ms        ? ?/sec    1.00     24.3±0.08ms        ? ?/sec
physical_select_one_from_700                          1.00   1073.7±9.83µs        ? ?/sec    1.02   1091.6±7.92µs        ? ?/sec
physical_sorted_union_order_by_10                     1.00     13.0±0.04ms        ? ?/sec    1.09     14.1±0.24ms        ? ?/sec
physical_sorted_union_order_by_100                    1.00       2.0±0.02s        ? ?/sec    1.04       2.1±0.17s        ? ?/sec
physical_sorted_union_order_by_200                    1.01      12.2±0.15s        ? ?/sec    1.00      12.0±0.43s        ? ?/sec
physical_sorted_union_order_by_300                    1.00      38.3±0.71s        ? ?/sec    1.00      38.2±0.60s        ? ?/sec
physical_sorted_union_order_by_50                     1.00    377.0±2.07ms        ? ?/sec    1.08   405.7±22.59ms        ? ?/sec
physical_theta_join_consider_sort                     1.00  1756.0±17.25µs        ? ?/sec    1.02  1784.8±18.46µs        ? ?/sec
physical_unnest_to_join                               1.00  1858.8±24.25µs        ? ?/sec    1.01  1879.0±10.19µs        ? ?/sec
physical_window_function_partition_by_12_on_values    1.00   1092.1±3.28µs        ? ?/sec    1.00   1095.9±3.65µs        ? ?/sec
physical_window_function_partition_by_30_on_values    1.00      2.2±0.01ms        ? ?/sec    1.00      2.3±0.02ms        ? ?/sec
physical_window_function_partition_by_4_on_values     1.00    661.7±6.88µs        ? ?/sec    1.00    659.9±5.84µs        ? ?/sec
physical_window_function_partition_by_7_on_values     1.00    812.7±6.07µs        ? ?/sec    1.00    812.8±4.78µs        ? ?/sec
physical_window_function_partition_by_8_on_values     1.00    871.6±3.84µs        ? ?/sec    1.01    876.2±3.94µs        ? ?/sec
with_param_values_many_columns                        1.02    129.1±1.28µs        ? ?/sec    1.00    127.2±0.98µs        ? ?/sec

alamb avatar Oct 30 '25 07:10 alamb

Seems like this PR is resulting in a perf slowdown

Omega359 avatar Oct 30 '25 18:10 Omega359

Seems like this PR is resulting in a perf slowdown

Strange, no idea why that would be the case

joroKr21 avatar Oct 30 '25 21:10 joroKr21

Seems like this PR is resulting in a perf slowdown

Strange, no idea why that would be the case

I will rerun the benchmarks to see if it reproduceable

alamb avatar Oct 31 '25 13:10 alamb

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing arc-builder (0d0a9b47f25239aa1a1a0d8e62dbede9d35b36e9) to 6cc73fa24d0d4d4006375e3bc20d6c267fc55c44 diff BENCH_NAME=sql_planner BENCH_COMMAND=cargo bench --bench sql_planner BENCH_FILTER= BENCH_BRANCH_NAME=arc-builder Results will be posted here when complete

alamb avatar Oct 31 '25 15:10 alamb

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing arc-builder (0d0a9b47f25239aa1a1a0d8e62dbede9d35b36e9) to 6cc73fa24d0d4d4006375e3bc20d6c267fc55c44 diff BENCH_NAME=sql_planner BENCH_COMMAND=cargo bench --bench sql_planner BENCH_FILTER= BENCH_BRANCH_NAME=arc-builder Results will be posted here when complete

alamb avatar Oct 31 '25 18:10 alamb

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing arc-builder (0d0a9b47f25239aa1a1a0d8e62dbede9d35b36e9) to 6cc73fa24d0d4d4006375e3bc20d6c267fc55c44 diff BENCH_NAME=sql_planner BENCH_COMMAND=cargo bench --bench sql_planner BENCH_FILTER= BENCH_BRANCH_NAME=arc-builder Results will be posted here when complete

alamb avatar Oct 31 '25 18:10 alamb