Reduce cloning in LogicalPlanBuilder
Which issue does this PR close?
- Related to #4628 but goes in the other direction, using
Arc<LogicalPlan>inLogicalPlanBuilder
Rationale for this change
Since currently we do have Arc<LogicalPlan> in the tree it's a bit weird that the plan builder forces us to unwrap and/or clone these plans in many cases and prevents node sharing when using the builder APIs.
What changes are included in this PR?
- Migrate function arguments from
LogicalPlantoimpl Into<Arc<LogicalPlan>> - ~Migrate function arguments from
&Schematoimpl Into<SchemaRef>~ - Add a new
LogicalPlanBuilder::build_arcfunction - Update usages (mostly in tests)
Are these changes tested?
Relying on existing tests, this is mostly a compile-time change.
Are there any user-facing changes?
Yes, a lot functions in LogicalPlanBuilder have changed signatures.
- functions that accepted an owned
LogicalPlanwill continue to work as-is but now also acceptArc<LogicalPlan> - ~functions that accepted
&Schemaand cloned internally will break user code, so this is up for debate~ - add a new
LogicalPlanBuilder::build_arcfunction
I will try and review this carefully over the next day or two
I think the changes to avoid Schema clones will help , though I just double checked and a Schema is mostly an empty HashMap and an Arc (Fields) so clone'ing isn't that expensive.
That's a good point, I didn't look into it. I'm open to reverting the changes in the logical plan builder from SchemaRef back to &Schema to reduce breaking changes but I wouldn't change logical2physical.
Also, I didn't find many places in the code by inspection where we saved a deep clone of a LogicalPlan -- did I miss any that you know of?
I don't think there's any deep cloning - after all the LogicalPlan children are also Arcs. But I also don't see much of a downside if we use impl Into<Arc<LogicalPlan>> since it allows using either one.
If we are going to change the APIs around and disrupt downstream users, I think they should get something from it, in this case better planning performance.
I guess that really depends on how much we use the logical plan builder internally.
I think the changes to avoid Schema clones will help , though I just double checked and a Schema is mostly an empty HashMap and an Arc (Fields) so clone'ing isn't that expensive.
That's a good point, I didn't look into it. I'm open to reverting the changes in the logical plan builder from
SchemaRefback to&Schemato reduce breaking changes but I wouldn't changelogical2physical.Also, I didn't find many places in the code by inspection where we saved a deep clone of a LogicalPlan -- did I miss any that you know of?
I don't think there's any deep cloning - after all the
LogicalPlanchildren are alsoArcs. But I also don't see much of a downside if we useimpl Into<Arc<LogicalPlan>>since it allows using either one.
I agree this change seems non disruptive for downstream users and is a good plan
If we are going to change the APIs around and disrupt downstream users, I think they should get something from it, in this case better planning performance.
I guess that really depends on how much we use the logical plan builder internally.
I think it is used frequently for planning and Dataframe APIs: https://github.com/apache/datafusion/blob/62e6d5e259d32f590b6d61bad6b08ece7b3416b9/datafusion/core/src/dataframe/mod.rs#L392-L391
Can we get a run of the planning benchmark compared to main to see if this has any affect on performance?
🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arc-builder (0d0a9b47f25239aa1a1a0d8e62dbede9d35b36e9) to 6cc73fa24d0d4d4006375e3bc20d6c267fc55c44 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=arc-builder
Results will be posted here when complete
Can we get a run of the planning benchmark compared to main to see if this has any affect on performance?
I kicked off a run
🤖: Benchmark completed
Details
group arc-builder main
----- ----------- ----
logical_aggregate_with_join 1.00 627.4±3.98µs ? ?/sec 1.02 638.5±4.35µs ? ?/sec
logical_plan_optimize 1.00 181.6±20.94s ? ?/sec 1.06 192.6±22.94s ? ?/sec
logical_select_all_from_1000 1.01 11.2±0.05ms ? ?/sec 1.00 11.2±0.05ms ? ?/sec
logical_select_one_from_700 1.00 414.1±1.89µs ? ?/sec 1.01 419.6±2.34µs ? ?/sec
logical_trivial_join_high_numbered_columns 1.00 371.3±3.92µs ? ?/sec 1.02 377.9±1.53µs ? ?/sec
logical_trivial_join_low_numbered_columns 1.00 356.2±1.66µs ? ?/sec 1.02 364.0±1.53µs ? ?/sec
physical_intersection 1.00 832.3±8.24µs ? ?/sec 1.01 844.6±11.70µs ? ?/sec
physical_join_consider_sort 1.00 1388.1±15.29µs ? ?/sec 1.01 1408.7±6.02µs ? ?/sec
physical_join_distinct 1.00 347.3±2.00µs ? ?/sec 1.03 356.5±2.78µs ? ?/sec
physical_many_self_joins 1.00 9.5±0.04ms ? ?/sec 1.03 9.8±0.06ms ? ?/sec
physical_plan_clickbench_all 1.23 222.6±9.73ms ? ?/sec 1.00 181.2±4.77ms ? ?/sec
physical_plan_clickbench_q1 1.24 2.9±0.17ms ? ?/sec 1.00 2.4±0.02ms ? ?/sec
physical_plan_clickbench_q10 1.00 3.5±0.19ms ? ?/sec 1.00 3.5±0.11ms ? ?/sec
physical_plan_clickbench_q11 1.02 3.8±0.21ms ? ?/sec 1.00 3.7±0.11ms ? ?/sec
physical_plan_clickbench_q12 1.04 4.1±0.26ms ? ?/sec 1.00 4.0±0.18ms ? ?/sec
physical_plan_clickbench_q13 1.09 3.8±0.24ms ? ?/sec 1.00 3.5±0.10ms ? ?/sec
physical_plan_clickbench_q14 1.04 3.9±0.30ms ? ?/sec 1.00 3.7±0.10ms ? ?/sec
physical_plan_clickbench_q15 1.09 4.0±0.24ms ? ?/sec 1.00 3.7±0.11ms ? ?/sec
physical_plan_clickbench_q16 1.18 4.0±0.11ms ? ?/sec 1.00 3.4±0.09ms ? ?/sec
physical_plan_clickbench_q17 1.06 3.9±0.20ms ? ?/sec 1.00 3.7±0.17ms ? ?/sec
physical_plan_clickbench_q18 1.10 3.6±0.11ms ? ?/sec 1.00 3.2±0.10ms ? ?/sec
physical_plan_clickbench_q19 1.05 4.4±0.15ms ? ?/sec 1.00 4.2±0.10ms ? ?/sec
physical_plan_clickbench_q2 1.20 3.4±0.16ms ? ?/sec 1.00 2.9±0.04ms ? ?/sec
physical_plan_clickbench_q20 1.09 3.1±0.10ms ? ?/sec 1.00 2.8±0.09ms ? ?/sec
physical_plan_clickbench_q21 1.08 3.5±0.13ms ? ?/sec 1.00 3.3±0.11ms ? ?/sec
physical_plan_clickbench_q22 1.11 4.3±0.14ms ? ?/sec 1.00 3.9±0.14ms ? ?/sec
physical_plan_clickbench_q23 1.15 4.7±0.10ms ? ?/sec 1.00 4.0±0.14ms ? ?/sec
physical_plan_clickbench_q24 1.09 5.1±0.17ms ? ?/sec 1.00 4.7±0.14ms ? ?/sec
physical_plan_clickbench_q25 1.14 3.8±0.10ms ? ?/sec 1.00 3.3±0.10ms ? ?/sec
physical_plan_clickbench_q26 1.08 3.3±0.17ms ? ?/sec 1.00 3.1±0.09ms ? ?/sec
physical_plan_clickbench_q27 1.14 3.8±0.07ms ? ?/sec 1.00 3.4±0.13ms ? ?/sec
physical_plan_clickbench_q28 1.03 4.5±0.16ms ? ?/sec 1.00 4.4±0.12ms ? ?/sec
physical_plan_clickbench_q29 1.12 5.1±0.08ms ? ?/sec 1.00 4.6±0.15ms ? ?/sec
physical_plan_clickbench_q3 1.23 3.3±0.17ms ? ?/sec 1.00 2.7±0.02ms ? ?/sec
physical_plan_clickbench_q30 1.01 13.6±0.25ms ? ?/sec 1.00 13.5±0.11ms ? ?/sec
physical_plan_clickbench_q31 1.10 4.7±0.14ms ? ?/sec 1.00 4.3±0.16ms ? ?/sec
physical_plan_clickbench_q32 1.13 4.6±0.26ms ? ?/sec 1.00 4.1±0.11ms ? ?/sec
physical_plan_clickbench_q33 1.13 4.1±0.16ms ? ?/sec 1.00 3.6±0.16ms ? ?/sec
physical_plan_clickbench_q34 1.18 3.7±0.09ms ? ?/sec 1.00 3.2±0.08ms ? ?/sec
physical_plan_clickbench_q35 1.14 3.7±0.15ms ? ?/sec 1.00 3.2±0.10ms ? ?/sec
physical_plan_clickbench_q36 1.16 4.6±0.08ms ? ?/sec 1.00 4.0±0.10ms ? ?/sec
physical_plan_clickbench_q37 1.11 4.6±0.16ms ? ?/sec 1.00 4.2±0.10ms ? ?/sec
physical_plan_clickbench_q38 1.17 4.9±0.07ms ? ?/sec 1.00 4.2±0.11ms ? ?/sec
physical_plan_clickbench_q39 1.09 4.4±0.26ms ? ?/sec 1.00 4.0±0.11ms ? ?/sec
physical_plan_clickbench_q4 1.14 2.8±0.12ms ? ?/sec 1.00 2.5±0.06ms ? ?/sec
physical_plan_clickbench_q40 1.28 5.6±0.27ms ? ?/sec 1.00 4.4±0.05ms ? ?/sec
physical_plan_clickbench_q41 1.17 4.7±0.27ms ? ?/sec 1.00 4.0±0.07ms ? ?/sec
physical_plan_clickbench_q42 1.28 5.0±0.08ms ? ?/sec 1.00 3.9±0.05ms ? ?/sec
physical_plan_clickbench_q43 1.20 5.1±0.23ms ? ?/sec 1.00 4.3±0.11ms ? ?/sec
physical_plan_clickbench_q44 1.33 3.4±0.11ms ? ?/sec 1.00 2.6±0.02ms ? ?/sec
physical_plan_clickbench_q45 1.23 3.2±0.18ms ? ?/sec 1.00 2.6±0.01ms ? ?/sec
physical_plan_clickbench_q46 1.26 3.8±0.19ms ? ?/sec 1.00 3.0±0.02ms ? ?/sec
physical_plan_clickbench_q47 1.27 4.5±0.08ms ? ?/sec 1.00 3.5±0.03ms ? ?/sec
physical_plan_clickbench_q48 1.21 5.2±0.18ms ? ?/sec 1.00 4.3±0.02ms ? ?/sec
physical_plan_clickbench_q49 1.24 5.6±0.10ms ? ?/sec 1.00 4.6±0.03ms ? ?/sec
physical_plan_clickbench_q5 1.08 3.1±0.15ms ? ?/sec 1.00 2.9±0.07ms ? ?/sec
physical_plan_clickbench_q50 1.23 5.1±0.22ms ? ?/sec 1.00 4.1±0.03ms ? ?/sec
physical_plan_clickbench_q51 1.30 4.0±0.09ms ? ?/sec 1.00 3.1±0.05ms ? ?/sec
physical_plan_clickbench_q6 1.10 3.2±0.16ms ? ?/sec 1.00 2.9±0.07ms ? ?/sec
physical_plan_clickbench_q7 1.01 2.6±0.13ms ? ?/sec 1.00 2.5±0.06ms ? ?/sec
physical_plan_clickbench_q8 1.02 3.6±0.20ms ? ?/sec 1.00 3.5±0.10ms ? ?/sec
physical_plan_clickbench_q9 1.06 3.6±0.21ms ? ?/sec 1.00 3.4±0.09ms ? ?/sec
physical_plan_tpcds_all 1.08 1117.7±11.76ms ? ?/sec 1.00 1030.2±2.76ms ? ?/sec
physical_plan_tpch_all 1.12 72.7±2.28ms ? ?/sec 1.00 65.1±0.52ms ? ?/sec
physical_plan_tpch_q1 1.04 2.2±0.09ms ? ?/sec 1.00 2.1±0.02ms ? ?/sec
physical_plan_tpch_q10 1.09 4.5±0.07ms ? ?/sec 1.00 4.2±0.05ms ? ?/sec
physical_plan_tpch_q11 1.05 3.9±0.14ms ? ?/sec 1.00 3.7±0.07ms ? ?/sec
physical_plan_tpch_q12 1.07 2.0±0.09ms ? ?/sec 1.00 1878.3±19.65µs ? ?/sec
physical_plan_tpch_q13 1.00 1519.5±22.32µs ? ?/sec 1.00 1515.9±22.07µs ? ?/sec
physical_plan_tpch_q14 1.13 2.3±0.03ms ? ?/sec 1.00 2.0±0.02ms ? ?/sec
physical_plan_tpch_q16 1.02 2.7±0.10ms ? ?/sec 1.00 2.6±0.06ms ? ?/sec
physical_plan_tpch_q17 1.08 3.0±0.07ms ? ?/sec 1.00 2.8±0.08ms ? ?/sec
physical_plan_tpch_q18 1.06 2.9±0.11ms ? ?/sec 1.00 2.7±0.03ms ? ?/sec
physical_plan_tpch_q19 1.11 3.6±0.11ms ? ?/sec 1.00 3.3±0.01ms ? ?/sec
physical_plan_tpch_q2 1.06 6.7±0.04ms ? ?/sec 1.00 6.3±0.08ms ? ?/sec
physical_plan_tpch_q20 1.12 3.6±0.11ms ? ?/sec 1.00 3.2±0.02ms ? ?/sec
physical_plan_tpch_q21 1.11 4.8±0.10ms ? ?/sec 1.00 4.3±0.01ms ? ?/sec
physical_plan_tpch_q22 1.14 3.2±0.06ms ? ?/sec 1.00 2.8±0.01ms ? ?/sec
physical_plan_tpch_q3 1.01 2.8±0.09ms ? ?/sec 1.00 2.8±0.06ms ? ?/sec
physical_plan_tpch_q4 1.04 1697.1±36.89µs ? ?/sec 1.00 1626.2±22.99µs ? ?/sec
physical_plan_tpch_q5 1.00 3.4±0.12ms ? ?/sec 1.01 3.5±0.05ms ? ?/sec
physical_plan_tpch_q6 1.02 903.5±12.86µs ? ?/sec 1.00 887.2±6.96µs ? ?/sec
physical_plan_tpch_q7 1.00 4.5±0.20ms ? ?/sec 1.00 4.5±0.08ms ? ?/sec
physical_plan_tpch_q8 1.07 6.3±0.04ms ? ?/sec 1.00 5.9±0.09ms ? ?/sec
physical_plan_tpch_q9 1.00 4.3±0.16ms ? ?/sec 1.00 4.3±0.07ms ? ?/sec
physical_select_aggregates_from_200 1.00 16.9±0.07ms ? ?/sec 1.01 17.0±0.07ms ? ?/sec
physical_select_all_from_1000 1.00 24.3±0.10ms ? ?/sec 1.00 24.3±0.08ms ? ?/sec
physical_select_one_from_700 1.00 1073.7±9.83µs ? ?/sec 1.02 1091.6±7.92µs ? ?/sec
physical_sorted_union_order_by_10 1.00 13.0±0.04ms ? ?/sec 1.09 14.1±0.24ms ? ?/sec
physical_sorted_union_order_by_100 1.00 2.0±0.02s ? ?/sec 1.04 2.1±0.17s ? ?/sec
physical_sorted_union_order_by_200 1.01 12.2±0.15s ? ?/sec 1.00 12.0±0.43s ? ?/sec
physical_sorted_union_order_by_300 1.00 38.3±0.71s ? ?/sec 1.00 38.2±0.60s ? ?/sec
physical_sorted_union_order_by_50 1.00 377.0±2.07ms ? ?/sec 1.08 405.7±22.59ms ? ?/sec
physical_theta_join_consider_sort 1.00 1756.0±17.25µs ? ?/sec 1.02 1784.8±18.46µs ? ?/sec
physical_unnest_to_join 1.00 1858.8±24.25µs ? ?/sec 1.01 1879.0±10.19µs ? ?/sec
physical_window_function_partition_by_12_on_values 1.00 1092.1±3.28µs ? ?/sec 1.00 1095.9±3.65µs ? ?/sec
physical_window_function_partition_by_30_on_values 1.00 2.2±0.01ms ? ?/sec 1.00 2.3±0.02ms ? ?/sec
physical_window_function_partition_by_4_on_values 1.00 661.7±6.88µs ? ?/sec 1.00 659.9±5.84µs ? ?/sec
physical_window_function_partition_by_7_on_values 1.00 812.7±6.07µs ? ?/sec 1.00 812.8±4.78µs ? ?/sec
physical_window_function_partition_by_8_on_values 1.00 871.6±3.84µs ? ?/sec 1.01 876.2±3.94µs ? ?/sec
with_param_values_many_columns 1.02 129.1±1.28µs ? ?/sec 1.00 127.2±0.98µs ? ?/sec
Seems like this PR is resulting in a perf slowdown
Seems like this PR is resulting in a perf slowdown
Strange, no idea why that would be the case
Seems like this PR is resulting in a perf slowdown
Strange, no idea why that would be the case
I will rerun the benchmarks to see if it reproduceable
🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arc-builder (0d0a9b47f25239aa1a1a0d8e62dbede9d35b36e9) to 6cc73fa24d0d4d4006375e3bc20d6c267fc55c44 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=arc-builder
Results will be posted here when complete
🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arc-builder (0d0a9b47f25239aa1a1a0d8e62dbede9d35b36e9) to 6cc73fa24d0d4d4006375e3bc20d6c267fc55c44 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=arc-builder
Results will be posted here when complete
🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arc-builder (0d0a9b47f25239aa1a1a0d8e62dbede9d35b36e9) to 6cc73fa24d0d4d4006375e3bc20d6c267fc55c44 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=arc-builder
Results will be posted here when complete