[enhancement] Direct dispatch with parallel.
Cloudberry Database version
No response
What happened
Direct dispatch in parallel mode may bring no improvement over non-parallel direct dispatch.
Non-parallel direct dispatch:
gpadmin=# explain select a, count(*) from dd_part_singlecol where a=1 group by a;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Gather Motion 1:1 (slice1; segments: 1) (cost=0.00..2160.85 rows=467 width=12)
   -> GroupAggregate (cost=0.00..2154.62 rows=156 width=12)
        Group Key: dd_part_singlecol.a
        -> Append (cost=0.00..2152.28 rows=156 width=4)
             -> Seq Scan on dd_part_singlecol_1_prt_2 dd_part_singlecol_1 (cost=0.00..358.58 rows=26 width=4)
                  Filter: (a = 1)
             -> Seq Scan on dd_part_singlecol_1_prt_3 dd_part_singlecol_2 (cost=0.00..358.58 rows=26 width=4)
                  Filter: (a = 1)
             -> Seq Scan on dd_part_singlecol_1_prt_4 dd_part_singlecol_3 (cost=0.00..358.58 rows=26 width=4)
                  Filter: (a = 1)
             -> Seq Scan on dd_part_singlecol_1_prt_5 dd_part_singlecol_4 (cost=0.00..358.58 rows=26 width=4)
                  Filter: (a = 1)
             -> Seq Scan on dd_part_singlecol_1_prt_6 dd_part_singlecol_5 (cost=0.00..358.58 rows=26 width=4)
                  Filter: (a = 1)
             -> Seq Scan on dd_part_singlecol_1_prt_extra dd_part_singlecol_6 (cost=0.00..358.58 rows=26 width=4)
                  Filter: (a = 1)
 Optimizer: Postgres query optimizer
(17 rows)
gpadmin=# select a, count(*) from dd_part_singlecol where a=1 group by a;
a | count
---+-------
1 | 1
(1 row)
Time: 3.517 ms
Parallel direct dispatch:
gpadmin=# explain select a, count(*) from dd_part_singlecol where a=1 group by a;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 6:1 (slice1; segments: 6) (cost=1079.26..1091.72 rows=935 width=12)
   -> Finalize HashAggregate (cost=1079.26..1080.81 rows=156 width=12)
        Group Key: dd_part_singlecol.a
        -> Redistribute Motion 2:6 (slice2; segments: 2) (cost=0.00..1078.87 rows=78 width=12)
             Hash Key: dd_part_singlecol.a
             Hash Module: 3
             -> Partial GroupAggregate (cost=0.00..1077.31 rows=78 width=12)
                  Group Key: dd_part_singlecol.a
                  -> Parallel Append (cost=0.00..1076.14 rows=78 width=4)
                       -> Seq Scan on dd_part_singlecol_1_prt_2 dd_part_singlecol_1 (cost=0.00..358.58 rows=26 width=4)
                            Filter: (a = 1)
                       -> Seq Scan on dd_part_singlecol_1_prt_3 dd_part_singlecol_2 (cost=0.00..358.58 rows=26 width=4)
                            Filter: (a = 1)
                       -> Seq Scan on dd_part_singlecol_1_prt_4 dd_part_singlecol_3 (cost=0.00..358.58 rows=26 width=4)
                            Filter: (a = 1)
                       -> Seq Scan on dd_part_singlecol_1_prt_5 dd_part_singlecol_4 (cost=0.00..358.58 rows=26 width=4)
                            Filter: (a = 1)
                       -> Seq Scan on dd_part_singlecol_1_prt_6 dd_part_singlecol_5 (cost=0.00..358.58 rows=26 width=4)
                            Filter: (a = 1)
                       -> Seq Scan on dd_part_singlecol_1_prt_extra dd_part_singlecol_6 (cost=0.00..358.58 rows=26 width=4)
                            Filter: (a = 1)
 Optimizer: Postgres query optimizer
(22 rows)
gpadmin=# select a, count(*) from dd_part_singlecol where a=1 group by a;
a | count
---+-------
1 | 1
(1 row)
Time: 4.156 ms
Slice2's original Motion(6:6) is reduced to Motion(2:6) by direct dispatch.
However, the plan still needs a Gather(6:1) because it is a parallel plan.
We should reconsider direct-dispatchable plans in parallel mode.
What you think should happen instead
We should reconsider direct dispatch in parallel mode; it may not be better than a single-process plan.
How to reproduce
Use the dd_part_singlecol table from the regression tests.
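A minimal session sketch for comparing the two plans, assuming the table has already been created by the regression suite. The GUC names `enable_parallel` and `max_parallel_workers_per_gather`, and the worker count of 2, are assumptions not stated in this report; adjust them to match your build.

```sql
-- Assumes dd_part_singlecol already exists (created by the regression tests).
-- In psql, run "\timing on" first to get the "Time:" lines.

-- Baseline: non-parallel plan, expected to direct-dispatch to one segment.
SET enable_parallel = off;  -- assumed GUC name
EXPLAIN SELECT a, count(*) FROM dd_part_singlecol WHERE a = 1 GROUP BY a;
SELECT a, count(*) FROM dd_part_singlecol WHERE a = 1 GROUP BY a;

-- Parallel plan: the same query with parallel workers enabled.
SET enable_parallel = on;                  -- assumed GUC name
SET max_parallel_workers_per_gather = 2;   -- assumed worker count
EXPLAIN SELECT a, count(*) FROM dd_part_singlecol WHERE a = 1 GROUP BY a;
SELECT a, count(*) FROM dd_part_singlecol WHERE a = 1 GROUP BY a;
```

Comparing the top-level Gather Motion lines of the two EXPLAIN outputs (1:1 versus 6:1 in this report) shows whether the parallel plan still gathers from every worker even though the scan slice was direct-dispatched.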
Operating System
Ubuntu
Anything else
No response
Are you willing to submit PR?
- [ ] Yes, I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct.