pgcopydb
pgcopydb copied to clipboard
Option "--split-tables-larger-than" is not being honored for tables with pk of int or when switching to use "ctid"
Attempting to use the --split-tables-larger-than and in the case of a table with a pk of int or when switching to use CTID, pgcopydb is not splitting the copy. For example, given the following table:
([email protected]:5432) [tpcc] > \d+ stock
Table "public.stock"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
s_i_id | integer | | not null | | plain | | |
s_w_id | integer | | not null | | plain | | |
s_ytd | integer | | not null | | plain | | |
s_quantity | smallint | | not null | | plain | | |
s_order_cnt | smallint | | not null | | plain | | |
s_remote_cnt | smallint | | not null | | plain | | |
s_dist_01 | character(24) | | not null | | extended | | |
s_dist_02 | character(24) | | not null | | extended | | |
s_dist_03 | character(24) | | not null | | extended | | |
s_dist_04 | character(24) | | not null | | extended | | |
s_dist_05 | character(24) | | not null | | extended | | |
s_dist_06 | character(24) | | not null | | extended | | |
s_dist_07 | character(24) | | not null | | extended | | |
s_dist_08 | character(24) | | not null | | extended | | |
s_dist_09 | character(24) | | not null | | extended | | |
s_dist_10 | character(24) | | not null | | extended | | |
s_data | character varying(50) | | not null | | extended | | |
Indexes:
"stock_i1" PRIMARY KEY, btree (s_i_id, s_w_id)
Access method: heap
([email protected]:5432) [tpcc] > \dt+ stock
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+-------+-------+-------+-------------+---------------+--------+-------------
public | stock | table | tpcc | permanent | heap | 108 GB |
(1 row)
And specifying the following option, no split occurs:
pgcopydb list table-parts --schema-name public --table-name order_line --split-tables-larger-than 50GB
14:33:52.344 1099240 INFO Running pgcopydb version 0.17-1.pgdg110+1 from "/usr/bin/pgcopydb"
14:33:52.387 1099240 INFO Using work dir "/home/shaneborden_google_com/pgcopy_tmp/pgcopydb"
14:33:52.388 1099240 INFO Table public.order_line is 97 GB large which is larger than --split-tables-larger-than 50 GB, and does not have a unique column of type integer: splitting by CTID
14:33:52.388 1099240 INFO Table public.order_line COPY will be split 0-ways
Part | Min | Max | Count
-------------+--------------+--------------+-------------
If I switch to use a table with a singular pk, it still does not work:
([email protected]:5432) [postgres] > \d+ uuid_mod_test
Table "public.uuid_mod_test"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
----------------+--------------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | not null | | plain | | |
pkid | text | | | | extended | | |
value | numeric | | | | main | | |
product_id | integer | | | | plain | | |
effective_date | timestamp(3) without time zone | | | | plain | | |
Indexes:
"uuid_mod_test_pkey" PRIMARY KEY, btree (id)
"func_mod_uuid_idx" btree (mod(abs(hashtext(pkid)), 12))
Access method: heap
([email protected]:5432) [postgres] > \dt+ uuid_mod_test
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+---------------+-------+----------+-------------+---------------+--------+-------------
public | uuid_mod_test | table | postgres | permanent | heap | 893 MB |
(1 row)
And utilizing the following command the split still will not work:
pgcopydb list table-parts --schema-name public --table-name uuid_mod_test --split-tables-larger-than 1MB
15:05:47.567 1099748 INFO Running pgcopydb version 0.17-1.pgdg110+1 from "/usr/bin/pgcopydb"
15:05:47.609 1099748 INFO Using work dir "/home/shaneborden_google_com/pgcopy_tmp/postgres/pgcopydb"
15:05:47.610 1099748 INFO Table public.uuid_mod_test COPY will be split 0-ways
Part | Min | Max | Count
-------------+--------------+--------------+-------------
What is needed to split these tables?
same here with this table definition
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-----------+--------------------------+-----------+----------+----------------------------------+---------+-------------+--------------+-------------
id | bigint | | not null | generated by default as identity | plain | | |
timestamp | timestamp with time zone | | not null | | plain | | |
data | double precision | | not null | | plain | | |
stream_id | integer | | not null | | plain | | |
Indexes:
"nodes_data_pkey" PRIMARY KEY, btree (id)
"nodes_data_brin_timestamp" brin ("timestamp")
"nodes_data_unique_stream_id_timestamp" UNIQUE, btree (stream_id, "timestamp")
Foreign-key constraints:
"nodes_data_stream_id_fkey" FOREIGN KEY (stream_id) REFERENCES nodes_stream(id) DEFERRABLE INITIALLY DEFERRED
Access method: heap
Options: autovacuum_analyze_scale_factor=0, autovacuum_vacuum_scale_factor=0, autovacuum_vacuum_threshold=500000, autovacuum_analyze_threshold=150000
10:07:28.648 10659 INFO Running pgcopydb version 0.17-1.pgdg24.04+1 from "/usr/bin/pgcopydb"
10:07:28.695 10659 INFO Using work dir "/tmp/pgcopydb"
10:07:28.697 10659 INFO Table public.nodes_data COPY will be split 0-ways
Part | Min | Max | Count
-------------+--------------+--------------+-------------