pgcopydb icon indicating copy to clipboard operation
pgcopydb copied to clipboard

Option "--split-tables-larger-than" is not being honored for tables with pk of int or when switching to use "ctid"

Open shane-borden opened this issue 1 year ago • 1 comments

Attempting to use the --split-tables-larger-than and in the case of a table with a pk of int or when switching to use CTID, pgcopydb is not splitting the copy. For example, given the following table:

([email protected]:5432) [tpcc] > \d+ stock
                                                    Table "public.stock"
    Column    |         Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 s_i_id       | integer               |           | not null |         | plain    |             |              |
 s_w_id       | integer               |           | not null |         | plain    |             |              |
 s_ytd        | integer               |           | not null |         | plain    |             |              |
 s_quantity   | smallint              |           | not null |         | plain    |             |              |
 s_order_cnt  | smallint              |           | not null |         | plain    |             |              |
 s_remote_cnt | smallint              |           | not null |         | plain    |             |              |
 s_dist_01    | character(24)         |           | not null |         | extended |             |              |
 s_dist_02    | character(24)         |           | not null |         | extended |             |              |
 s_dist_03    | character(24)         |           | not null |         | extended |             |              |
 s_dist_04    | character(24)         |           | not null |         | extended |             |              |
 s_dist_05    | character(24)         |           | not null |         | extended |             |              |
 s_dist_06    | character(24)         |           | not null |         | extended |             |              |
 s_dist_07    | character(24)         |           | not null |         | extended |             |              |
 s_dist_08    | character(24)         |           | not null |         | extended |             |              |
 s_dist_09    | character(24)         |           | not null |         | extended |             |              |
 s_dist_10    | character(24)         |           | not null |         | extended |             |              |
 s_data       | character varying(50) |           | not null |         | extended |             |              |
Indexes:
    "stock_i1" PRIMARY KEY, btree (s_i_id, s_w_id)
Access method: heap

([email protected]:5432) [tpcc] > \dt+ stock
                                  List of relations
 Schema | Name  | Type  | Owner | Persistence | Access method |  Size  | Description
--------+-------+-------+-------+-------------+---------------+--------+-------------
 public | stock | table | tpcc  | permanent   | heap          | 108 GB |
(1 row)

And specifying the following option, no split occurs:

pgcopydb list table-parts --schema-name public --table-name order_line --split-tables-larger-than 50GB
14:33:52.344 1099240 INFO   Running pgcopydb version 0.17-1.pgdg110+1 from "/usr/bin/pgcopydb"
14:33:52.387 1099240 INFO   Using work dir "/home/shaneborden_google_com/pgcopy_tmp/pgcopydb"
14:33:52.388 1099240 INFO   Table public.order_line is 97 GB large which is larger than --split-tables-larger-than 50 GB, and does not have a unique column of type integer: splitting by CTID
14:33:52.388 1099240 INFO   Table public.order_line COPY will be split 0-ways
        Part |          Min |          Max |        Count
-------------+--------------+--------------+-------------

If I switch to use a table with a singular pk, it still does not work:

([email protected]:5432) [postgres] > \d+ uuid_mod_test
                                                      Table "public.uuid_mod_test"
     Column     |              Type              | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
----------------+--------------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 id             | integer                        |           | not null |         | plain    |             |              |
 pkid           | text                           |           |          |         | extended |             |              |
 value          | numeric                        |           |          |         | main     |             |              |
 product_id     | integer                        |           |          |         | plain    |             |              |
 effective_date | timestamp(3) without time zone |           |          |         | plain    |             |              |
Indexes:
    "uuid_mod_test_pkey" PRIMARY KEY, btree (id)
    "func_mod_uuid_idx" btree (mod(abs(hashtext(pkid)), 12))
Access method: heap

([email protected]:5432) [postgres] > \dt+ uuid_mod_test
                                       List of relations
 Schema |     Name      | Type  |  Owner   | Persistence | Access method |  Size  | Description
--------+---------------+-------+----------+-------------+---------------+--------+-------------
 public | uuid_mod_test | table | postgres | permanent   | heap          | 893 MB |
(1 row)

And utilizing the following command the split still will not work:

pgcopydb list table-parts --schema-name public --table-name uuid_mod_test --split-tables-larger-than 1MB
15:05:47.567 1099748 INFO   Running pgcopydb version 0.17-1.pgdg110+1 from "/usr/bin/pgcopydb"
15:05:47.609 1099748 INFO   Using work dir "/home/shaneborden_google_com/pgcopy_tmp/postgres/pgcopydb"
15:05:47.610 1099748 INFO   Table public.uuid_mod_test COPY will be split 0-ways
        Part |          Min |          Max |        Count
-------------+--------------+--------------+-------------

What is needed to split these tables?

shane-borden avatar Aug 23 '24 15:08 shane-borden

same here with this table definition

 Column   |           Type           | Collation | Nullable |             Default              | Storage | Compression | Stats target | Description
-----------+--------------------------+-----------+----------+----------------------------------+---------+-------------+--------------+-------------
 id        | bigint                   |           | not null | generated by default as identity | plain   |             |              |
 timestamp | timestamp with time zone |           | not null |                                  | plain   |             |              |
 data      | double precision         |           | not null |                                  | plain   |             |              |
 stream_id | integer                  |           | not null |                                  | plain   |             |              |
Indexes:
    "nodes_data_pkey" PRIMARY KEY, btree (id)
    "nodes_data_brin_timestamp" brin ("timestamp")
    "nodes_data_unique_stream_id_timestamp" UNIQUE, btree (stream_id, "timestamp")
Foreign-key constraints:
    "nodes_data_stream_id_fkey" FOREIGN KEY (stream_id) REFERENCES nodes_stream(id) DEFERRABLE INITIALLY DEFERRED
Access method: heap
Options: autovacuum_analyze_scale_factor=0, autovacuum_vacuum_scale_factor=0, autovacuum_vacuum_threshold=500000, autovacuum_analyze_threshold=150000
10:07:28.648 10659 INFO   Running pgcopydb version 0.17-1.pgdg24.04+1 from "/usr/bin/pgcopydb"
10:07:28.695 10659 INFO   Using work dir "/tmp/pgcopydb"
10:07:28.697 10659 INFO   Table public.nodes_data COPY will be split 0-ways
        Part |          Min |          Max |        Count
-------------+--------------+--------------+-------------

dchimeno avatar Feb 14 '25 09:02 dchimeno