pg_partman icon indicating copy to clipboard operation
pg_partman copied to clipboard

Fix run_maintenace when no default table exists

Open sj26 opened this issue 3 years ago • 7 comments

We're encountering an error like this when running CALL run_maintenance_proc() on our database with a native range partitioned table without partitions:

psql:.../db/structure.sql:8965: ERROR:  null values cannot be formatted as an SQL identifier
CONTEXT: PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 323 at EXECUTE
SQL statement "SELECT partman.run_maintenance('public.agent_details', p_jobmon := 't')"
PL/pgSQL function partman.run_maintenance_proc(integer,boolean,boolean) line 42 at EXECUTE
DETAIL:
HINT:
CONTEXT:  PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 410 at RAISE
SQL statement "SELECT partman.run_maintenance('public.agent_details', p_jobmon := 't')"
PL/pgSQL function partman.run_maintenance_proc(integer,boolean,boolean) line 42 at EXECUTE

We're using v4.5.1. I traced it to:

https://github.com/pgpartman/pg_partman/blob/v4.5.1/sql/functions/run_maintenance.sql#L317

v_default_tablename is NULL.

That should be set to the parent tablename earlier: https://github.com/pgpartman/pg_partman/blob/v4.5.1/sql/functions/run_maintenance.sql#L168

but v_is_default will be NULL if there are no partitions at all, and (NULL != 'DEFAULT') IS NULL, and IF NULL THEN ... END will not enter the conditional block because NULL is not TRUE, so it doesn't enter the conditional and default to the parent table name.

(NULL IS DISTINCT FROM 'DEFAULT') IS TRUE, however, and works the way it should, and preserves ('any string' IS DISTINCT FROM 'DEFAULT') IS TRUE, while ('DEFAULT' IS DISTINCT FROM 'DEFAULT') IS FALSE.

Sorry I'm not familiar enough with the test suite to write a test. If someone can provide some pointers I'll have a go. I presume it would require adding a test in test/test-id-run-maint.sql?

sj26 avatar Jul 14 '22 08:07 sj26

Just to clarify, you are saying there are no child tables at all in this partition set?

keithf4 avatar Jul 14 '22 12:07 keithf4

Yes, that’s right.

We remove the default partition so that it doesn’t accumulate unpartitioned data which becomes a problem to partition later. In continuous integration environments with a freshly loaded database with no default partition this means there are no child tables until the maintenance process is run to premake them, for example.

sj26 avatar Jul 14 '22 21:07 sj26

Ok. I wanted to make sure. pg_partman cannot work without having at least one child table. Otherwise it has no idea what the "next" table to create is. The create_parent() function should create at least the premake child tables for that. And retention should never drop the last child table. So how are you getting in the situation where there are no child tables?

keithf4 avatar Jul 14 '22 21:07 keithf4

We do a schema dump and restore between continuous integration test runs, and ignore child partitions because they make the schema unstable.

How does create_parent know what the "next" table is? I presume it knows what the "first" table is, and in the absence of any "next" table it should create the "first" table?

sj26 avatar Jul 14 '22 22:07 sj26

Sorry, we might be holding it wrong! How would you recommend make a stable, repeatably loadable schema for setting up ephemeral test environments? We're using Ruby on Rails and so have decorated its schema dumper to add some partman commands at the end to try and get things consistently configured. At the moment we're dumping the contents of the partman.part_config table and INSERT-ing it during fresh db load, then running the maintenance proc. I suppose we could change it to use create_parent calls instead, but how would we reverse-engineer the calls to make from the state of the database being dumped?

sj26 avatar Jul 14 '22 22:07 sj26

The issue here is that you're not going to be able to run create_parent() if you're already restoring the contents of the part_config table. You'll get a duplicate key violation because that config already exists. There's nothing special in the part_config table that cares what child tables exist. So, as part of your restore procedure, you could just create any valid child table for that partition set. Might be able to look at calling the relevant create_partition_*() function directly and feeding it one or more values for the p_partition_*[] array parameter. That's actually what create_parent() is doing.

To clarify, create_parent() doesn't need to know the next table. It's creating the initial tables. For time-based, by default it does that based on now(). For ID based it always starts at 0 unless a different start partition is given. It's the run_maintenance() process that requires there to be children. It needs at least one child table so it knows if it needs to make more based on the premake and what those new ones should be.

keithf4 avatar Jul 15 '22 13:07 keithf4

Another option would be to use the dump_partitioned_table_definition() on the original system to generate the create_parent() call along with an UPDATE statement to restore the remaining configuration options. Note this currently doesn't account for sub-partitioning. You could then just run those statements in your test environment vs restoring the config table contents.

keithf4 avatar Jul 15 '22 13:07 keithf4

Just checking to see if any of the suggestions i made worked for you?

I won't be merging this PR at this time.

keithf4 avatar Aug 15 '22 14:08 keithf4

No worries, we'll find another solution.

Another option would be to use the dump_partitioned_table_definition() on the original system to generate the create_parent() call along with an UPDATE statement to restore the remaining configuration options. Note this currently doesn't account for sub-partitioning. You could then just run those statements in your test environment vs restoring the config table contents.

This sounds very promising. Perhaps we can teach the rails schema dumper to do something like this.

sj26 avatar Aug 22 '22 01:08 sj26