feat(pg2pulsar): add Tables option to filter published tables
Summary
Add support for specifying which tables to capture in `pg2pulsar`. When `Tables` is empty (the default), all tables are captured. When it is specified, only the listed tables are included in the PostgreSQL publication.
- Add `Tables` field to `PGXSource` struct
- Add `--Tables` flag to `pg2pulsar` and `configure` commands
- Use `pgx.Identifier` for safe SQL identifier quoting (prevents SQL injection)
- Default to the `public` schema when a table name has no schema prefix
- Auto-include `pgcapture.ddl_logs` for DDL replication support
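Note: This feature only works with the `pgoutput` decode plugin, because the filter is applied through a PostgreSQL publication, and publications are consumed by `pgoutput`.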
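For illustration, the empty-versus-listed rule from the summary maps onto PostgreSQL's two forms of `CREATE PUBLICATION`. This is a hypothetical helper (`publicationSQL` is not the name or the exact template used in `pkg/sql/source.go`), and it assumes the publication name and table names passed in are already quoted and schema-qualified.

```go
package main

import (
	"fmt"
	"strings"
)

// publicationSQL (hypothetical) builds the CREATE PUBLICATION statement.
// pubName and every entry of tables are assumed to be already quoted and,
// for tables, schema-qualified.
func publicationSQL(pubName string, tables []string) string {
	if len(tables) == 0 {
		// Default: an empty Tables list publishes every table.
		return fmt.Sprintf("CREATE PUBLICATION %s FOR ALL TABLES;", pubName)
	}
	// Otherwise only the listed tables are published.
	return fmt.Sprintf("CREATE PUBLICATION %s FOR TABLE %s;", pubName, strings.Join(tables, ", "))
}

func main() {
	fmt.Println(publicationSQL(`"my_pub"`, nil))
	fmt.Println(publicationSQL(`"my_pub"`, []string{`"public"."users"`, `"pgcapture"."ddl_logs"`}))
}
```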
Known Limitations
⚠️ DDL events include all tables: When using `--Tables` to filter specific tables, `pgcapture.ddl_logs` is automatically included to support DDL replication. However, `ddl_logs` captures DDL changes for all tables in the database (via PostgreSQL event triggers), not just the specified tables.
This means:
- The downstream sink will receive DDL events for tables not in the `--Tables` list
- If the downstream PostgreSQL only has the filtered tables, DDL operations on other tables may fail (e.g., `ALTER TABLE other_table` will error with "relation does not exist")
Possible future improvements:
- Filter DDL events at the source or sink level by parsing the DDL query
- Add an `--IncludeDDL` flag to let users opt out of DDL replication
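One way the first idea could look: a naive filter that extracts the target table from simple `ALTER`/`CREATE`/`DROP TABLE` statements and forwards the event only if that table was requested. This is a hypothetical sketch (`shouldForwardDDL` is not part of pgcapture); a real implementation would more likely use a proper SQL parser, since a regular expression misses many DDL forms.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tableRef captures the target of simple ALTER/CREATE/DROP TABLE statements.
// Deliberately naive: CREATE INDEX ... ON t, multi-table DROP, CREATE TEMP TABLE
// and similar forms are not recognised.
var tableRef = regexp.MustCompile(`(?i)^\s*(?:ALTER|CREATE|DROP)\s+TABLE\s+(?:IF\s+(?:NOT\s+)?EXISTS\s+)?(?:ONLY\s+)?([A-Za-z0-9_."]+)`)

// shouldForwardDDL (hypothetical) reports whether a DDL query should be
// forwarded, given the allowed tables keyed as "schema.table".
func shouldForwardDDL(query string, allowed map[string]bool) bool {
	m := tableRef.FindStringSubmatch(query)
	if m == nil {
		// Unrecognised statement: forward it rather than silently drop it.
		return true
	}
	// Strip quotes and fold to lower case as PostgreSQL does for unquoted
	// names (quoted mixed-case names are therefore not handled correctly).
	name := strings.ToLower(strings.ReplaceAll(m[1], `"`, ""))
	if !strings.Contains(name, ".") {
		name = "public." + name
	}
	return allowed[name]
}

func main() {
	allowed := map[string]bool{"public.users": true}
	for _, q := range []string{
		"ALTER TABLE users ADD COLUMN age int",
		"ALTER TABLE other_table ADD COLUMN age int",
		"CREATE INDEX idx_other ON other_table (id)",
	} {
		fmt.Printf("%-45s forward=%v\n", q, shouldForwardDDL(q, allowed))
	}
}
```

Forwarding statements the filter cannot parse keeps today's behavior for those cases: the sink may still hit "relation does not exist", but schema changes to filtered tables are never dropped by a parsing gap.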
Changed Files
- `pkg/source/postgres.go` - Core implementation
- `pkg/sql/source.go` - SQL template for table-specific publication
- `cmd/pg2pulsar.go` - CLI flag
- `cmd/configure.go` - Agent config flag
- `cmd/agent.go` - Parameter handling
We might need to include `pgcapture.ddl_logs` to be able to capture the DDL commands.
Fixed in https://github.com/replicase/pgcapture/pull/83/commits/8e189e3ddd56b287faaa56cb1154849cc28d12e0