PostgreSQL 18 — What matters for Citus
1 · Extension packaging & control‑file handling
| What changed | Why it matters for Citus |
|---|---|
extension_control_path GUC — lets the server search multiple directories for .control and .sql files 📁 |
Citus can now ship its control/SQL files in a versioned directory (e.g. /opt/citus/pg18) instead of $sharedir/extension, simplifying OS packaging and container images. Update the build to stop hard‑coding $libdir. ([commit]) |
| Checksums enabled by default — initdb now turns on data page checksums unless --no-data-checksums is passed. | during MVU either pass --no-data-checksums to all new PG 18 initdb calls, or enable checksums on every PG 13–17 server before upgrading. No Citus code change. |
2 · Planner & executor changes that surface in distributed plans
| New core behaviour | Action for Citus |
|---|---|
| Right Semi Join plans (hash or nested‑loop) | Add Right-Semi-Join case in plan printer (commit) |
| Convert 'IN (VALUES ...)' to 'x = ANY ...' for better optimizer statistics | nothing to change in Citus, but expect different plan shapes & test output. (commit) |
| OR‑to‑array rewrite | nothing to change in Citus, but expect different plan shapes & test output. (commit) |
| Allow merge joins to use incremental sorts | Auto‑benefit; “no action”. (commit) |
| Allow skip scans of btree indexes | Workers can avoid full composite‑key scans; useful when the distribution key is the second column of a multi‑column index. (commit) |
3 · Partitioning, constraints & DDL that affect sharding logic
| Feature | Impact |
|---|---|
| Allow the specification of non-overlapping PRIMARY KEY and UNIQUE constraints | (commit) |
| Virtual generated columns (default now) | Zero‑storage computed columns work automatically on shards — handy for derived JSON or geom fields in columnar storage. Citus DDL propagation must copy the generation expression. ([commit) |
| Change VACUUM and ANALYZE to process the inheritance children of a parent | The coordinator‑side table in a distributed table still contains no rows, but VACUUM now simply returns with a NOTICE instead of doing nothing silently; the warning is “parent table contains no data”. Users can invoke VACUUM (ANALYZE) on shards with run_command_on_workers() ([commit]) |
4 · Index‑build & storage performance
| Core improvement | Benefit on a Citus cluster |
|---|---|
| Parallel GIN index builds | Each worker can multithread its shard‑local GIN build; overall index‑build wall‑time drops proportionally to cores × shards. ([commit]) |
Asynchronous I/O subsystem (io_method=worker default, io_uring on Linux) |
Backends queue reads/writes to I/O workers (or io_uring), cutting stall time on big scans or heavy WAL flushes. Expect faster distributed COPY, ANALYZE and columnar queries. ([commit]) |
5 · Monitoring & EXPLAIN extras (useful when debugging Citus plans)
-
Per‑backend I/O stats (
pg_stat_get_backend_io, new columns inpg_stat_io) — lets you see exactly which worker backends are I/O‑bound. ([ReleaseNotes) -
EXPLAIN ANALYZEincludesBuffers:by default and shows per‑index lookup counts. Need to update regression outputs. ([ReleaseNotes)
6 · Security & authentication changes
| Change | What Citus deployers must do |
|---|---|
| MD5 password auth deprecated — warnings in PG 18, removal planned for PG 19 | Ensure all coordinator ↔ worker and client roles are on SCRAM or certificates. Update doc examples and CM tooling. Replace PASSWORD 'md5…' with SCRAM-SHA-256 [commit) |
postgres_fdw SCRAM passthrough (also used by dblink) |
If Citus ever relies on FDW connections (e.g. for reference-table replicas), you can eliminate stored passwords by enabling use_scram_passthrough. “Optional — handy for mixed Citus + postgres_fdw stacks, or for future Citus features.” |
| Renamed GUCs (ssl_ecdh_curve → ssl_groups, etc.) — several server variables have been renamed or newly introduced in PG 18. | Citus does not hard-code most GUC names, so runtime behaviour is unaffected. The only reference in the Citus codebase is the test harness using ssl_ecdh_curve, which must be switched to ssl_groups. Tests & Docs: perform a search-and-replace of deprecated GUC names and update sample postgresql.conf templates to include the new variables. |
Extension API and Packaging Changes
-
Extension control file search path: PG 18 adds a new mechanism to locate extension files via an
extension_control_pathGUC. This introduces a “search path” for extension control files (with a default of$systemfor the normal install directory). In practice, this allows packaging extensions outside the core installation directory – making it easier to distribute extensions separately and build custom extensions. Citus can leverage this to ship its control and SQL files in a dedicated directory without manual user configuration.- Package installs Citus control + SQL into its own directory.
- Package drops a 1‑line citus.conf that appends that directory to extension_control_path.
- User runs CREATE EXTENSION citus; – Postgres walks the search list, finds citus.control, and the extension loads.
(commit)
-
Module path and control file updates: Along with the new search path, PostgreSQL 18 updates best practices for extension control files. Notably, the
module_pathnamein extension control files should omit the$libdir/prefix, and thedirectoryparameter in control files (andMODULEDIRin Makefiles) is effectively deprecated. PostgreSQL 18 will ignore a$libdir/prefix (and documentation no longer recommends using it), which frees extensions from being tied to the corepkglibdir. The introduction ofextension_control_pathcovers the use-cases thatdirectoryused to address. Impact for Citus: The Citus extension’s control file should be adjusted to these new conventions for compatibility. Removing any hard-coded$libdirpaths ensures Citus can be installed in custom extension directories, aligning with PG 18’s extension-loading behavior. PG ≤ 17 does exactly the same: any relative path is resolved as $libdir/. (Article) -
Extension installation error messages: Error reporting when creating or updating extensions has been improved. PostgreSQL 18 provides more refined error messages during
CREATE EXTENSIONor extension script execution failures. While this doesn’t change extension functionality, it will help developers diagnose issues. For Citus users and developers, clearer errors (e.g. missing dependency, version mismatch, etc.) during extension installation mean easier troubleshooting when enabling Citus on a PG 18 cluster. (Article) (Article2) -
Extension upgrade considerations: As with any major release, extension authors must adapt to internal API changes. PostgreSQL 18 may have changed certain backend function signatures or structs that extensions rely on (e.g. planner hook APIs, catalog definitions). The community tracks extensions that fail to build on PG 18, prompting maintainers to update them. (For example, the
login_hookextension is discontinued by PG 18 due to upstream changes.) Citus compatibility: The Citus team will need to ensure that all planner/executor hook usages and any PG-internal calls are updated for PG 18. (Article)
Planner and Optimizer Changes
-
Hash Right Semi Join: The planner can now use a Hash Right Semi Join strategy for semi-joins (e.g. queries with
IN (SELECT ...)or semi-joins in EXISTS). In prior versions, hash joins in semi-join cases always hashed the inner side. PostgreSQL 18 lifts that limitation – it can choose to hash either side based on size. For example, in PG 17 a query withWHERE flight_id IN (SELECT ...)would hash the subquery (even if it’s larger). In PG 18, the planner recognizes when it’s cheaper to hash the outer relation instead. It introduces a Hash Right Semi Join plan, which in testing hashed the smallerflightstable instead of the largeticket_flights, dramatically reducing memory and execution time. This is even applied in parallel mode. Impact for Citus: Citus’s distributed planner must be aware of this new join type. For distributed subqueries that Citus pushes down or evaluates on workers, the workers running PG 18 can utilize this optimization automatically. On the coordinator side, if a semi-join involves both local and distributed parts, Citus might need to adjust its logic to not misinterpret a Hash Right Semi Join node. Overall, this improves performance of semi-join queries, benefiting distributed workloads where such joins are executed on workers. (Article) -
OR-to-ANY transformation: The optimizer can now automatically transform a chain of
ORconditions into an equivalent= ANY(array)form to use indexes more efficiently. In previous versions, a predicate likecol = 1 OR col = 2 OR col = 3would lead the planner to consider a BitmapOr of multiple index scans (or a sequence of OR filters). PG 18’s planner can internally rewrite this ascol = ANY('{1,2,3}'), using a single index scan on an array condition. The example from PG 18 shows anIndex Scan using flights_pkeywith a singleflight_id = ANY('{1,2,3}')condition, replacing three separate index scans in PG 17. This optimization happens transparently, even if the SQL is not written withANY. Impact for Citus: If a distributed query has largeORlists, the PG 18 planner can plan it more efficiently before delegating to workers. Citus benefits automatically. However, Citus should ensure that if such a plan is generated on a coordinator and then needs to be executed on workers, the pushdown logic supports array parameters or the coordinator handles it correctly. (Article1) (Article2) -
Incremental sort in merge joins: PostgreSQL 18 extends the incremental sort feature to more scenarios – notably, merge joins. In a merge join, if one side (the “outer” side of the join) is already sorted on a prefix of the JOIN keys, PG 18 can apply incremental sorting for the remaining keys instead of doing a full sort. For example, consider joining table
t1andt2on(a, b)witht1coming from an index scan ona. In PG 17, the planner would still fully sortt1on(a, b)before merging. In PG 18, it recognizes thatt1is pre-sorted byaand only needs to sort bybwithin eachagroup – so it uses an Incremental Sort fort1withPresorted Key: a. The plan shows anIncremental Sorton the outer side instead of a full sort, which saves work. Impact for Citus: This change is mostly internal to the PG planner, but Citus’s custom logic should be reviewed to ensure it doesn’t disable or undo this optimization. Overall, it improves performance for certain distributed join queries that rely on sorted data from shards. (Article)(Article2) -
Parallel nested loop improvements: PostgreSQL 18 gives the planner more flexibility with Parallel Nested Loop Joins. PG 18 can now choose to materialize an inner plan (cache it in memory) in a parallel nested loop if beneficial. It “gives the planner more opportunities to choose the optimal plan in other situations”. Impact for Citus: In a distributed query, if the coordinator uses a nested loop to join a local small table to data coming from workers (or vice versa), this optimization could kick in and avoid redundant work. It also sets the stage for smarter parallel plans. Citus’s planner hooks likely don’t interfere with this, but Citus should be tested with PG 18 to ensure that if a Materialize is introduced in a plan, it doesn’t affect result routing logic. (Article)
-
Better planning for generate_series and functions: PostgreSQL 18 allows built-in set-returning functions to supply planner support functions for estimation. A concrete example is
generate_seriesfor timestamps: PG 18’s planner now knows how to estimate the number of rowsgenerate_series()will return for a given range. In anEXPLAIN, a call likegenerate_series(now(), now() + interval '1 day', '1 hour')will show an accurate row count (e.g. 25 rows for hourly series over a day). Previously, such functions often had default or poor row estimates. Impact for Citus: Citus might not need any action here, but it gains from more accurate core planning. (Article) -
Removal of “huge cost” hack: An internal improvement: the planner no longer relies on setting path costs to an arbitrary huge number (10 billion) to discourage certain plans. For example, toggling
enable_seqscan = offin older versions would set sequential scan cost to1e10to make it never chosen. PostgreSQL 18 eliminates this10000000000cost approach (it uses a more direct method to disable paths). This is mostly an internal refactor, but it means EXPLAIN output will no longer show these fake huge costs. Impact for Citus: Minimal direct impact, but it slightly cleans up scenarios where Citus might disable some paths during planning. When Citus set a GUC likeenable_nestloopoff on workers for planning reasons, EXPLAIN output need updating due to modification on the costs. (Article) (Article2)
See #7992 and #8250 for details