search-replica
PostgreSQL logical decoding to Elasticsearch/OpenSearch
Search Replica
Replicates a PostgreSQL primary database into a read-only Elasticsearch/OpenSearch index replica.
It does not rely on additional database queries or external queues. It uses the logical replication protocol exclusively, which allows near-real-time, consistent synchronisation without external dependencies.
Focused on speed and efficiency.
Docs | DockerHub | Try it
- Consistent and fault tolerant, without dependencies
  Thanks to the PostgreSQL replication slot mechanism.
- Initial (re)indexing
  Using the `COPY` command.
- Uses native PG protocol
  Both Text and Binary form.
- Full DB types support
  json fields, composite types, arrays, enums... Except for arrays of composite types.
- Native Parent/Child join
  Including document `_id` and `routing` control.
- Limited denormalization & document modifications
  Check inlining.
- Bulk requests
  Data is flushed to Elasticsearch/OpenSearch in bulk.
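To illustrate the bulk flushing and the `_id`/`routing` control mentioned above, here is a minimal sketch of how a batch of rows could be rendered into the NDJSON body that the Elasticsearch/OpenSearch `_bulk` endpoint accepts. The `Doc` type and `BuildBulkBody` function are hypothetical names used for illustration only; they are not taken from this project's code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Doc is a hypothetical in-memory row that is ready to be indexed.
type Doc struct {
	Index   string
	ID      string
	Routing string // optional; empty means no custom routing
	Source  map[string]interface{}
}

// bulkMeta holds the metadata of a bulk "index" action.
type bulkMeta struct {
	Index   string `json:"_index"`
	ID      string `json:"_id"`
	Routing string `json:"routing,omitempty"`
}

// BuildBulkBody renders docs as NDJSON: for every document,
// one action line followed by one source line.
func BuildBulkBody(docs []Doc) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode writes a trailing "\n" after each value
	for _, d := range docs {
		action := map[string]bulkMeta{
			"index": {Index: d.Index, ID: d.ID, Routing: d.Routing},
		}
		if err := enc.Encode(action); err != nil {
			return nil, err
		}
		if err := enc.Encode(d.Source); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	body, err := BuildBulkBody([]Doc{
		{Index: "posts", ID: "42", Routing: "7", Source: map[string]interface{}{"title": "hello"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(body))
}
```

The resulting body would be sent with `Content-Type: application/x-ndjson` to `POST /_bulk`; batching many documents into one request is what keeps the number of round trips low.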
Configuration
ConfTags
You can
- set `routing` and document `_id` fields;
- rename or skip fields;
- define parent/child `join` field;
- inline rows as object into parent document;
- set custom inlining script;
- ~~set templated fields~~ (planned)
- ~~json-path names~~ (planned)

Using `COMMENT`s in your database schema. Check syntax and description.
Env Config
| Variable | Default | Description |
|---|---|---|
| PG_SLOT | pg2es | replication slot name |
| PG_PUBLICATION | search | publication name |
| PGHOST | localhost | |
| PGPORT | 5432 | |
| PGDATABASE | - | |
| PGUSER | - | |
| PGPASSWORD | - | |
| SEARCH_HOST | - | URL or host of ElasticSearch/OpenSearch |
| SEARCH_USERNAME | - | optional |
| SEARCH_PASSWORD | - | optional |
| SEARCH_BULK_SIZE | 4 | (MB) Bulk request size limit. |
| SEARCH_PUSH_INTERVAL | 30s | idle push interval, used when there are not enough rows for a full bulk request. |
| SEARCH_PUSH_THROTTLE | 500ms | hard limit. At most one request during this period. |
| SEARCH_PUSH_DEBOUNCE | 500ms | delays bulk after idle, to fetch related data. |
| LOG_FORMAT | json | json or cli |
| LOG_LEVEL | warn | from debug to fatal |
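As an illustration of how the variables above combine with their defaults, the following sketch loads a subset of them in Go. It assumes the duration values use Go's `time.ParseDuration` syntax (which the defaults `30s` and `500ms` happen to match) and that `SEARCH_BULK_SIZE` is a whole number of megabytes. The `Config` type and `LoadConfig` function are hypothetical and are not this project's actual configuration code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// Config holds a subset of the settings from the table (hypothetical type).
type Config struct {
	Slot         string
	Publication  string
	BulkSizeMB   int
	PushInterval time.Duration
	PushThrottle time.Duration
	PushDebounce time.Duration
}

// LoadConfig reads settings through lookup (for example os.LookupEnv)
// and falls back to the documented default when a variable is unset or empty.
func LoadConfig(lookup func(string) (string, bool)) (Config, error) {
	get := func(key, def string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}

	cfg := Config{
		Slot:        get("PG_SLOT", "pg2es"),
		Publication: get("PG_PUBLICATION", "search"),
	}

	size, err := strconv.Atoi(get("SEARCH_BULK_SIZE", "4"))
	if err != nil {
		return cfg, fmt.Errorf("SEARCH_BULK_SIZE: %w", err)
	}
	cfg.BulkSizeMB = size

	durations := []struct {
		key, def string
		dst      *time.Duration
	}{
		{"SEARCH_PUSH_INTERVAL", "30s", &cfg.PushInterval},
		{"SEARCH_PUSH_THROTTLE", "500ms", &cfg.PushThrottle},
		{"SEARCH_PUSH_DEBOUNCE", "500ms", &cfg.PushDebounce},
	}
	for _, d := range durations {
		v, err := time.ParseDuration(get(d.key, d.def))
		if err != nil {
			return cfg, fmt.Errorf("%s: %w", d.key, err)
		}
		*d.dst = v
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the loader easy to exercise with a plain map.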
Notes
- The script is single-threaded (not a bottleneck), except that a separate goroutine is used to make ES requests.
- Links between Database <-> Schema <-> Table <-> Column should be considered read-only and safe for multithreaded use... (not yet)
- It's fast. All the efforts should go towards readability, reliability and functionality.
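The split between a single processing loop and a separate goroutine for ES requests can be pictured with the sketch below. It is a simplified illustration under assumptions, not this project's implementation: rows are plain strings, batches are cut by row count rather than by size in MB, and `Pipeline` and `send` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Pipeline batches rows on the caller's goroutine and hands every full
// batch to a single sender goroutine, which calls send for each batch in order.
func Pipeline(rows []string, batchSize int, send func(batch []string)) {
	if batchSize < 1 {
		batchSize = 1
	}
	batches := make(chan []string, 1) // small buffer: batching may run one batch ahead

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for b := range batches {
			send(b) // in a real setup this would be the bulk HTTP request
		}
	}()

	batch := make([]string, 0, batchSize)
	for _, r := range rows {
		batch = append(batch, r)
		if len(batch) == batchSize {
			batches <- batch
			batch = make([]string, 0, batchSize) // fresh slice; the sender owns the old one
		}
	}
	if len(batch) > 0 {
		batches <- batch // flush the final partial batch
	}
	close(batches)
	wg.Wait() // wait until the sender has drained the channel
}

func main() {
	Pipeline([]string{"a", "b", "c", "d", "e"}, 2, func(b []string) {
		fmt.Println("sending", b)
	})
}
```

Because only one goroutine builds batches and only one sends them, ownership of each batch passes over the channel and no locking is needed.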
Known Limitations:
- No 1:1 inlines (yet)
- Deleting a document deletes all of its inlines (AKA DELETE CASCADE), and they cannot be restored.