vector icon indicating copy to clipboard operation
vector copied to clipboard

A new source for Postgres logical replication messages

Open imor opened this issue 2 years ago • 3 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

We already use vector to collect logs from our infra and I was wondering if we can use it to collect Postgres logical replication messages too? Logical replication messages will (very roughly) look something like this:

{"event_type": "insert", "data": {...}
{"event_type": "update", "data": {...}
{"event_type": "delete", "data": {...}

Attempted Solutions

There's a Postgres source but it only collects metrics not replication messages.

Proposal

There are two broad approaches:

  1. Merge a new Postgres replication source in this repo. I can work on this if it is something the team sees fit the project vision.
  2. Allow custom sources to be written which can be plugged into vector. I'm not sure this is currently possible and couldn't find anything in the repo or docs. If this is acceptable/possible it might open a door for more community sources/sinks.

References

I could only find the following two issues for custom plugins:

  • https://github.com/vectordotdev/vector/issues/1500
  • https://github.com/vectordotdev/vector/issues/3951

Version

0.36.0

imor avatar Feb 15 '24 07:02 imor

Thanks @imor ! How does Postgres store those replication logs? Is it in a file? Or in the database itself?

jszwedko avatar Feb 15 '24 14:02 jszwedko

A bit of context: we are trying to build a logical replication pipeline (similar to this) which can capture changes in Postgres tables and dump these into a backend like S3. The idea is to reuse vector for this such that a source can read replication messages from Postgres (which are in Postgres wire protocol format), convert them into a json format, and push them to S3. We want to reuse vector because we already use it for collecting logs/metrics.

How does Postgres store those replication logs? Is it in a file? Or in the database itself?

Postgres stores the changes in its Write Ahead Log (WAL) which are converted into Postgres wire protocol format when a client reads them.

imor avatar Feb 16 '24 07:02 imor

A bit of context: we are trying to build a logical replication pipeline (similar to this) which can capture changes in Postgres tables and dump these into a backend like S3. The idea is to reuse vector for this such that a source can read replication messages from Postgres (which are in Postgres wire protocol format), convert them into a json format, and push them to S3. We want to reuse vector because we already use it for collecting logs/metrics.

How does Postgres store those replication logs? Is it in a file? Or in the database itself?

Postgres stores the changes in its Write Ahead Log (WAL) which are converted into Postgres wire protocol format when a client reads them.

Gotcha, I see. This sounds like the file source could work if it was able to decode files in that format, then? Assuming that is the case, this could make sense as a new "codec" on the file source.

jszwedko avatar Feb 16 '24 14:02 jszwedko

I'm not sure how a file source codec works, does it read from a file and interprets the data in a certain format? If yes, this might not work because logical replication events can only be read over a TCP connection to the Postgres database similar to how postgresql_metrics.rs connects to Postgres.

imor avatar Feb 21 '24 05:02 imor

I'm not sure how a file source codec works, does it read from a file and interprets the data in a certain format? If yes, this might not work because logical replication events can only be read over a TCP connection to the Postgres database similar to how postgresql_metrics.rs connects to Postgres.

Ah I see. A separate source may make sense then.

jszwedko avatar Feb 21 '24 15:02 jszwedko

Cool, I'll raise a PR with a new source once my replication code is a bit more stable.

imor avatar Feb 22 '24 05:02 imor