Integrity issue - PostgreSQL and ClickHouse out of sync
Bug description
PostHog uses two main databases:
-
PostgreSQL
-
ClickHouse
Some functionalities of PostHog rely on having a subset of data in sync between the two (e.g. person properties). We currently try to archive this by double writing to the two datastores directly in the app. Unfortunately, the datasets are far from being in sync as we use nothing like 2 phase commits that would rollback a transaction entirely if one of the write operation fails.
This data drift is creating issues at the application level and in the overall data integrity. I’m creating this issue to keep track of proposals to address the problem.
Environment
-
[X] PostHog Cloud
-
[X] self-hosted PostHog (ClickHouse-based), version/commit: please provide
-
[X] self-hosted PostHog (Postgres-based, legacy), version/commit: please provide
Additional context
- https://github.com/PostHog/posthog/issues/10058
My proposal is to implement a Change Data Capture process (see as example the section A Common CDC Architecture with Debezium)
This was potentially the cause of https://posthog.slack.com/archives/C0374DA782U/p1658758589681689