Bug: Kafka + plugin server health check produces to the healthcheck topic faster than it consumes from it
Bug description
The consumer group backlog (lag) for partition 0 of the healthcheck topic is slowly creeping up. It's not a deal breaker, but ideally the health check would do a round trip through Kafka with a consumer that doesn't let the lag grow.
The Kafka health check routine: https://github.com/PostHog/posthog/blob/master/plugin-server/src/main/utils.ts#L51
Environment
- [x] PostHog Cloud
- [x] self-hosted PostHog (ClickHouse-based), version/commit: please provide
- [x] self-hosted PostHog (Postgres-based, legacy), version/commit: please provide
Additional context
This isn't a huge problem; it's more of a metrics and hygiene issue with Kafka. Consumer group lag is a pretty standard way to check the health of the cluster and your app, and this breaks it for at least one topic.
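To make the lag metric concrete, here is a rough sketch of how per-partition lag can be derived from end offsets and committed offsets. The `PartitionOffset` shape and the `computeLag` name are illustrative assumptions, not PostHog code. With kafkajs, the inputs would presumably come from the admin client (`fetchTopicOffsets` and `fetchOffsets`), which return offsets as strings.

```typescript
// Hypothetical shape for illustration: one offset per partition, as a string.
interface PartitionOffset {
    partition: number
    offset: string
}

// Lag per partition = end offset minus committed offset.
// Assumption: a missing commit or an offset of -1 means nothing has been
// committed yet, so everything up to the end offset counts as lag.
function computeLag(latest: PartitionOffset[], committed: PartitionOffset[]): Record<number, number> {
    const committedByPartition: Record<number, number> = {}
    for (const { partition, offset } of committed) {
        committedByPartition[partition] = Number(offset)
    }

    const lag: Record<number, number> = {}
    for (const { partition, offset } of latest) {
        const end = Number(offset)
        const done = committedByPartition[partition]
        lag[partition] = done === undefined || done < 0 ? end : Math.max(0, end - done)
    }
    return lag
}
```

For example, an end offset of `'120'` against a committed offset of `'100'` on partition 0 gives a lag of 20. With the current health check, that number keeps growing for the healthcheck topic.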
For comparison, here is what Klarna does for health checks with the kafkajs library: https://github.com/tulios/kafkajs/issues/452#issuecomment-517747429
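One option that avoids a dedicated healthcheck topic altogether, so no lag can build up, is to treat recent consumer heartbeats as the health signal. Below is a minimal sketch of that idea, not the plugin server's implementation. It assumes kafkajs's `consumer.events.HEARTBEAT` instrumentation event (with a `timestamp` on the event) and `consumer.describeGroup()`. The `HeartbeatConsumer` interface and both function names are made up for illustration, and a real kafkajs consumer may need a thin adapter or cast to fit the interface.

```typescript
// Hypothetical minimal view of a consumer, so the sketch runs without a broker.
interface HeartbeatConsumer {
    events: { HEARTBEAT: string }
    on(eventName: string, listener: (event: { timestamp: number }) => void): unknown
    describeGroup(): Promise<{ state: string }>
}

// Group states in which missing heartbeats are expected rather than alarming.
const REBALANCING_STATES = ['PreparingRebalance', 'CompletingRebalance']

// Pure decision: healthy if a heartbeat landed within the session timeout,
// or if the group is known to be in the middle of a rebalance.
function isConsumerHealthy(
    lastHeartbeatMs: number | null,
    nowMs: number,
    sessionTimeoutMs: number,
    groupState?: string
): boolean {
    if (lastHeartbeatMs !== null && nowMs - lastHeartbeatMs < sessionTimeoutMs) {
        return true
    }
    return groupState !== undefined && REBALANCING_STATES.indexOf(groupState) !== -1
}

// Records heartbeat timestamps and returns an async health check function.
function createHeartbeatHealthCheck(
    consumer: HeartbeatConsumer,
    sessionTimeoutMs: number,
    now: () => number = Date.now
): () => Promise<boolean> {
    let lastHeartbeatMs: number | null = null
    consumer.on(consumer.events.HEARTBEAT, ({ timestamp }) => {
        lastHeartbeatMs = timestamp
    })

    return async () => {
        // Fast path: a recent heartbeat means no broker call is needed.
        if (isConsumerHealthy(lastHeartbeatMs, now(), sessionTimeoutMs)) {
            return true
        }
        // Slow path: ask the broker whether the group is rebalancing.
        const { state } = await consumer.describeGroup()
        return isConsumerHealthy(lastHeartbeatMs, now(), sessionTimeoutMs, state)
    }
}
```

The trade-off is that this checks consumer liveness, not the full produce-and-consume round trip the current routine exercises. If the round trip is worth keeping, the consumer side would need to commit the offsets it reads so that the healthcheck topic's lag stays flat.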