PgCat has higher overall latency / seems to swap connections around on k8s
Describe the bug
We attempted to replace pgbouncer with PgCat as our connection pooler for our Kubernetes app and ran into unexpected behavior. Average latency to the database appears to have increased by ~15 ms once the web application was switched over, and throughout the trial run the 6 PgCat pods we created dramatically re-arranged their client connections whenever application pods were created or removed by the autoscaler, driving latency even higher.
To Reproduce
- Deploy the Helm chart with 6 replicas (the full PgCat config is below under Additional context; a minimal values sketch follows these steps).
- Point roughly 50 high-traffic application pods at the PgCat service.
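Roughly the shape of the values override used for the trial. Key names such as replicaCount and service.port follow common Helm conventions and may not match the chart's exact schema, so treat this as a sketch rather than a copy-paste config:
replicaCount: 6       # six pooler pods behind a single Service
service:
  port: 6432          # matches [general].port in the pgcat.toml below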
Expected behavior
TPS and latency were expected to be at least on par with pgbouncer, and ideally better. Additionally, client connections on each individual PgCat pod were expected to remain mostly stable, only increasing or decreasing as pods and connections were created or removed, rather than the quite dramatic re-arrangements we observed.
Screenshots
We enabled Prometheus metrics for this trial and graphed several of them in Datadog. Attached below are graphs showing Puma's response time for various components both before and after switching to PgCat at around 11:40. The immediate increase in average latency is quite clear. After 12:00, as additional pods began spinning up (due to autoscalers), connections began jumping between the 6 PgCat pods to a surprising degree, further increasing latency.
Puma's response time for its various services. The top light-blue segment represents latency to the DB.
PgCat client connections, graphed per pod, per database. Each line is cl_active + cl_idle, i.e. total client connections.
PgCat server connection utilization percentage, per pod, per database. Each line is (databases_current_connections / databases_pool_size) * 100 to give a percentage.
Kubernetes info:
- Version: 1.30
- Worker node OS: Ubuntu 22.04
- PgCat version: v1.2.0
Additional context
Full config used for this trial:
[general]
host = "0.0.0.0"
port = 6432
enable_prometheus_exporter = true
prometheus_exporter_port = 9930
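# note: per pgcat's example config, the timeouts below are in milliseconds; ban_time is in seconds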
connect_timeout = 15000
idle_timeout = 600000
server_lifetime = 3600000
server_tls = false
idle_client_in_transaction_timeout = 0
healthcheck_timeout = 1000
healthcheck_delay = 30000
shutdown_timeout = 60000
ban_time = 60
log_client_connections = false
log_client_disconnections = false
tcp_keepalives_idle = 5
tcp_keepalives_count = 5
tcp_keepalives_interval = 5
admin_username = "XXXXXXXX"
admin_password = "XXXXXXXX"
##
## pool for primary
##
[pools."primary"]
pool_mode = "transaction"
load_balancing_mode = "random"
default_role = "any"
prepared_statements_cache_size = 500
query_parser_enabled = false
primary_reads_enabled = true
sharding_function = "pg_bigint_hash"
##
## pool for replica
##
[pools."replica"]
pool_mode = "transaction"
load_balancing_mode = "random"
default_role = "any"
prepared_statements_cache_size = 500
query_parser_enabled = false
primary_reads_enabled = true
sharding_function = "pg_bigint_hash"
## pool primary user
##
[pools."primary".users.0]
username = "XXXXXXXX"
password = "XXXXXXXX"
pool_size = 200
statement_timeout = 0
min_pool_size = 3
## pool replica user
##
[pools."replica".users.0]
username = "XXXXXXXX"
password = "XXXXXXXX"
pool_size = 200
statement_timeout = 0
min_pool_size = 3
## pool primary database mastodon
##
[pools."primary".shards.0]
servers = [
[ "10.1.1.6", 5432, "primary" ],
]
database = "webapp"
## pool replica database mastodon
##
[pools."replica".shards.0]
servers = [
[ "10.1.1.7", 5432, "primary" ],
]
database = "webapp"
This issue is potentially related to another high-latency report, https://github.com/postgresml/pgcat/issues/459. Also tagging https://github.com/postgresml/pgcat/issues/872, as they are seeing performance issues with their attempt as well.
On the first graph, what aggregation function is this? Averages, medians, p99s? When we started using PgCat in production about a year ago, we noticed that Pgbouncer was a bit faster, but not 15 ms faster; it was more on the order of 2 ms at the p99. So 15 ms is pretty high, and I am curious about the setup you have running.
The client connection churn is not something I saw in production. How does Pgbouncer behave when the autoscaler kicks in? If the autoscaler is just adding pods, PgCat won't drop client connections in response. Do we know why client connections are being dropped on PgCat?
@drdrsh This is an average. And I want to clarify: the first graph is response time (i.e. ping), not actual throughput. Being ~15 ms higher means the latency/ping is higher, which indicates a decrease in performance, not an increase. We never got around to testing actual throughput, as the increase in latency (especially the spikes) was getting bad enough that we had to back out of the trial.
As for how pgbouncer behaves in our situation, below is a graph of client connections per node in pgbouncer over the last hour. Throughout this hour, dozens of pods have been created and deleted (this autoscaler only adds and deletes pods), all of which need DB connections. The graph doesn't differentiate between databases, which is why it looks a little different, but it is much less chaotic, which is what I was expecting from PgCat as well:
On the autoscaling side, I will try to reproduce your findings.
On the first graph, I am interested in seeing the percentiles and the max. I want to see whether the bump is due to higher latency overall or to tail latencies driving the mean up.
@drdrsh
No problem. Below are the percentiles for the first graph:
And below is the max, as a separate graph because it is measured in minutes in many places, while the percentiles were just seconds:
To provide additional information: the mastodon.http.duration.db metric is the time spent doing database queries, as reported by Rails, and is extracted from our web server logs.
When the autoscaler kicks in, it starts and stops pods that run Ruby/Rails processes, which in turn connect to pgcat / pgbouncer. Each process/pod has its own internal connection pool (managed by Rails).
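For context, that per-process pool is just the standard Rails config/database.yml pool pointed at the pooler. A simplified sketch, with illustrative values rather than our exact production settings:
production:
  adapter: postgresql
  host: pgcat                          # the PgCat (or pgbouncer) Kubernetes Service
  port: 6432
  database: webapp
  username: webapp                     # placeholder; real credentials are redacted above
  password: <%= ENV["DB_PASSWORD"] %>  # illustrative env var name
  pool: 5                              # per-process pool size managed by Rails
  prepared_statements: false           # commonly disabled behind transaction-mode poolers; PgCat can cache them via prepared_statements_cache_size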
FWIW, I am seeing this too. I have not tried with PgBouncer yet, so I am purely comparing against the raw DB. I would expect a small amount of overhead (perhaps 1 ms) due to the additional hop, but I am seeing ~15 ms as well.
Just in case, @timetinytim -- are you positive the PgCat version being used is 1.2.0? I ask because I fixed the issue on my end about 30 minutes ago by specifying the v1.2.0 tag; the Helm chart defaults to the main tag, which is substantially older. I am now seeing ~1-2 ms of overhead at the p99, rather than ~15 ms.
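In chart values, the change was essentially just pinning the image tag, assuming the chart exposes the conventional image.tag key (check its values.yaml for the exact name):
image:
  tag: v1.2.0   # the chart's default tag is "main", which is substantially older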