services/horizon: upgrade psql support to most recent versions
What would you like to see?
RDS has very recently added support for psql15, so we're quite a few major versions behind at this point. We should support (at a minimum) the last 2 major versions, which is this case would be psql 14 and psql 15. However, we'll likely want to go back further with postgres (for CI/testing purposes in this repo). Because we build apt packages for ubuntu 20.04 and 22.04, and those distros ship with postgres 12 and 14, respectively, we should make sure we're still running tests against 12+
We currently do not have alignment on which version of psql we support, test against, and deploy. We should sync all of these references up as part of this upgrade process as well. Current state is:
- Go monorepo integration tests are run against versions 9 and 10 of psql
- RDS instances for staging+prod for both testnet and pubset are running v12.13
- Puppet specifies these same instances as 10.15 (unclear to me where this is being overridden)
The scope of this issue will cover:
- Performance testing on v14 + query optimizations necessary to support it. There was an outdated/prior investigation that saw performance hits when upgrading to v14. We'll need to utilize goreplay/traffic mirroring in staging to check for any regressions, particularly on the more problematic queries (trade_aggregations, claimable_balances, etc)
- We don't have an existing performance test. This would actually mean mirroring traffic to this instance from production, and going through a similar release comparison process
- Migrating all production instances to the latest version possible (presumably, v15)
- Updating horizon helm chart + external documentation to recommend latest version possible
- Updating CI in this repo to run against all supported versions
What alternatives are there?
- Stay on current versions (production runs 12.13). This is quite out-of-date/old; currently RDS/aurora minimum version is 11, so we may soon be forced to upgrade. We're likely missing out on a lot of performance improvements, but also just generally not testing on/guaranteeing performance on versions that our partners will reasonably want to run
- Support only the versions that ship out-of-the-box with the distros for apt packages we build (this would be 12 and 14). Unless there's some significant differences that matter to us between 12+ and exclusively 12/14, I don't think this is worthwhile, as people running horizon are not ultimately limited by what version ships with their distro.
Note that now (when we're pulling this in) psql 16 is the most recent version of postgres, so this is the version we should upgrade to. Psql 16 is supposed to have performance improvements to the COPY function, which is now how ingestion works.
We should do a performance comparison between the new postgres 16 cluster and the current postgres 12 staging cluster. The performance comparison should include both latest ledger ingestion with traffic mirroring enabled and historical reingestion (e.g. on a period of 24 hours worth of ledgers).
We'll want to examine the following horizon metrics:
- Average Ledger Ingestion Duration
- ingestion processors Run Duration
- Response per status
- Request duration broken down per endpoint
We should also look at the postgres metrics dashboard to observe if there are any differences between the two postgres DBs.
Finally, we should look at the AWS RDS performance insights Top SQL dashboard and the metrics dashboard (particularly CPU utilization) for the two DBs.
Due to performance regressions noted on psql16, we are deferring this until after the truncation.