Using automatic PSC endpoint creation with Auth Proxy, backup recovery considerations
Question
Hi,
I am trying to use the CloudSQL Auth Proxy to connect to an existing private database instance that is in a different project from my consumer VPC. It is not possible to use Private Service Access (PSA) in this case (for the reasons described by @jackwotherspoon here), so I must use Private Service Connect (PSC).
Background
I have successfully used Private Service Connect (PSC) in my dev environment using the new "Create the endpoint automatically" approach, however I have some concerns about the implications this has on database backup restore operations. I'd like your advice on how to architect this.
My understanding of CloudSQL backup restores is that it is bad practice to restore to the same instance you're running in production. The console warns not to do this in production:
Based on this, my database disaster recovery playbooks involve creating a new database instance. However, when using PSC to connect to the database, this introduces operational complexities which are not fully alleviated by using the new "Create the endpoint automatically" approach.
PSA-only DB restores
If I were connecting to the database only via PSA, backup restores are trivial. I simply create a new instance, restore the backup, and change my CloudSQL Auth Proxy config to point to the new instance connection name.
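To make the comparison concrete, here is a sketch of that PSA-only flow as gcloud commands. All names (prod-db, prod-db-restore, my-project, the region, tier, and BACKUP_ID) are placeholders, not values from my actual setup:

```shell
# Sketch of the PSA-only restore flow; substitute your own names and settings.

# 1. Create a fresh instance whose settings match the production instance.
gcloud sql instances create prod-db-restore \
    --edition=enterprise-plus \
    --tier=db-perf-optimized-N-2 \
    --region=europe-west2

# 2. Find the backup to restore, then restore it onto the new instance.
gcloud sql backups list --instance=prod-db
gcloud sql backups restore BACKUP_ID \
    --restore-instance=prod-db-restore \
    --backup-instance=prod-db

# 3. Repoint the Auth Proxy at the new instance connection name.
cloud-sql-proxy my-project:europe-west2:prod-db-restore
```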
PSC DB restores
It is not as simple as above when using PSC. I think it's best to explain with some architecture diagrams.
To start, here's how my architecture looks when using PSC with CloudSQL Auth Proxy:
Suppose I need to do a database backup restore due to some data corruption in production. The way I would do this is by creating a new database instance and restoring the backup. The PSC Endpoint will be automatically created in my consumer project, as per my service connection policy.
To complete the database backup restore, I need to:
- Create a new DNS A record from "recommended instance PSC DNS name" to the auto-created PSC endpoint's IP Address
- Switch the CloudSQL Auth Proxy instance connection name to new DB and deploy
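The DNS step above can at least be scripted once the endpoint has been auto-created. A minimal sketch, assuming placeholder names ("prod-db-restore" for the new instance, "consumer-project" for the consumer VPC project, "psc-zone" for the private Cloud DNS zone serving the recommended DNS name):

```shell
# Sketch: publish the DNS record for the auto-created PSC endpoint.
# All resource names here are placeholders.

# 1. The new instance's recommended PSC DNS name and service attachment.
DNS_NAME=$(gcloud sql instances describe prod-db-restore \
    --format='value(dnsName)')
ATTACHMENT=$(gcloud sql instances describe prod-db-restore \
    --format='value(pscServiceAttachmentLink)')

# 2. The IP of the endpoint that the service connection policy auto-created
#    (the forwarding rule targeting the instance's service attachment).
ENDPOINT_IP=$(gcloud compute forwarding-rules list \
    --project=consumer-project \
    --filter="target=${ATTACHMENT}" \
    --format='value(IPAddress)')

# 3. Create the A record so the Auth Proxy can resolve the new instance.
#    (Cloud DNS expects a fully qualified record name ending in a dot.)
gcloud dns record-sets create "${DNS_NAME}" \
    --project=consumer-project --zone=psc-zone \
    --type=A --ttl=300 --rrdatas="${ENDPOINT_IP}"
```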
This looks like this:
Problem
The main problem is that this turns a database restore into a much more operationally complex procedure. Imagine being the on-call engineer who needs to do all of this at 2am, possibly without knowing anything about Private Service Connect. Additionally, this makes the potential downtime during such a disaster much longer than necessary, which increases the likelihood that we will miss our SLAs.
Really, what we need is a way to automate this DNS record creation following the automatic PSC creation. If this were the case, switching to the new instance would be as simple as before, as it just requires changing the instance connection name on the CloudSQL Auth Proxy.
Could you please advise on the following:
- Is there an alternative architecture where I do not have these challenges?
- Is there a way to automatically provision the DNS records too?
- Are there any planned features or feature requests to allow this?
This functionality would give my company tremendous value, but I am struggling with the practicalities right now.
Thank you in advance!
Hi @OscarVanL,
Thank you for your detailed framing of the problem with PSC and disaster recovery. This is great input. We are working on solutions to automate disaster recovery for PSC instances. As of April 2025 this is still a work in progress, so you will need to do more setup on your own.
My recommendation for now is to run your own database load balancer in your GKE cluster. Configure the load balancer to send all database connections to the primary database instance. When you fail over, reconfigure the load balancer to send connections to the new, recovered instance.
Here is one way to do it:
Create the primary and replica PSC databases, service attachments, and DNS records so that they exist before the fail-over event.
Run a database proxy Deployment in your GKE cluster with 2 containers: the first container runs the Cloud SQL Auth Proxy, configured to open connections to both the primary and replica databases. The second container runs a fully-featured database load balancer such as HAProxy, PgBouncer, or Envoy, configured to forward inbound connections through the Auth Proxy to the primary instance.
To fail over: restore the backup to the replica database instance, then switch the load balancer configuration to direct all connections to the replica instance.
```
                +-------- DB Load Balancer Pod --------+
Your app -----> | HAProxy ---> AuthProxy --------------|-------> DB1
                |                                      |-------> DB2
                +--------------------------------------+
```
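For illustration only, here is a minimal sketch of that pod's wiring; the ports and backend names are assumptions, not a tested setup. The Auth Proxy can listen on one local port per instance (e.g. `cloud-sql-proxy 'proj:region:db1?port=5432' 'proj:region:db2?port=5433'`), and HAProxy fronts both:

```
defaults
    mode tcp
    timeout connect 5s
    timeout client  30m
    timeout server  30m

frontend postgres_in
    bind *:6432
    default_backend databases

backend databases
    # DB1 is active; DB2 only receives traffic if DB1 is down.
    # To fail over deliberately, swap the 'backup' flag and reload HAProxy.
    server db1 127.0.0.1:5432 check
    server db2 127.0.0.1:5433 check backup
```

The app then connects to the pod on port 6432 and never needs to know which instance is live.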
I will leave this issue open to keep track of the interest in a more automated solution.
@hessjcg
Thank you for this detailed response! I am pleased to hear that solutions are being planned.
Your architecture suggestion would definitely solve the issues I described. I like that with this design I could have all the PSC configs and DNS records already in place, ready for disaster recovery.
However, it is a little unfortunate this architecture requires me to run two database instances instead of just the one.
Initially I thought that with this design I could create this "DB2" spare, set up the PSC and DNS records in advance, then stop the instance to save compute cost. But then I realised it would be challenging to keep the configuration of the "DB1" and "DB2" instances in sync, as some CloudSQL configuration can't be changed while the instance is stopped (e.g. see here).
We use a pretty expensive "Enterprise Plus" instance type, so having to double this cost would be a shame.
Thank you for the tips, I'll have to weigh up the pros and cons.