cloud-sql-proxy icon indicating copy to clipboard operation
cloud-sql-proxy copied to clipboard

`examples/k8s-health-check` folder README advises against readiness probes for (to me) incomprehensible reasons

Open sh-at-cs opened this issue 1 year ago • 3 comments

Description

The examples/k8s-health-check folder's README.md says:

For most common usage, adding a readiness healthcheck to the proxy sidecar container is unnecessary. An improperly configured readiness check can degrade the application's availability. [...] Most applications are resilient to transient database connection failures, and do not need to be restarted. [...] You should use the proxy container's readiness probe when these circumstances should cause k8s to terminate the entire pod: [...]

That makes it sound as if a failing readiness probe would cause the Pod to be restarted and we shouldn't use them for the cloud-sql-proxy sidecar for that reason, but this doesn't seem to be the case (?)

Instead, if I understood k8s's docs correctly, a failing readiness probe will not cause a Pod to be restarted, and readiness probes are only important for two things:

  1. Directing traffic to a Pod - only ready Pods will receive traffic.
  2. Determining whether a Pod in a Deployment is considered "available" in the context of the RollingUpdate strategy.

And for (2), it seems to me like having an accurate readiness probe is in fact essential, because otherwise the rolling update will consider pods "available" that are really broken, which then causes k8s to stop healthy but outdated Pods to make room for them.

So I don't understand this piece of advice.

Does it just confuse the readiness probe with the liveness probe (maybe because that was introduced later? But then again, the document does mention the liveness probe, too...)?

Potential Solution

Fix the README so it accurately reflects what a readiness probe is for and how it's relevant, or, if it turns out I'm wrong, link to the paragraphs in the k8s docs where my misconceptions are corrected from README.md so other users don't develop the same ones in the future.

Additional Details

There are similar comments in the examples/k8s-health-check/proxy_with_http_health_check.yaml file, which should be corrected as well if it turns out that there is something to correct here.

sh-at-cs avatar Aug 27 '24 15:08 sh-at-cs

Hi @sh-at-cs thanks for raising an issue on the Cloud SQL Proxy!

I'll let @hessjcg respond to this as he is our kubernetes expert and should be able to provide guidance here 😄

jackwotherspoon avatar Aug 28 '24 14:08 jackwotherspoon

@sh-at-cs, You raise a valid point. The k8s readiness probe controls how traffic is routed, not whether the pod is restarted. I will update the README to make this clear.

This advice is based on our practical experience. Our users generally get better availability when they configure readiness probes only on the main application container. Having a readiness probe on the main container then makes redundant the readiness probe on the to the proxy sidecar.

We should also update the examples to demonstrate how to make the proxy container start up to a ready state before the main application container starts. (The implementation is discussed in more detail in https://github.com/GoogleCloudPlatform/cloud-sql-proxy-operator/issues/496 and https://github.com/GoogleCloudPlatform/cloud-sql-proxy-operator/issues/381)

hessjcg avatar Aug 30 '24 16:08 hessjcg

Having a readiness probe on the main container then makes redundant the readiness probe on the to the proxy sidecar.

Just to add one more thought (after I ran into an issue related to this again): The docs should be sure to point out that it's only redundant if the main container's own readiness probe is dependent on having successfully established a database connection via cloud-sql-proxy.

Kind of obvious, but many web frameworks establish DB connections lazily, so if you have a "naive" readiness probe in your main container that just calls a healthcheck endpoint which doesn't require DB access, this won't be the case.

sh-at-cs avatar Apr 10 '25 08:04 sh-at-cs