Container probes fail when Cortex mutual TLS is enabled
Describe the bug
I followed the instructions here to enable mTLS in Cortex. I deployed Cortex from the latest Helm chart.
As a result, pods are terminated: the startup probe's HTTP request to the readiness endpoint fails, because with mTLS enabled that request must present a client TLS certificate.
Fortunately, the Helm chart lets you override the container probes with an exec command instead of httpGet. Unfortunately, the Cortex image does not include curl, which an alternative probe such as this one would need:
```shell
curl -sf --cert /srv/certs/cert.pem --key /srv/certs/key.pem --cacert /srv/certs/ca.crt --resolve server.common.name:8080:127.0.0.1 https://server.common.name:8080/ready
```
I suspect adding curl to the Cortex base image would help. Is there anything else I could use instead?
It looks like the following probe would pass, as long as curl is available in the Cortex image:
```yaml
startupProbe:
  httpGet:  # needed to override the chart's default httpGet probe; otherwise the deployment fails with "Forbidden: may not specify more than 1 handler type"
  exec:
    command:
      - curl
      - --fail  # exit non-zero on HTTP errors such as 503, not only on connection errors
      - --cert
      - /srv/certs/cert.pem
      - --key
      - /srv/certs/key.pem
      - --cacert
      - /srv/certs/ca.crt
      - --resolve
      - server.common.name:8080:127.0.0.1
      - https://server.common.name:8080/ready
```
The certificate files come from a volume mounted from a Kubernetes secret; the Helm chart supports extra volumes. The same files are referenced by the Cortex TLS configuration.
So the only remaining step is to add curl to the Cortex base image.
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
I would really prefer not to do this: binaries inside containers are useful to attackers. Really we should be going distroless.
I couldn't follow your reasoning for why this is necessary; the lines your links point to have moved. Why can't the probes use HTTPS rather than an exec command?
The issue I'm trying to solve in https://github.com/cortexproject/cortex/pull/4557 is to have a way to run container probes when Cortex mTLS is enabled.
The server's TLS certificate is not the issue here: an httpGet probe with scheme HTTPS skips certificate verification. The problem arises when mutual TLS is enabled in Cortex, because the probe then needs to present a client TLS certificate to authenticate with the service. httpGet doesn't currently support that, so the only workaround is to replace the probe with a curl command. You can find more details by following the links in the initial post.
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
not stale, MR still open
Would a TCP probe (tcpSocket) solve the issue?
I am open to adding curl to the base image, but I also understand the security and maintainability concerns. @alanprot @pracucci what do you think?
It might also be worthwhile to get more perspectives in the Slack channel and start a vote.
A TCP probe only checks that the port is open, not that the service behind it is healthy, so it is not sufficient to confirm that the pod is ready to be added to the service pool. That's why a custom exec probe is needed: curl can take client TLS arguments to authenticate against mTLS-enabled Cortex.
Agreed. @alanprot @pracucci, what are your thoughts? I think we can add curl to the image, but without promising backward compatibility for it in the future; I'd still like to find a more elegant solution than relying on curl.
One alternative is to put a small Go binary like this one in the image: essentially a "specialized curl".
I really wish the Kubernetes httpGet probe supported client certificate configuration, but there seems to be an issue about it that has dragged on for years already.
Hmm... Should we consider serving the health check on a separate port without mTLS?
I think I agree with @bboreham; I don't like the idea of shipping curl inside the image.