Can't connect via client to frontend service with cert-manager mTLS certificate
Hey,
I've been trying to get mTLS up and running on my Temporal deployment. I have enabled mTLS on both internode communication and frontend communication. I have deployed the Temporal cluster like so (omitted extraneous data):
apiVersion: temporal.io/v1beta1
kind: TemporalCluster
metadata:
  name: temporal-cluster
  namespace: temporal
spec:
  mTLS:
    provider: cert-manager
    internode:
      enabled: true
    frontend:
      enabled: true
    certificatesDuration:
      clientCertificates: 48h0m0s
      frontendCertificate: 48h0m0s
      intermediateCAsCertificates: 128h0m0s
      internodeCertificate: 48h0m0s
      rootCACertificate: 256h0m0s
    refreshInterval: 1h0m0s
    renewBefore: 2h0m0s
I then created a TemporalClusterClient to get a certificate signed by the frontend intermediate CA in the test namespace:
apiVersion: temporal.io/v1beta1
kind: TemporalClusterClient
metadata:
  name: example-worker
  namespace: test
spec:
  clusterRef:
    name: temporal-cluster
    namespace: temporal
The secret is provisioned correctly into the test namespace. I then mount that secret into my pod (other data omitted for brevity):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-worker
  namespace: test
spec:
  template:
    spec:
      containers:
        - name: worker
          image: ...
          env:
            - name: TEMPORAL_ADDRESS
              value: temporal-cluster-frontend.temporal.svc.cluster.local:7233
          volumeMounts:
            - mountPath: "/var/temporal/certs"
              name: temporal-certs
              readOnly: true
      volumes:
        - name: temporal-certs
          secret:
            secretName: temporal-cluster-example-worker-mtls-certificate
I get a bad certificate error when attempting to connect with the certificate:
Traceback (most recent call last):
  File "/app/worker.py", line 83, in <module>
    loop.run_until_complete(main())
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/app/worker.py", line 53, in main
    client = await Client.connect(
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/temporalio/client.py", line 164, in connect
    await temporalio.service.ServiceClient.connect(connect_config),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The relevant worker code:
certs_directory = os.environ.get("TEMPORAL_CERTS_DIRECTORY", "/var/temporal/certs")

# Read the client certificate, its private key, and the CA bundle from the mounted secret
with open(os.path.join(certs_directory, "tls.crt"), "rb") as f:
    client_cert = f.read()
with open(os.path.join(certs_directory, "tls.key"), "rb") as f:
    client_key = f.read()
with open(os.path.join(certs_directory, "ca.crt"), "rb") as f:
    ca_cert = f.read()

# Connect client
client = await Client.connect(
    os.environ.get("TEMPORAL_ADDRESS", "localhost:7233"),
    namespace="default",
    tls=TLSConfig(
        client_cert=client_cert,
        client_private_key=client_key,
        server_root_ca_cert=ca_cert,
    ),
)
I've also tried removing the server_root_ca_cert option and still get errors. However, with exactly the same setup, if I replace the cert generated by the TemporalClusterClient with the frontend-intermediate certificate secret (from the temporal namespace, just copied over), everything works just fine.
Running openssl s_client tells a similar story. With the TemporalClusterClient-generated certificate:
openssl s_client -connect temporal-cluster-frontend.temporal.svc.cluster.local:7233 -cert tls.crt -key tls.key -CAfile ca.crt
Verify return code: 20 (unable to get local issuer certificate)
With the frontend intermediate:
openssl s_client -connect temporal-cluster-frontend.temporal.svc.cluster.local:7233 -cert tls.crt -key tls.key -CAfile ca.crt
Verify return code: 0 (ok)
Any ideas? I am scratching my head trying to figure out what I might be doing wrong here.
Hi! Which version are you using?
Hey, I am using version v0.18.0 of the operator.
ghcr.io/alexandrevilain/temporal-operator:v0.18.0
Hi @andrewbelu!
This may be an issue with https://github.com/alexandrevilain/temporal-operator/pull/715. Could you please try with v0.17.0?
@alexandrevilain Hello! Tried with v0.17 of the operator and same deal.
Here is the info of the certificate (omitted unnecessary details):
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            d3:b1:80:b7:89:71:af:d7:d8:9c:0b:66:82:77:3c:67
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN = Frontend intermediate CA certificate
        Validity
            Not Before: Jun 4 18:29:09 2024 GMT
            Not After : Jun 6 18:29:09 2024 GMT
        Subject: CN = example-worker client certificate
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (4096 bit)
                Modulus:
                    ...
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Authority Key Identifier:
                5E:9F:23:BA:83:22:89:07:79:D4:16:BA:0B:2D:75:35:45:23:C7:91
            X509v3 Subject Alternative Name:
                DNS:example-worker.temporal-cluster.temporal.svc.cluster.local
    Signature Algorithm: sha256WithRSAEncryption
    Signature Value:
        ...
Perhaps it's the SAN? I notice that it's giving a different namespace for the worker than the one the worker pod is actually in, but I am unsure if this is intended or not.
X509v3 Subject Alternative Name:
DNS:example-worker.temporal-cluster.temporal.svc.cluster.local
I should add the original Python error (forgot to copy paste that):
RuntimeError: Failed client connect: Server connection error: tonic::transport::Error(Transport, hyper::Error(Connect, Custom { kind: InvalidData, error: InvalidCertificate(NotValidForName) }))
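A note on reading that error: `InvalidCertificate(NotValidForName)` is raised by the client's TLS library when the certificate presented by the server has no Subject Alternative Name matching the name the client expects, which by default is the host it dialed. The SAN printed above belongs to the client certificate, so it is probably not the one being checked here. Below is a minimal sketch of that name check, simplified from the usual RFC 6125 rules. The function name and the SAN list are hypothetical and not taken from this cluster.

```python
from typing import List


def dns_name_matches(hostname: str, san_dns_names: List[str]) -> bool:
    """Return True if hostname matches any SAN dNSName entry (simplified check)."""
    host_labels = hostname.lower().rstrip(".").split(".")
    for pattern in san_dns_names:
        pattern_labels = pattern.lower().rstrip(".").split(".")
        # Names with a different number of labels can never match.
        if len(pattern_labels) != len(host_labels):
            continue
        # A wildcard is only honoured as the entire left-most label.
        if pattern_labels[0] == "*":
            if pattern_labels[1:] == host_labels[1:]:
                return True
        elif pattern_labels == host_labels:
            return True
    return False


# Hypothetical SAN list for the frontend's serving certificate (an assumption).
server_sans = ["frontend.example.internal"]
dialed = "temporal-cluster-frontend.temporal.svc.cluster.local"

print(dns_name_matches(dialed, server_sans))                       # False
print(dns_name_matches("frontend.example.internal", server_sans))  # True
```

If the frontend certificate only lists a name other than the Kubernetes service DNS name, the first check is the one that fails, and the client has to be told which name to verify against.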
Hi @andrewbelu!
Sorry for the late reply. I tried to reproduce your issue, but it works well on my side.
Here are the steps I followed:
kubectl apply -f examples/cluster-mtls/00-namespace.yaml
kubectl apply -f examples/cluster-mtls/01-postgresql.yaml
kubectl apply -f examples/cluster-mtls/02-temporal-cluster.yaml
# waiting for the cluster to be up and running
kubectl apply -f examples/cluster-mtls/03-temporal-cluster-client.yaml
kubectl cert-manager inspect secret -n demo prod-my-worker-mtls-certificate # using cert-manager kubectl plugin
# exporting certificates
kubectl view-secret prod-my-worker-mtls-certificate -n demo tls.key > /tmp/tls.key
kubectl view-secret prod-my-worker-mtls-certificate -n demo tls.crt > /tmp/tls.crt
kubectl view-secret prod-my-worker-mtls-certificate -n demo ca.crt > /tmp/ca.crt
# exporting SERVER_NAME
export SERVER_NAME=$(kubectl get temporalclusterclient my-worker -o=template="{{.status.serverName}}")
# on another shell:
kubectl port-forward service/prod-frontend -n demo 7233:7233
# then same test:
openssl s_client -connect localhost:7233 -cert /tmp/tls.crt -key /tmp/tls.key -CAfile /tmp/ca.crt -servername $SERVER_NAME
Here is the result I get:
Connecting to ::1
CONNECTED(00000005)
depth=2 CN=Root CA certificate
verify return:1
depth=1 CN=Frontend intermediate CA certificate
verify return:1
depth=0 CN=Frontend Certificate
verify return:1
---
Certificate chain
0 s:CN=Frontend Certificate
i:CN=Frontend intermediate CA certificate
a:PKEY: rsaEncryption, 4096 (bit); sigalg: RSA-SHA256
v:NotBefore: Jun 13 14:20:27 2024 GMT; NotAfter: Jun 13 15:20:27 2024 GMT
1 s:CN=Frontend intermediate CA certificate
i:CN=Root CA certificate
a:PKEY: rsaEncryption, 4096 (bit); sigalg: RSA-SHA256
v:NotBefore: Jun 13 14:20:07 2024 GMT; NotAfter: Jun 13 15:50:07 2024 GMT
---
Server certificate
-----BEGIN CERTIFICATE-----
OMITTED
-----END CERTIFICATE-----
subject=CN=Frontend Certificate
issuer=CN=Frontend intermediate CA certificate
---
Acceptable client certificate CA names
CN=Root CA certificate
Requested Signature Algorithms: RSA-PSS+SHA256:ECDSA+SHA256:Ed25519:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA384:ECDSA+SHA512:RSA+SHA1:ECDSA+SHA1
Shared Requested Signature Algorithms: RSA-PSS+SHA256:ECDSA+SHA256:Ed25519:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA384:ECDSA+SHA512
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 3652 bytes and written 2352 bytes
Verification: OK
---
New, TLSv1.3, Cipher is TLS_AES_128_GCM_SHA256
Server public key is 4096 bit
This TLS version forbids renegotiation.
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
Is there something I'm missing to reproduce your issue?
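One difference between the two tests in this thread is the `-servername $SERVER_NAME` flag used in the steps above. In the Python SDK, the closest counterpart is the `domain` field of `TLSConfig`, which overrides the server name sent during the handshake. Here is a rough sketch of how the worker could pass it, assuming that is the missing piece. The helper `load_tls_material` and the `TEMPORAL_SERVER_NAME` environment variable are hypothetical names; the value would be whatever `.status.serverName` reports on the TemporalClusterClient.

```python
import os
from typing import Optional


def load_tls_material(certs_directory: str, server_name: Optional[str] = None) -> dict:
    """Read the mounted secret and return keyword arguments for TLSConfig."""

    def read(name: str) -> bytes:
        with open(os.path.join(certs_directory, name), "rb") as f:
            return f.read()

    kwargs = {
        "client_cert": read("tls.crt"),
        "client_private_key": read("tls.key"),
        "server_root_ca_cert": read("ca.crt"),
    }
    if server_name:
        # Only set the override when a server name was actually provided.
        kwargs["domain"] = server_name
    return kwargs


# Usage inside the worker's main() (requires the temporalio package):
#   from temporalio.client import Client
#   from temporalio.service import TLSConfig
#   tls = TLSConfig(**load_tls_material(
#       "/var/temporal/certs",
#       os.environ.get("TEMPORAL_SERVER_NAME"),  # hypothetical variable
#   ))
#   client = await Client.connect(
#       os.environ["TEMPORAL_ADDRESS"], namespace="default", tls=tls
#   )
```

Keeping the file reading separate from the SDK call makes it easy to check what the pod actually mounted before any connection is attempted.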