Using Istio mTLS with Knative Serving

Open JonKusz opened this issue 3 years ago • 7 comments

Ask your question here:

I have a question regarding using Istio mTLS with Knative Serving.

After enabling mTLS via the Istio operator and configuring the AuthorizationPolicy outlined in the official documentation (adapted for my application, of course), everything seems to function correctly. However, we see the following autoscaler log.

{"severity":"ERROR","timestamp":"2022-06-29T20:06:59.369852767Z","logger":"autoscaler.collector","caller":"metrics/collector.go:319","message":"Failed to scrape metrics","commit":"3f603cd","knative.dev/key":"test-app/test-app-1-00002","knative.dev/key":"test-app/test-app-1-00002","error":"unsuccessful scrape, sampleSize=1: Get \"http://test-app-1-00002-private.test-app:9090/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)","stacktrace":"knative.dev/serving/pkg/autoscaler/metrics.newCollection.func3\n\tknative.dev/serving/pkg/autoscaler/metrics/collector.go:319"}
{"severity":"INFO","timestamp":"2022-06-29T20:07:00.377443236Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:316","message":"Failed scraping pod 10.64.138.5","commit":"3f603cd","knative.dev/key":"test-app/test-app-1-00002","error":"GET request for URL \"http://10.64.138.5:9090/metrics\" returned HTTP status 503"}
{"severity":"WARNING","timestamp":"2022-06-29T20:07:00.377491002Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:340","message":"0 pods were successfully scraped out of 1","commit":"3f603cd","knative.dev/key":"test-app/test-app-1-00002"}
{"severity":"INFO","timestamp":"2022-06-29T20:07:00.38344177Z","logger":"autoscaler","caller":"metrics/stats_scraper.go:235","message":"Direct pod scraping off, service scraping, on","commit":"3f603cd","knative.dev/key":"test-app/test-app-1-00002"}

There are some GitHub issues describing similar behavior, and reading them seems to hint that this "fallback to service scraping" is an expected result of having Istio mTLS enabled. I am looking for confirmation that this is the case.

References https://github.com/knative/serving/issues/11877 https://github.com/knative/networking/pull/494 https://github.com/julz/serving/blob/master/pkg/autoscaler/metrics/stats_scraper.go#L225-L226

JonKusz avatar Jul 07 '22 14:07 JonKusz

cc @psschwei @nader-ziada

dprotaso avatar Jul 15 '22 14:07 dprotaso

If Istio mesh mode is enabled, then the autoscaler uses service scraping.

nader-ziada avatar Jul 15 '22 14:07 nader-ziada

Related issue: https://github.com/knative/serving/issues/10751

Also, there are a few settings you can tweak since you're running in mesh mode. One will skip pod scraping entirely, on the assumption that it won't succeed in mesh mode.

https://github.com/knative/networking/blob/30b5dfe69be6887ef9102ff426bc245b8b80a5f7/config/config-network.yaml#L150-L168

Looks like it's not documented on the knative.dev website
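
For reference, a minimal sketch of what that setting looks like, assuming the default install where the config-network ConfigMap lives in the knative-serving namespace. The key is the mesh-compatibility-mode option mentioned later in this thread; check the linked config-network.yaml for the values supported by your version.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-network
  namespace: knative-serving   # assumes the default Knative Serving namespace
data:
  # "auto" (default): try pod IPs first, fall back to the cluster IP on failure
  # "enabled": always use the cluster IP and never attempt pod IPs
  # "disabled": always use pod IPs and never fall back to the cluster IP
  mesh-compatibility-mode: "enabled"
```

With "enabled", the autoscaler should skip the direct pod scrape that produces the HTTP 503 lines above and go straight to service scraping.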

dprotaso avatar Jul 15 '22 19:07 dprotaso

@JonKusz: In my case, I was missing a DestinationRule to allow the data plane (ksvc) to be reachable from the knative-serving namespace.
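
The exact rule was not posted, so the following is only a hedged sketch of what such a DestinationRule might look like. The resource name and host pattern are hypothetical, the test-app namespace is borrowed from the logs above, and it assumes the knative-serving pods have Istio sidecars injected.

```yaml
# Hypothetical sketch: the name and host pattern are assumptions, not taken from the thread.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: test-app-mtls            # hypothetical name
  namespace: knative-serving     # namespace the autoscaler and activator call from
spec:
  host: "*.test-app.svc.cluster.local"   # services in the ksvc namespace (assumed to be test-app)
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL         # originate Istio mTLS when calling these hosts
```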

rachitchauhan43 avatar Jul 29 '22 19:07 rachitchauhan43

and Knative will attempt pod scraping, but fall back to service scraping, which generates the logs I am seeing. Is that correct?

Yup

dprotaso avatar Aug 08 '22 21:08 dprotaso

@dprotaso Thanks for the confirmation. I decided to set mesh-compatibility-mode to enabled, which removed the errors except for some intermittent "Failed to scrape metrics" errors like the following.

{"severity":"ERROR","timestamp":"2022-06-29T20:06:59.369852767Z","logger":"autoscaler.collector","caller":"metrics/collector.go:319","message":"Failed to scrape metrics","commit":"3f603cd","knative.dev/key":"test-app/test-app-1-00002","knative.dev/key":"test-app/test-app-1-00002","error":"unsuccessful scrape, sampleSize=1: Get \"http://test-app-1-00002-private.test-app:9090/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)","stacktrace":"knative.dev/serving/pkg/autoscaler/metrics.newCollection.func3\n\tknative.dev/serving/pkg/autoscaler/metrics/collector.go:319"}

I believe metrics are collected every second, and I see this a couple of times per minute, which makes me think it might be benign. Thoughts?

JonKusz avatar Aug 09 '22 19:08 JonKusz

@dprotaso Just following up on this. Let me know if this might be a bug or I'm missing something.

JonKusz avatar Sep 07 '22 14:09 JonKusz

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Dec 07 '22 01:12 github-actions[bot]

Just following up on the topic here. Is it possible to have mTLS without Istio with an Ingress with Queue Proxy as Sidecar? Is it possible to use mTLS with Contour Ingress (Backed by Envoy) as well?

amarflybot avatar Mar 21 '23 16:03 amarflybot

@dprotaso Just following up on this. Let me know if this might be a bug or I'm missing something.

@JonKusz we would still be scraping the Revision's ClusterIP - so I would expect metric collection to continue.

dprotaso avatar Mar 22 '23 19:03 dprotaso

Just following up on the topic here. Is it possible to have mTLS without Istio with an Ingress with Queue Proxy as Sidecar? Is it possible to use mTLS with Contour Ingress (Backed by Envoy) as well?

Not at the moment. We are still trying to sort out just regular internal TLS.

dprotaso avatar Mar 22 '23 19:03 dprotaso

@dprotaso Question: Do you think it is possible to use qpOptions to have a custom m-TLS based on a custom certificate service (let's say we get certificates from an external service using HTTP calls and keep them in cache on the pod)?

amarflybot avatar Mar 24 '23 05:03 amarflybot

Hm it seems this could be doable in qpOptions. The question just remains, why would you want to do that? Wouldn't it be easier to use something like a ServiceMesh for transparent mTLS and additional features like Authentication/Authorization?

ReToCode avatar Mar 24 '23 06:03 ReToCode

Thanks for the reply, @ReToCode. Basically, I cannot use a service mesh because of FIPS and compliance issues (another annoyance). I have an existing setup that uses a sidecar for mTLS between client and server pods.

Existing setup: ClientContainer-->mTLSSidecar ---interClusterNetwork--> mTLSSidecar-->ServerContainer

Expectation With KNative setup: ClientContainer-->mTLSSidecar ---interClusterNetwork--> mTLSSidecar-->queueProxySidecar-->ServerContainer

What I don't understand is how could we have two sideCar proxies in the consecutive flow. Is this possible?

amarflybot avatar Mar 26 '23 04:03 amarflybot

Yes, in theory. This

ClientContainer-->mTLSSidecar ---interClusterNetwork--> mTLSSidecar-->queueProxySidecar-->ServerContainer

is exactly what is done when using istio (mTLSSidecar is the istio-proxy in that case). But you probably would need to change a lot of stuff to achieve this or implement something like istio sidecar injection yourself to get there.

What I don't understand is how could we have two sideCar proxies in the consecutive flow. Is this possible?

Yes it is possible, but a bit tricky with regards to traffic routing and ordering of the sidecars.

ReToCode avatar Mar 27 '23 07:03 ReToCode

I cannot use service mesh because of fips and compliance issues

Can you elaborate on this? Why don't service meshes help in this case?

dprotaso avatar Mar 27 '23 18:03 dprotaso

Can you elaborate on this? Why don't service meshes help in this case?

There are two service meshes under consideration. The first is Istio, which has its own CA (Certificate Authority); we are not allowed to use any certificates except node certificates or our own CA, which can be linked using cert-manager. The other is Linkerd, which is not FIPS compliant.

So the best option for me is to use another mtls-proxy-sidecar and make changes in the mtls-proxy-sidecar to forward the port to the incoming request.

amarflybot avatar Mar 28 '23 07:03 amarflybot

Yes it is possible, but a bit tricky with regards to traffic routing and ordering of the sidecars.

I got the setup done. By default, there is no TLS setup for Envoy (Contour), so it blocks TLS. The request from the client with the mTLS-enabled sidecar now reaches Envoy and dies there, because Envoy rejects TLS. Is there a way to make Envoy pass TLS through? URL: https://projectcontour.io/docs/v1.18.1/config/tls-termination/ Section: TLS Session Passthrough

Does setting this header from Client end work? https://github.com/knative/client/blob/4df601027bea5179061434c496b8125bdf4c443d/vendor/knative.dev/networking/pkg/http/header/header.go#L75

amarflybot avatar Mar 28 '23 13:03 amarflybot

Looks like Istio lets you bring your own CA - https://istio.io/latest/docs/tasks/security/cert-management/plugin-ca-cert/
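
As a rough sketch of what that linked task sets up: Istio reads a plugged-in CA from a secret named cacerts containing the four PEM files below. The istio-system namespace assumes a default install, the PEM contents are placeholders, and the linked page remains the authoritative procedure.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cacerts
  namespace: istio-system        # assumes the default Istio control plane namespace
type: Opaque
stringData:
  ca-cert.pem: |
    <PEM of the intermediate CA certificate issued by your own CA>
  ca-key.pem: |
    <PEM of the intermediate CA private key>
  root-cert.pem: |
    <PEM of your root CA certificate>
  cert-chain.pem: |
    <PEM chain from the intermediate certificate up to the root>
```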

dprotaso avatar Mar 28 '23 18:03 dprotaso

Looks like Istio lets you bring your own CA - https://istio.io/latest/docs/tasks/security/cert-management/plugin-ca-cert/

Thanks for the suggestion, but let's say we cannot go with Istio here. What is the configuration to make Contour's Envoy pass TLS through?

amarflybot avatar Mar 29 '23 01:03 amarflybot

What is the configuration to make Contour's Envoy pass TLS through?

This isn't possible, and even if it were, I don't think Knative would work, as we expect the proxies to manipulate the request (i.e., modify headers, etc.).

dprotaso avatar Mar 29 '23 01:03 dprotaso

This isn't possible, and even if it were, I don't think Knative would work, as we expect the proxies to manipulate the request (i.e., modify headers, etc.).

What if we change the HTTPProxy like this:

# httpproxy-tls-passthrough.yaml
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: example
  namespace: default
spec:
  virtualhost:
    fqdn: tcp.example.com
    tls:
      passthrough: true
  tcpproxy:
    services:
    - name: tcpservice
      port: 8080
    - name: otherservice
      port: 9999
      weight: 20

amarflybot avatar Mar 29 '23 06:03 amarflybot

You can't manipulate request headers when doing passthrough. We have an internal data plane contract between components to support features like activation.

I would pursue bringing your own CA to Istio. Otherwise this is a scenario where Knative is not a fit.

dprotaso avatar Mar 29 '23 14:03 dprotaso

@dprotaso After working with Envoy for two days now, I do understand why you are recommending Istio. :)

amarflybot avatar Mar 30 '23 14:03 amarflybot

@JonKusz: Were you able to solve this issue?

rachitchauhan43 avatar Sep 15 '23 20:09 rachitchauhan43