
`SecretDiscoveryServiceServer`: `StreamSecrets` issues

Open mbana opened this issue 3 years ago • 0 comments

Issues

I've noticed a couple of things.

  1. When we implement SecretDiscoveryServiceServer ("github.com/envoyproxy/go-control-plane/envoy/service/secret/v3"), StreamSecrets is sometimes never called at all.
  2. When StreamSecrets is called, it is called endlessly, with a new stream each time; see the log output further down in this message.

When StreamSecrets is not called, our dynamically supplied secrets are listed under dynamic_warming_secrets instead of dynamic_active_secrets.

Code

Our code is public, so here are the relevant definitions:

  1. StreamSecrets: https://github.com/kubeshop/kusk-gateway/blob/mbana-oauth-issue-401-sds/internal/envoy/sds/sds.go#L72
  2. Where we register the SecretDiscoveryServiceServer: https://github.com/kubeshop/kusk-gateway/blob/mbana-oauth-issue-401-sds/internal/envoy/manager/envoy_config_manager.go#L183
  3. Where we configure the cache etc.: https://github.com/kubeshop/kusk-gateway/blob/mbana-oauth-issue-401-sds/internal/envoy/manager/envoy_config_manager.go#L56

Log of StreamSecrets being called multiple times (the stream=&{0xc000f19c70} is different each time):

2022-08-12T09:19:30Z | sds.go:74: SecretDiscoveryServiceServer.StreamSecrets: exiting method
2022-08-12T09:19:30Z | sds.go:92: 
2022-08-12T09:19:30Z | sds.go:93: SecretDiscoveryServiceServer.StreamSecrets: calling stream.Recv - stream=&{0xc000f19380}, len(s.ClientSecrets)=2
2022-08-12T09:19:30Z | sds.go:116: SecretDiscoveryServiceServer.StreamSecrets: request.TypeUrl=type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret, len(s.ClientSecrets)=2
2022-08-12T09:19:30Z | sds.go:153: SecretDiscoveryServiceServer.StreamSecrets: stream.Send(response) sent - responses=[<*>version_info:"2022-08-12T09:19:30Z"  resources:{[type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret]:{name:"hmac_secret"  generic_secret:{secret:{inline_bytes:"f9eckuGEcUNxAqKT0uK8OyM2Se01ukVLPHsiSoTh2X8="}}}}  type_url:"type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret" <*>version_info:"2022-08-12T09:19:30Z"  resources:{[type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret]:{name:"client_secret"  generic_secret:{secret:{inline_string:"Z6MX7NreJumWLmf6unsQ5uiEUrTBxfNtqG9Vy5Kjktnvfj-_fRCBO9EU1mL1YzAJ"}}}}  type_url:"type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret"]
2022-08-12T09:19:30Z | sds.go:154: 
2022-08-12T09:19:30Z | sds.go:74: SecretDiscoveryServiceServer.StreamSecrets: exiting method
2022-08-12T09:19:30Z | sds.go:92: 
2022-08-12T09:19:30Z | sds.go:93: SecretDiscoveryServiceServer.StreamSecrets: calling stream.Recv - stream=&{0xc000f19c70}, len(s.ClientSecrets)=2
2022-08-12T09:19:30Z | sds.go:116: SecretDiscoveryServiceServer.StreamSecrets: request.TypeUrl=type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret, len(s.ClientSecrets)=2
2022-08-12T09:19:30Z | sds.go:153: SecretDiscoveryServiceServer.StreamSecrets: stream.Send(response) sent - responses=[<*>version_info:"2022-08-12T09:19:30Z"  resources:{[type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret]:{name:"client_secret"  generic_secret:{secret:{inline_string:"Z6MX7NreJumWLmf6unsQ5uiEUrTBxfNtqG9Vy5Kjktnvfj-_fRCBO9EU1mL1YzAJ"}}}}  type_url:"type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret" <*>version_info:"2022-08-12T09:19:30Z"  resources:{[type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret]:{name:"hmac_secret"  generic_secret:{secret:{inline_bytes:"f9eckuGEcUNxAqKT0uK8OyM2Se01ukVLPHsiSoTh2X8="}}}}  type_url:"type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret"]
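One possible reading of the log above, offered as an assumption rather than a diagnosis of the linked code: "exiting method" is logged right after each stream.Send, which would mean the handler returns after a single response. Returning from a gRPC streaming handler ends that stream, so the client has to open a new one. The sketch below shows a handler loop that stays on the same stream until the peer closes it, answers only the resource names that were requested, and does not re-send when a request echoes the current version (treated here as an ACK). The types discoveryRequest, discoveryResponse, secretStream and fakeStream are simplified, hypothetical stand-ins and not the go-control-plane types; nonce and NACK handling are left out.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

const secretTypeURL = "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret"

// discoveryRequest and discoveryResponse are simplified, hypothetical
// stand-ins for the xDS DiscoveryRequest/DiscoveryResponse messages.
type discoveryRequest struct {
	TypeURL       string
	VersionInfo   string // version the client last accepted; empty on the first request
	ResourceNames []string
}

type discoveryResponse struct {
	TypeURL     string
	VersionInfo string
	Names       []string // names of the secrets carried by this response
}

// secretStream is a hypothetical stand-in for the bidirectional stream
// handed to StreamSecrets; only Recv and Send are modelled.
type secretStream interface {
	Recv() (*discoveryRequest, error)
	Send(*discoveryResponse) error
}

// serveSecrets handles one stream until the peer closes it (io.EOF)
// or an error occurs. It does not return after the first response.
func serveSecrets(stream secretStream, known map[string]bool, version string) error {
	for {
		req, err := stream.Recv()
		if errors.Is(err, io.EOF) {
			return nil // peer closed the stream cleanly
		}
		if err != nil {
			return err
		}
		// Assumption: a request echoing the current version is an ACK,
		// so there is nothing new to send.
		if req.VersionInfo == version {
			continue
		}
		// Answer only the names this stream asked for.
		var names []string
		for _, n := range req.ResourceNames {
			if known[n] {
				names = append(names, n)
			}
		}
		resp := &discoveryResponse{TypeURL: req.TypeURL, VersionInfo: version, Names: names}
		if err := stream.Send(resp); err != nil {
			return err
		}
	}
}

// fakeStream replays canned requests and records what was sent.
type fakeStream struct {
	reqs []*discoveryRequest
	sent []*discoveryResponse
}

func (f *fakeStream) Recv() (*discoveryRequest, error) {
	if len(f.reqs) == 0 {
		return nil, io.EOF
	}
	req := f.reqs[0]
	f.reqs = f.reqs[1:]
	return req, nil
}

func (f *fakeStream) Send(r *discoveryResponse) error {
	f.sent = append(f.sent, r)
	return nil
}

func main() {
	stream := &fakeStream{reqs: []*discoveryRequest{
		{TypeURL: secretTypeURL, ResourceNames: []string{"client_secret"}},                    // initial request
		{TypeURL: secretTypeURL, VersionInfo: "v1", ResourceNames: []string{"client_secret"}}, // ACK
	}}
	known := map[string]bool{"client_secret": true, "hmac_secret": true}
	err := serveSecrets(stream, known, "v1")
	fmt.Println("responses sent:", len(stream.sent), "error:", err)
}
```

In a real server the loop would also need to push a new response when the secrets change, which this sketch does not attempt.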

What we expect to see in /config_dump when StreamSecrets is being called:

{
   "@type": "type.googleapis.com/envoy.admin.v3.SecretsConfigDump",
   "dynamic_active_secrets": [
    {
     "name": "client_secret",
     "version_info": "2022-08-12T09:22:18Z",
     "last_updated": "2022-08-12T09:22:18.140Z",
     "secret": {
      "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret",
      "name": "client_secret",
      "generic_secret": {
       "secret": {
        "inline_string": "[redacted]"
       }
      }
     }
    },
    {
     "name": "hmac_secret",
     "version_info": "2022-08-12T09:22:18Z",
     "last_updated": "2022-08-12T09:22:18.386Z",
     "secret": {
      "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret",
      "name": "hmac_secret",
      "generic_secret": {
       "secret": {
        "inline_bytes": "W3JlZGFjdGVkXQ=="
       }
      }
     }
    }
   ]
  }

Don't worry about the exposed secrets; they'll be removed soon.

Logs or Debug Information

In addition, here are dumps taken from the admin endpoint (/config_dump) both when it is working and when it is not, i.e., when the StreamSecrets gRPC method is invoked on the go-control-plane and when it is not. Notice how, in the broken case, the secrets are listed under dynamic_warming_secrets.

  1. config_dump-broken.json: https://gist.github.com/mbana/61305292ddb9fd83e260a0125893f6ca
  2. logs-broken.log: https://gist.github.com/mbana/a4636cf5e96a035db618f7c37b8dc275
  3. config_dump-working.json: https://gist.github.com/mbana/e9b7b29ed7c1be032aca867304d86d60
  4. logs-working.log: https://gist.github.com/mbana/2881bc1873090d8d82649ba95625d789
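To compare a working and a broken dump without reading them by hand, a small helper can report which list each secret sits in. This is a minimal sketch: it assumes the dump has the usual top-level "configs" array and reads only the "name" field of each entry; the function name secretStates and the sample JSON in main are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const secretsDumpType = "type.googleapis.com/envoy.admin.v3.SecretsConfigDump"

// secretEntry keeps only the field this sketch needs.
type secretEntry struct {
	Name string `json:"name"`
}

// secretsConfigDump mirrors the two lists discussed in this issue.
type secretsConfigDump struct {
	Type    string        `json:"@type"`
	Active  []secretEntry `json:"dynamic_active_secrets"`
	Warming []secretEntry `json:"dynamic_warming_secrets"`
}

// configDump is the top-level /config_dump document.
type configDump struct {
	Configs []secretsConfigDump `json:"configs"`
}

// secretStates (hypothetical helper) maps each dynamic secret name to
// "active" or "warming", based on the list it appears in.
func secretStates(dump []byte) (map[string]string, error) {
	var cd configDump
	if err := json.Unmarshal(dump, &cd); err != nil {
		return nil, fmt.Errorf("parse config dump: %w", err)
	}
	states := make(map[string]string)
	for _, c := range cd.Configs {
		if c.Type != secretsDumpType {
			continue // skip bootstrap, listeners, clusters, ...
		}
		for _, s := range c.Warming {
			states[s.Name] = "warming"
		}
		for _, s := range c.Active {
			states[s.Name] = "active" // active wins if listed in both
		}
	}
	return states, nil
}

func main() {
	// Made-up sample shaped like the dumps in this issue.
	sample := []byte(`{"configs":[{
	  "@type":"type.googleapis.com/envoy.admin.v3.SecretsConfigDump",
	  "dynamic_active_secrets":[{"name":"client_secret"}],
	  "dynamic_warming_secrets":[{"name":"hmac_secret"}]}]}`)
	states, err := secretStates(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for name, state := range states {
		fmt.Printf("%s: %s\n", name, state)
	}
}
```

Running it against config_dump-broken.json and config_dump-working.json (read with os.ReadFile instead of the inline sample) would show at a glance which secrets never left the warming list.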

If there's anything further I can do to help get to the cause of this issue, please let me know.


Envoy Compiled From 7af7608b022c5e3ae2ad110b4d94aa7506a643d7

Edit: I went a step further and compiled Envoy from source and made a Docker image of it:

Dockerfile

FROM docker.io/ubuntu:22.04

COPY envoy-static /usr/local/bin/envoy
COPY envoy-static /usr/bin/envoy

ENTRYPOINT ["/usr/local/bin/envoy"]

Build Steps

$ git remote -vv
origin	git@github.com:envoyproxy/envoy.git (fetch)
origin	git@github.com:envoyproxy/envoy.git (push)
$ git rev-parse HEAD
7af7608b022c5e3ae2ad110b4d94aa7506a643d7
$ bazel/setup_clang.sh
$ echo "build --config=clang" >> user.bazelrc
$ echo "build --copt=-fno-limit-debug-info" >> user.bazelrc
$ bazel build --jobs=32 -c fastbuild envoy
$ cp bazel-bin/source/exe/envoy-static .
$ docker build --tag ttl.sh/kubeshop/envoy:24h --file ./Dockerfile .
$ docker push ttl.sh/kubeshop/envoy:24h

The image is available at ttl.sh/kubeshop/envoy:24h (docker run --rm -it ttl.sh/kubeshop/envoy:24h). Note: this image is only available for ~24 hours from the time of editing this post (2022-08-12T13:47:31+00:00).

Segmentation Fault

I noticed that Envoy crashes:

[2022-08-12 13:35:16.253][7][critical][assert] [source/common/init/manager_impl.cc:36] assert failure: false. Details: attempted to add shared target SdsApi client_secret to initialized init manager Server
...
<STACK_TRACE_OMITTED>
...
Our FatalActions triggered a fatal signal.
Segmentation fault (core dumped)

Can anyone see anything useful in this stack trace? The failing assertion is particularly interesting, but I don't know the Envoy codebase well enough to tell whether it points to a real issue:

[2022-08-12 13:35:16.253][7][critical][assert] [source/common/init/manager_impl.cc:36] assert failure: false. Details: attempted to add shared target SdsApi client_secret to initialized init manager Server

mbana, Aug 12 '22 10:08