envoy icon indicating copy to clipboard operation
envoy copied to clipboard

Is there some way to figure out why Envoy can't send a request to upstream?

Open YvesZHI opened this issue 1 year ago • 0 comments

I deployed an Envoy as the gateway of my backend services. There are two kinds of protocols HTTP1.1 and gRPC on one port. Here is the config of the Envoy:

  - name: listener_portal_https
    address:
    listener_filters:
    - name: envoy.filters.listener.tls_inspector
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
    - name: envoy.filters.listener.http_inspector
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.http_inspector.v3.HttpInspector
    filter_chains:
    - filter_chain_match:        # HTTP1.1
        transport_protocol: tls
        server_names:
        - "www.example.com"
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          ...
    - filter_chain_match:        # gRPC
        transport_protocol: raw_buffer
        application_protocols:
        - "h2c"
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          ...

As you see, I set two filter_chain_matchs on one same port. One is for HTTP1.1 and the oher is for gRPC.

However, I just found that the gRPC requests have a low probability to fail to send to the upstream. When this occured, the gRPC requests can not be sent to the upstream for about several minutes.

I've thought it was because of poor network. However, when this occured, I tried to send some HTTP1.1 requests and they worked without any error... And rebooting Envoy can solve this immediately.

For example, any gRPC requests can't be sent to the upstream from 09:00 to 09:10, but at the same time, any HTTP1.1 request can be sent to the upstream. And if I reboot Envoy, this issue will be gone immediately.

gRPC-client ---> Envoy(request lost here!) ---> upstream

As the timeout of gRPC is set to 0, I can see that the client keeps waiting for the response. After cancelling the request, the Envoy showed me the log, whose %PROTOCOL% is HTTP/2, %RESPONSE_CODE% is 0 and %DURATION% is a very large number.

In a word, it seems that the gRPC request kind of blocks in Envoy.

I want to know why. So I set debug-level log:

{"log":"[2024-02-26 15:06:14.637][15][debug][router] [source/common/router/router.cc:732] [Tags: "ConnectionId":"39751","StreamId":"3374533070535275650"] router decoding headers:\n","stream":"stderr","time":"2024-02-26T07:06:14.638036637Z"} {"log":"':scheme', 'http'\n","stream":"stderr","time":"2024-02-26T07:06:14.63804108Z"} {"log":"':path', '/com.example.api/testApi'\n","stream":"stderr","time":"2024-02-26T07:06:14.638044647Z"} {"log":"':method', 'POST'\n","stream":"stderr","time":"2024-02-26T07:06:14.638047823Z"} {"log":"':authority', 'api.example.com:3455'\n","stream":"stderr","time":"2024-02-26T07:06:14.638050754Z"} {"log":"'t2m-7uz4re', 'z1x0c9v8b7n6m5l4k3j2h'\n","stream":"stderr","time":"2024-02-26T07:06:14.638053909Z"} {"log":"'grpc-accept-encoding', 'identity,deflate,gzip'\n","stream":"stderr","time":"2024-02-26T07:06:14.63805677Z"} {"log":"'accept-encoding', 'identity'\n","stream":"stderr","time":"2024-02-26T07:06:14.638060343Z"} {"log":"'user-agent', 'grpc-node-js/1.8.10'\n","stream":"stderr","time":"2024-02-26T07:06:14.638063616Z"} {"log":"'content-type', 'application/grpc'\n","stream":"stderr","time":"2024-02-26T07:06:14.638066541Z"} {"log":"'te', 'trailers'\n","stream":"stderr","time":"2024-02-26T07:06:14.638069575Z"} {"log":"'x-forwarded-for', '111.222.111.111'\n","stream":"stderr","time":"2024-02-26T07:06:14.638072629Z"} {"log":"'x-forwarded-proto', 'http'\n","stream":"stderr","time":"2024-02-26T07:06:14.638076176Z"} {"log":"'x-envoy-external-address', '111.222.111.111'\n","stream":"stderr","time":"2024-02-26T07:06:14.638079767Z"} {"log":"'x-request-id', '9fa34df2-e5bc-9217-b9cx-56232f571193'\n","stream":"stderr","time":"2024-02-26T07:06:14.638082773Z"} {"log":"\n","stream":"stderr","time":"2024-02-26T07:06:14.638086053Z"} {"log":"[2024-02-26 15:06:14.637][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:265] [Tags: "ConnectionId":"39752"] using existing fully connected connection\n","stream":"stderr","time":"2024-02-26T07:06:14.638088973Z"} {"log":"[2024-02-26 15:06:14.637][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:182] [Tags: "ConnectionId":"39752"] creating stream\n","stream":"stderr","time":"2024-02-26T07:06:14.638093546Z"} {"log":"[2024-02-26 15:06:14.637][15][debug][router] [source/common/router/upstream_request.cc:563] [Tags: "ConnectionId":"39751","StreamId":"3374533070535275650"] pool ready\n","stream":"stderr","time":"2024-02-26T07:06:14.638097661Z"} {"log":"[2024-02-26 15:06:14.637][15][debug][http] [source/common/http/conn_manager_impl.cc:1177] [Tags: "ConnectionId":"39751","StreamId":"3374533070535275650"] request end stream\n","stream":"stderr","time":"2024-02-26T07:06:14.638101248Z"} {"log":"[2024-02-26 15:06:14.637][15][debug][client] [source/common/http/codec_client.cc:141] [Tags: "ConnectionId":"39752"] encode complete\n","stream":"stderr","time":"2024-02-26T07:06:14.638104657Z"}

It seems that there isn't any error log...

This issue has occured 3 times since last month. I want to know if there is some other way to figure this issue out.

YvesZHI avatar Feb 26 '24 11:02 YvesZHI