envoy icon indicating copy to clipboard operation
envoy copied to clipboard

Thrift health checker support

Open JuniorHsu opened this issue 3 years ago • 14 comments

Commit Message: Introduce a thrift health checker so thrift only cluster could reflect the health-ness more correctly.

It will be a custom health checker. It sends a thrift request with configed method_name and expect a success response. The thrift server can respond an exception to cause an immediate health check failure.

An example setting for custom_health_check as a Thrift health checker is shown below:

  custom_health_check:
    name: envoy.health_checkers.thrift
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.health_checkers.thrift.v3.Thrift
      method_name: "ping",
      transport: FRAMED
      protocol: BINARY

Risk Level: low Testing: unit test Docs Changes: docs/root/configuration/upstream/health_checkers/thrift.rst [Optional Fixes #Issue] #22555

JuniorHsu avatar Sep 08 '22 21:09 JuniorHsu

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/). envoyproxy/api-shepherds assignee is @markdroth CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/23043 was opened by JuniorHsu.

see: more, trace.

The basic flow is verified by unit tests. While I worked on testing corner cases, could I have some initial feedback?

cc @zuercher @rgs1 @fishcakez @PeterL328

JuniorHsu avatar Sep 08 '22 21:09 JuniorHsu

Some code decisions: (a) UpstreamRequest is not reusable since HC (Health Checker) creates ClientConnection directly but UpstreamRequest uses ConnectionPool. Therefore we implement a simple thrift client. (b) Extract ThriftHealthChecker::Client for the sake of testing and possible future reuse. (c) Inject ClientFactory so we can split the test to ThriftHealthChecker::Client and mock it to test ThriftHealthChecker (d) Introduce passthrough_decoder_event_handler for DIY propose (e) Now we don't have retry semantic response.

JuniorHsu avatar Sep 08 '22 21:09 JuniorHsu

@markdroth heads up that the reason we don't use not_in to exclude AUTO and TWITTER is a bug for foreign enum which triggers the compile error

bazel-out/k8-dbg/bin/external/envoy_api/envoy/extensions/health_checkers/thrift/v3/thrift.pb.validate.cc:62:17: error: unknown type name 'TransportType'; did you mean 'filters::network::thrift_proxy::v3::TransportType'?
        const std::set<TransportType> _Thrift_Transport_NotInLookup = {
                       ^~~~~~~~~~~~~
                       filters::network::thrift_proxy::v3::TransportType

JuniorHsu avatar Sep 08 '22 21:09 JuniorHsu

Code coverage for source/extensions/health_checkers/thrift is lower than limit of 96.6 (89.3)

more boundary cases will help

JuniorHsu avatar Sep 09 '22 03:09 JuniorHsu

/retest

JuniorHsu avatar Sep 10 '22 03:09 JuniorHsu

Retrying Azure Pipelines: Retried failed jobs in: envoy-presubmit

:cat:

Caused by: a https://github.com/envoyproxy/envoy/pull/23043#issuecomment-1242608896 was created by @JuniorHsu.

see: more, trace.

With some discussion with @fishcakez, we should use fixed sequence id 0.

JuniorHsu avatar Sep 12 '22 23:09 JuniorHsu

/retest

JuniorHsu avatar Sep 13 '22 16:09 JuniorHsu

Retrying Azure Pipelines: Retried failed jobs in: envoy-presubmit

:cat:

Caused by: a https://github.com/envoyproxy/envoy/pull/23043#issuecomment-1245664047 was created by @JuniorHsu.

see: more, trace.

@markdroth the implantation review is converging. would you have a chance to review api part? thanks.

JuniorHsu avatar Sep 16 '22 21:09 JuniorHsu

@fishcakez noticed that we don't specify transport and protocol in thrift.proto to be required since protoc-gen doesn't support (validate.rules).enum.required.

Fortunately we throw for AUTO_PROTOCOL and AUTO_TRANSPORT, which is 0 and the default value as well, which means that we accidentally make them required. https://github.com/envoyproxy/envoy/blob/1fabd6decb8de8ec97d9d632c60ffb648a4c917a/api/envoy/extensions/filters/network/thrift_proxy/v3/thrift_proxy.proto#L33

Another feedback from him was that we might able to honor ThriftProtocolOptions when we have a good precedence design, so transport/protocol in thrift.proto could be optional, but we could discuss it when we have a need later.

JuniorHsu avatar Sep 19 '22 23:09 JuniorHsu

@markdroth can you PTAL at the API changes? Assigning @zuercher as code-owner. /assign @zuercher

adisuissa avatar Sep 22 '22 18:09 adisuissa

Thanks Adi. Leave a note that we've verified thrift HC in our environment.

JuniorHsu avatar Sep 22 '22 18:09 JuniorHsu

@envoyproxy/api-shepherds any chance to move this forward? Thanks.

JuniorHsu avatar Sep 26 '22 17:09 JuniorHsu

Hello @htuch This PR has @.rgs1 and @.fishcakez reviewed the api but need one more stamp from @envoyproxy/api-shepherds We conduct several bike-shedding as we can see in the comment so we believe it's good to go. We also verified the the implementation in our environment as well. Could you help to move this forward as we have you several thrift proto reviews? Thank you.

JuniorHsu avatar Sep 27 '22 17:09 JuniorHsu

/retest

JuniorHsu avatar Sep 30 '22 01:09 JuniorHsu

Retrying Azure Pipelines: Check envoy-presubmit isn't fully completed, but will still attempt retrying. Retried failed jobs in: envoy-presubmit

:cat:

Caused by: a https://github.com/envoyproxy/envoy/pull/23043#issuecomment-1263000826 was created by @JuniorHsu.

see: more, trace.

/retest

JuniorHsu avatar Sep 30 '22 04:09 JuniorHsu

Retrying Azure Pipelines: Retried failed jobs in: envoy-presubmit

:cat:

Caused by: a https://github.com/envoyproxy/envoy/pull/23043#issuecomment-1263083814 was created by @JuniorHsu.

see: more, trace.

Code coverage for source/common/config is lower than limit of 96.5 (96.0)

This is irrelevant since I didn't touch the code and looks like it happens on top of the tree.

JuniorHsu avatar Sep 30 '22 14:09 JuniorHsu