read ECONNRESET in @grpc/grpc-js but not in grpc package

Open Siddhesh-Swami opened this issue 4 years ago • 14 comments

Description: We have been using the @grpc/grpc-js package in a Kubernetes cluster with an Alpine image, and recently got the chance to test it in production. Occasionally we observe a read ECONNRESET error on the client side, with no logs on the server side. We switched to an older version, @grpc/grpc-js 1.2.4, but still observed the error.
In one of our microservices we used the grpc package with NestJS, and that service never produced read ECONNRESET. So we migrated all the microservices to the [email protected] package, and we no longer see the read ECONNRESET error. The client takes a fairly long time to connect to the server, around 2 to 3 seconds, but no read ECONNRESET error is observed.

Environment:

  • OS name, version and architecture: [e.g. Linux Ubuntu 18.04 amd64 Alpine ]
  • docker image node:14.16.1-alpine
  • Kubernetes istio load balancing
  • Node version: 14.16.1
  • @grpc/proto-loader: 0.5.6
  • Earlier package: @grpc/grpc-js
  • New package: [email protected]

Please let me know if there are any more details I should add.

Siddhesh-Swami avatar Dec 22 '21 12:12 Siddhesh-Swami

Any updates please?

Siddhesh-Swami avatar Jan 10 '22 18:01 Siddhesh-Swami

We have a similar issue that seems to appear only when our node.js application is deployed on Kubernetes. Here is our stack:

  • Node 16
  • Docker engine on Kubernetes
  • Calico Networking
  • grpc-js Version 1.3.6

We are getting this error so frequently that it cannot be due to sporadic connectivity issues.

vanthome avatar Feb 23 '22 17:02 vanthome

Any updates? We are having that symptom as well

haimrait avatar Apr 19 '22 15:04 haimrait

We have a similar issue as well:

  • node 16.14
  • docker engine on k8s
  • calico networking
  • grpc-js 1.5.7
  • grpc server and client 2 replicas without service mesh

This normally happens after more than 10 hours of idle time.

hanstf avatar Jun 21 '22 15:06 hanstf

Could this be related to keepalive? After adding

      keepalive: {
        keepaliveTimeMs: ms('5m'),
      },

I have not had a connection reset for several weeks. The default keepalive options might differ between grpc and @grpc/grpc-js.

bangbang93 avatar Jun 22 '22 01:06 bangbang93
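The keepalive block in the comment above uses camelCase keys, while other snippets in this thread use 'grpc.*' channel-option keys. The following is a minimal sketch of how the two spellings could line up, assuming each camelCase key corresponds to the channel option of the same name. The helper `toChannelOptions` and the `KEY_MAP` table are hypothetical and written only for illustration; they are not part of grpc, @grpc/grpc-js, or any framework.

```typescript
// Hypothetical helper: translate camelCase keepalive settings into
// 'grpc.*' channel-option keys. The key pairs are an assumption based
// on the option names that appear in this thread.
type KeepaliveSettings = {
  keepaliveTimeMs?: number;
  keepaliveTimeoutMs?: number;
  keepalivePermitWithoutCalls?: number;
};

const KEY_MAP: Record<keyof KeepaliveSettings, string> = {
  keepaliveTimeMs: 'grpc.keepalive_time_ms',
  keepaliveTimeoutMs: 'grpc.keepalive_timeout_ms',
  keepalivePermitWithoutCalls: 'grpc.keepalive_permit_without_calls',
};

function toChannelOptions(settings: KeepaliveSettings): Record<string, number> {
  const options: Record<string, number> = {};
  for (const key of Object.keys(KEY_MAP) as (keyof KeepaliveSettings)[]) {
    const value = settings[key];
    // Copy only the settings that were actually provided.
    if (value !== undefined) {
      options[KEY_MAP[key]] = value;
    }
  }
  return options;
}

// ms('5m') is 300000 milliseconds; written out here to avoid the dependency.
const channelOptions = toChannelOptions({ keepaliveTimeMs: 5 * 60 * 1000 });
console.log(channelOptions);
```

With this mapping, the five-minute setting above would correspond to 'grpc.keepalive_time_ms' set to 300000.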

Any updates?

railsonluna avatar Sep 28 '22 18:09 railsonluna

@bangbang93 After changing the code, have you faced the issue again?

khanh-le-otsv avatar Oct 10 '22 03:10 khanh-le-otsv

No, I have been rid of it for several months.

bangbang93 avatar Oct 11 '22 01:10 bangbang93

@bangbang93 we have the same issue. Aren't you afraid that performance could suffer after setting this option? 5 minutes sounds like a lot.

This is a comment from the source code:

The amount of time to wait for an acknowledgement after sending a ping

Here is a link: https://github.com/grpc/grpc-node/blob/6764dcc79602faee5457243629da520ba08b726f/packages/grpc-js/src/subchannel.ts#L114

Nevertheless I just applied it to our services, let's see how it will play out.

tomaswitek avatar Oct 19 '22 12:10 tomaswitek

The option I set is keepaliveTimeMs, not keepaliveTimeoutMs: https://github.com/grpc/grpc-node/blob/6764dcc79602faee5457243629da520ba08b726f/packages/grpc-js/src/subchannel.ts#L109-L112

bangbang93 avatar Oct 19 '22 15:10 bangbang93

@bangbang93 Sorry, I sent the wrong link. I tried both and I still get the message :(, but thanks for helping.

tomaswitek avatar Oct 19 '22 20:10 tomaswitek

This works for us:

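import { ChannelOptions } from '@grpc/grpc-js';
// Placeholder (hypothetical name) for any channel options you already pass.
const baseChannelOptions: ChannelOptions = {};
const channelOptions: ChannelOptions = {
  ...baseChannelOptions,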
  // Send keepalive pings every 10 seconds, default is 2 hours.
  'grpc.keepalive_time_ms': 10 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};

✌️

HofmannZ avatar Jan 23 '23 01:01 HofmannZ

Thank you @HofmannZ. Is that fix reliable for you, or does it just make the problem less evident?

logidelic avatar Apr 17 '23 14:04 logidelic

Hey @logidelic,

We ended up with the following config for the client:

// See: https://grpc.github.io/grpc/cpp/md_doc_keepalive.html
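import { ChannelOptions } from '@grpc/grpc-js';
// Placeholder (hypothetical name) for any channel options you already pass.
const baseChannelOptions: ChannelOptions = {};
const channelOptions: ChannelOptions = {
  ...baseChannelOptions,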
  // Send keepalive pings every 6 minutes, default is none.
  // Must be more than GRPC_ARG_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS on the server (5 minutes.)
  'grpc.keepalive_time_ms': 6 * 60 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};

And the following config for the server:

// See: https://grpc.github.io/grpc/cpp/md_doc_keepalive.html
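import { ChannelOptions } from '@grpc/grpc-js';
// Placeholder (hypothetical name) for any channel options you already pass.
const baseChannelOptions: ChannelOptions = {};
const channelOptions: ChannelOptions = {
  ...baseChannelOptions,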
  // Send keepalive pings every 10 seconds, default is 2 hours.
  'grpc.keepalive_time_ms': 10 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};

We've been running it in production for a couple of months, and it works reliably.

HofmannZ avatar Apr 17 '23 14:04 HofmannZ
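The client config above notes that the client ping interval must be larger than GRPC_ARG_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS on the server (5 minutes). A minimal sketch of a pre-flight check for that rule might look like the following. The helper `findKeepaliveProblems` and the constant name are hypothetical, the 5-minute value is taken from the comment in that config, and whether a given server actually enforces the limit may depend on the implementation and version.

```typescript
// Assumed server-side minimum ping interval, taken from the comment in
// the client config above (5 minutes).
const SERVER_MIN_PING_INTERVAL_MS = 5 * 60 * 1000;

type KeepaliveOptions = {
  'grpc.keepalive_time_ms': number;
  'grpc.keepalive_timeout_ms': number;
  'grpc.keepalive_permit_without_calls': number;
};

// Hypothetical helper, not part of @grpc/grpc-js: returns a list of
// human-readable problems, or an empty list if the interval looks fine.
function findKeepaliveProblems(
  client: KeepaliveOptions,
  serverMinPingIntervalMs: number = SERVER_MIN_PING_INTERVAL_MS,
): string[] {
  const problems: string[] = [];
  if (client['grpc.keepalive_time_ms'] <= serverMinPingIntervalMs) {
    problems.push(
      `grpc.keepalive_time_ms (${client['grpc.keepalive_time_ms']}) should be ` +
        `greater than the server minimum (${serverMinPingIntervalMs})`,
    );
  }
  return problems;
}

// The client values from the comment above: ping every 6 minutes.
const clientOptions: KeepaliveOptions = {
  'grpc.keepalive_time_ms': 6 * 60 * 1000,
  'grpc.keepalive_timeout_ms': 5 * 1000,
  'grpc.keepalive_permit_without_calls': 1,
};

console.log(findKeepaliveProblems(clientOptions));
```

A client interval of 10 seconds checked against the same 5-minute minimum would be reported as a problem by this sketch.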