read ECONNRESET in @grpc/grpc-js but not in grpc package
Description:
We were using the @grpc/grpc-js package in a Kubernetes cluster with an Alpine image, and recently got the chance to test it in production. Occasionally we observe read ECONNRESET on the client side with no logs on the server side. We switched to an older version of @grpc/grpc-js (1.2.4), but the error still occurred.
In one of our microservices we used the grpc package with NestJS, and that service never produced read ECONNRESET. So we migrated all the microservices to the [email protected] package, and we no longer see the read ECONNRESET error. The client takes a fairly long time to connect to the server, around 2 to 3 seconds, but no read ECONNRESET error is observed.
Environment:
- OS name, version and architecture: Alpine
- Docker image: node:14.16.1-alpine
- Kubernetes with Istio load balancing
- Node version: 14.16.1
- @grpc/proto-loader: 0.5.6
- Earlier package: @grpc/grpc-js
- New package: [email protected]
Please let me know if any more details are needed.
Any updates please?
We have a similar issue that seems to appear only when our Node.js application is deployed on Kubernetes. Here is our stack:
- Node 16
- Docker engine on Kubernetes
- Calico Networking
- grpc-js Version 1.3.6
We are getting this error so frequently that it cannot be due to sporadic connectivity issues.
Any updates? We are seeing this symptom as well.
We have a similar issue as well:
- node 16.14
- docker engine on k8s
- calico networking
- grpc-js 1.5.7
- gRPC server and client with 2 replicas, without a service mesh
This normally happened after more than 10 hours of idle time.
Could this be related to keepalive? After adding
keepalive: {
  keepaliveTimeMs: ms('5m'),
},
I did not get a connection reset for several weeks.
The default keepalive options might differ between grpc and @grpc/grpc-js.
Any updates?
@bangbang93 After changing the code, have you faced the issue again?
I have been rid of this for several months.
@bangbang93 We have the same issue. Aren't you afraid that performance could suffer after setting this option? 5 minutes sounds like a lot.
This is a comment from the source code:
The amount of time to wait for an acknowledgement after sending a ping
Here is a link: https://github.com/grpc/grpc-node/blob/6764dcc79602faee5457243629da520ba08b726f/packages/grpc-js/src/subchannel.ts#L114
Nevertheless I just applied it to our services, let's see how it will play out.
It's keepaliveTimeMs, not keepaliveTimeoutMs:
https://github.com/grpc/grpc-node/blob/6764dcc79602faee5457243629da520ba08b726f/packages/grpc-js/src/subchannel.ts#L109-L112
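Since the two option names are easy to mix up, here is a minimal sketch of how they relate, assuming the simple model that a ping goes out keepaliveTimeMs after the keepalive timer starts and the connection is given up on if no acknowledgement arrives within keepaliveTimeoutMs. The `keepaliveTimeline` helper and the `KeepaliveTimeline` type are hypothetical names made up for illustration; they are not part of @grpc/grpc-js.

```typescript
// Hypothetical helper for illustration only, not a grpc-js API.
interface KeepaliveTimeline {
  // When the keepalive ping is sent.
  pingSentAtMs: number;
  // When the connection is considered dead if the ping is never acknowledged.
  connectionDroppedAtMs: number;
}

function keepaliveTimeline(
  timerStartedAtMs: number,
  keepaliveTimeMs: number, // interval before a ping is sent
  keepaliveTimeoutMs: number, // how long to wait for the ping acknowledgement
): KeepaliveTimeline {
  const pingSentAtMs = timerStartedAtMs + keepaliveTimeMs;
  return {
    pingSentAtMs,
    connectionDroppedAtMs: pingSentAtMs + keepaliveTimeoutMs,
  };
}

// A 5 minute keepalive time with a 20 second timeout.
const timeline = keepaliveTimeline(0, 5 * 60 * 1000, 20 * 1000);
console.log(timeline);
```

So a 5 minute keepaliveTimeMs only controls how often a small ping is sent on the connection; the ping timeout is the separate, much shorter setting.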
@bangbang93 Sorry, I sent the wrong link. I tried both and I still get the message :(, but thanks for helping.
This works for us:
import { ChannelOptions } from '@grpc/grpc-js';

const channelOptions: ChannelOptions = {
  // Merge in any other channel options you already use.
  // Send keepalive pings every 10 seconds, default is 2 hours.
  'grpc.keepalive_time_ms': 10 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};
✌️
Thank you @HofmannZ. Is that fix reliable for you, or does it just make the problem less evident?
Hey @logidelic,
We ended up with the following config for the client:
// See: https://grpc.github.io/grpc/cpp/md_doc_keepalive.html
import { ChannelOptions } from '@grpc/grpc-js';

const channelOptions: ChannelOptions = {
  // Merge in any other channel options you already use.
  // Send keepalive pings every 6 minutes, default is none.
  // Must be more than GRPC_ARG_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS on the server (5 minutes).
  'grpc.keepalive_time_ms': 6 * 60 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};
And the following config for the server:
// See: https://grpc.github.io/grpc/cpp/md_doc_keepalive.html
import { ChannelOptions } from '@grpc/grpc-js';

const channelOptions: ChannelOptions = {
  // Merge in any other channel options you already use.
  // Send keepalive pings every 10 seconds, default is 2 hours.
  'grpc.keepalive_time_ms': 10 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};
We've been running it in production for a couple of months, and it works reliably.
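For anyone adapting these numbers, the constraint in the client config comment (the client's keepalive interval must be longer than the server's minimum accepted ping interval) can be checked before deploying. The following is only a rough sketch: `checkClientKeepalive` is a hypothetical helper, not a grpc-js function, and the 5 minute server minimum is taken from the comment above rather than read from any running server.

```typescript
// Hypothetical helper for illustration only, not a grpc-js API.
// Returns a warning message when the client would ping more often than the
// server is assumed to accept, or null when the interval looks safe.
function checkClientKeepalive(
  clientKeepaliveTimeMs: number,
  serverMinPingIntervalMs: number,
): string | null {
  if (clientKeepaliveTimeMs <= serverMinPingIntervalMs) {
    return (
      `client pings every ${clientKeepaliveTimeMs} ms, which is not more than ` +
      `the server minimum of ${serverMinPingIntervalMs} ms`
    );
  }
  return null;
}

// Assumption: the server's minimum ping interval is 5 minutes, as stated above.
const SERVER_MIN_PING_INTERVAL_MS = 5 * 60 * 1000;

// Plain object using the same channel option keys as the client config above.
const clientKeepaliveOptions = {
  'grpc.keepalive_time_ms': 6 * 60 * 1000,
  'grpc.keepalive_timeout_ms': 5 * 1000,
  'grpc.keepalive_permit_without_calls': 1,
};

const warning = checkClientKeepalive(
  clientKeepaliveOptions['grpc.keepalive_time_ms'],
  SERVER_MIN_PING_INTERVAL_MS,
);
console.log(warning === null ? 'keepalive interval looks safe' : warning);
```

With a 10 second client interval against the same assumed server minimum, the helper returns a warning instead of null, which is why the client and server configs above use different keepalive times.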