pub-sub-api-node-client

Receiving errors with empty cause field and then client closes

Open ninabernick opened this issue 1 year ago • 3 comments

We received a high volume of events in one of our Pub/Sub subscriptions and, at the same time, saw errors with cause = {}. We'd receive an error with an empty cause, then one minute later an 'end' message, which shouldn't happen because we use an infinite subscription. Have you seen this before, or do you know why there's this delay? We're unsure what's causing the errors and would like to request better error reporting. Thanks! Here is an example error:

{"cause":{},"replayId":296772,"event":{"event":{"id":"5d5afe-ccbe-4364-8c0a-b622138c5bf2","schemaId":"xapjTc686Y60-M-QS9Dw","payload":{"type":"Buffer","data":[168,171,240,.......more data........,111,110,101,2,0]}},"replayId":{"type":"Buffer","data":[0,0,0,0,0,4,135,68,0,0]}},"latestReplayId":296781}
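One generic JavaScript behavior might explain part of the "cause = {}" symptom, although nothing in this thread confirms it: JSON.stringify only serializes enumerable own properties, and an Error's message and stack are non-enumerable, so an Error logged through JSON.stringify prints as {}. Below is a minimal sketch of a logging helper that keeps those fields. describeError is a hypothetical name, not part of pub-sub-api-node-client.

```typescript
// Hypothetical helper: convert an unknown error value into a plain object
// whose useful fields survive JSON.stringify.
function describeError(value: unknown, depth = 0): unknown {
  if (!(value instanceof Error)) {
    return value; // non-Error values are returned unchanged
  }
  const out: Record<string, unknown> = {
    name: value.name,       // inherited from the prototype, so not serialized by default
    message: value.message, // own but non-enumerable
    stack: value.stack,     // own but non-enumerable in V8
  };
  // Copy enumerable own properties too (e.g. a gRPC-style `code` or `details`).
  const record = value as unknown as Record<string, unknown>;
  for (const key of Object.keys(record)) {
    out[key] = record[key];
  }
  // Follow a nested cause, with a depth limit to guard against cycles.
  const cause = (value as { cause?: unknown }).cause;
  if (cause !== undefined && depth < 5) {
    out.cause = describeError(cause, depth + 1);
  }
  return out;
}

// Usage: a bare Error serializes as "{}", the described version does not.
const err = new Error("boom");
console.log(JSON.stringify(err)); // prints {}
console.log(JSON.stringify(describeError(err)));
```

Logging describeError(cause) in the subscribe callback, instead of the raw cause, would at least show whether the empty object is really an Error with a message.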

ninabernick, Feb 05 '25 01:02

Hi @ninabernick, do you have more details that you could share? Is there an error message along with this stack trace?

There was a similar issue with an older version of the client: it failed "silently", without a cause, when parsing a message on a refreshed schema, but I've since fixed it. Can you confirm that you're on the latest version?

pozil, Feb 07 '25 16:02

We're on version 5.2.1 but can upgrade to 5.2.2 if that would help. Unfortunately, the message field isn't present in the data we received in subscribeCallback; the example above is all the data we got back. The root cause on the Salesforce side was a huge backfill that was running, and some gRPC requests failed with:

8 RESOURCE_EXHAUSTED: The service received too many connections and doesn't have the resources to accept new connections. rpcId: 5a2af035-6af2-45d9-b1ed-e5bab2b4d555

ninabernick, Feb 07 '25 21:02

Thanks for the extra details. The bug I mentioned affected an older version, so you're fine on 5.2.1. You won't need to upgrade to 5.2.2 unless you're using TypeScript.

To be honest, I don't have a good lead on what would cause this. I wonder whether the message was malformed because of the service interruption. Since you have its replay ID, have you tried replaying the event that triggers this?
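The example error carries the replay ID twice: as "replayId":296772 and as raw bytes under event.replayId ([0,0,0,0,0,4,135,68,0,0]). Reading the first eight of those ten bytes as a big-endian unsigned integer reproduces 296772 (0x048744), which is a quick way to confirm which event to replay. The sketch below relies only on that observation from this one payload; what the two trailing zero bytes mean isn't established here, and decodeReplayId is a hypothetical helper, not a client API.

```typescript
// Hypothetical helper: decode replay ID bytes into a number, assuming (from the
// example payload) that bytes 0-7 hold a big-endian unsigned 64-bit integer.
function decodeReplayId(bytes: ArrayLike<number>): number {
  if (bytes.length < 8) {
    throw new Error(`expected at least 8 bytes, got ${bytes.length}`);
  }
  const view = new DataView(Uint8Array.from(bytes).buffer);
  const high = view.getUint32(0, false); // bytes 0-3, big-endian
  const low = view.getUint32(4, false);  // bytes 4-7, big-endian
  const value = high * 2 ** 32 + low;
  // Numbers above 2^53 - 1 would silently lose precision, so refuse them.
  if (!Number.isSafeInteger(value)) {
    throw new Error("replay ID exceeds Number.MAX_SAFE_INTEGER");
  }
  return value;
}

// Bytes copied from event.replayId in the reported error.
const rawReplayId = [0, 0, 0, 0, 0, 4, 135, 68, 0, 0];
console.log(decodeReplayId(rawReplayId)); // prints 296772
```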

By the way, I recommend avoiding infinite mode when resuming after a backfill. Introduce some "breaks" when catching up on messages so that you don't consume all resources at once: in this context, pull messages in smaller batches to avoid overflow.
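The batching advice can be sketched independently of the client. In the code below, FetchBatch and drainInBatches are hypothetical names: fetchBatch stands in for whatever requests a finite number of events, and the loop pauses between batches instead of draining the whole backlog in one go. The batch size and pause length are placeholders to tune, not recommended values.

```typescript
// Hypothetical source: returns up to `max` events starting at position `from`.
type FetchBatch<T> = (from: number, max: number) => Promise<T[]>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Pull events in fixed-size batches, pausing between batches.
// Resolves with the total number of events processed.
async function drainInBatches<T>(
  fetchBatch: FetchBatch<T>,
  handle: (event: T) => Promise<void>,
  batchSize: number,
  pauseMs: number,
): Promise<number> {
  let processed = 0;
  for (;;) {
    const batch = await fetchBatch(processed, batchSize);
    for (const event of batch) {
      await handle(event); // process one event at a time
    }
    processed += batch.length;
    // A short (or empty) batch means we've caught up with the backlog.
    if (batch.length < batchSize) return processed;
    await sleep(pauseMs); // the "break" before requesting more
  }
}

// Usage with an in-memory backlog of 25 events, batches of 10.
const backlog = Array.from({ length: 25 }, (_, i) => i);
drainInBatches<number>(
  async (from, max) => backlog.slice(from, from + max),
  async () => {},
  10,
  100,
).then((n) => console.log(n)); // prints 25
```

Stopping on a short batch keeps the sketch simple; a real subscriber would also need to persist the last processed replay ID so it can resume after a failure.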

pozil, Feb 10 '25 08:02