ECONNRESET errors when trying to send logs
I'm running a Node.js Function App in Azure and using Application Insights for application-layer logging. The app is deployed via Zip Deployment, which means the file system is read-only, so disk retry is not an option.
I'm seeing a high frequency of errors being logged with the message:
'Ingestion endpoint could not be reached. This batch of telemetry items has been lost. Use Disk Retry Caching to enable resending of failed telemetry. Error:',
"[object Error]{ stack: 'Error: read ECONNRESET\n" +
" at TLSWrap.onStreamRead (internal/stream_base_commons.js:209:20)', message: 'read ECONNRESET', name: 'Error'"
Some investigation has led me to believe this error is likely caused by the upstream Application Insights server closing the client connection after some period of time (I've not done enough debugging to see exactly how long it takes before these errors appear).
I have tried turning off the keep-alive agent using the APPLICATION_INSIGHTS_NO_HTTP_AGENT_KEEP_ALIVE environment variable but this resulted in ETIMEDOUT (connection timeout) errors.
I have also tried supplying my own Agent to the application insights backend to attempt to turn off keep-alive, but this has also not helped.
This error has been raised with Azure support to try to get some insight into why connections are being reset, but they have not been able to shed any light on the problem. Instead, their advice is to handle these errors gracefully and retry the requests.
- Would it be acceptable to add some retry logic to requests even if disk retries are off?
- Do you have any thoughts or advice on how to mitigate these errors?
Other possibly related issues: #377 #378
Environment details
- Host: Azure Functions (Windows host)
- Node.js version: 14.x (14.18.1)
- SDK version: 2.1.9
I have also been directed towards the "standard" retry logic implemented in other Azure SDKs (https://github.com/Azure/azure-sdk-for-js/blob/dbd1a3a2e88ce43e650bfaed347898f8e2e77d69/sdk/core/core-http/src/policies/systemErrorRetryPolicy.ts#L101), which shows how different errors are automatically handled and retried.
A similar type of approach may be appropriate in this library.
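To make the suggestion concrete, here is a minimal sketch of an in-memory retry with exponential backoff that does not depend on disk caching. Everything in it is an assumption for illustration: `sendWithRetry`, `isRetryable`, the option names and the list of error codes are hypothetical and not part of this SDK, and `send` stands in for whatever function actually posts a telemetry batch.

```javascript
// Hypothetical sketch: none of these names exist in the applicationinsights SDK.
// Error codes treated as transient here are an assumption for illustration.
const RETRYABLE_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "ECONNREFUSED"]);

// True when the error looks like a transient network failure worth retrying.
function isRetryable(err) {
  return Boolean(err && RETRYABLE_CODES.has(err.code));
}

// Calls `send` (an async function) and retries transient failures in memory,
// doubling the delay after each failed attempt up to `maxDelayMs`.
async function sendWithRetry(send, options = {}) {
  const { maxRetries = 3, baseDelayMs = 500, maxDelayMs = 10000 } = options;
  for (let attempt = 0; ; attempt++) {
    try {
      return await send();
    } catch (err) {
      // Give up on non-transient errors or once the retry budget is spent.
      if (!isRetryable(err) || attempt >= maxRetries) {
        throw err;
      }
      const delayMs = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

With something like this wrapped around the batch upload, a single ECONNRESET would cause a short delay and a resend rather than a lost batch. The retry count and delays would need tuning so that a Function App invocation is not held open for too long.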