nodejs-logging-winston

Ability to see the reason for `Total timeout of API google.logging.v2.LoggingServiceV2 exceeded...`

Open jbojcic1 opened this issue 3 years ago • 4 comments

Currently, if something goes wrong and a log cannot be sent, you just get an error with the message `Total timeout of API google.logging.v2.LoggingServiceV2 exceeded...`, but it would be nice if this error also contained the original error returned by the endpoint so we could easily see the underlying problem.

Currently this happens occasionally on my deployed app and I have no way at all to see what the actual problem is.

jbojcic1 avatar Aug 24 '22 13:08 jbojcic1

Also, not really related to this, but it seems like some of the available options, such as `maxRetries`, are completely ignored.

jbojcic1 avatar Aug 24 '22 13:08 jbojcic1

@jbojcic1, thanks a lot for reaching out! I wonder if our default callback feature described here could be useful to analyze the error? As for `maxRetries` - can you please share the code you use to set it up?
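
For reference, here is a minimal sketch of how I would expect both options to be wired up (the option names follow the constructor docs; treat the exact shape as an assumption until I see your setup):

const winston = require('winston');
const {LoggingWinston} = require('@google-cloud/logging-winston');

// Sketch only: defaultCallback should receive the error (if any) from each
// attempted write, and maxRetries is assumed to be forwarded to the client.
const loggingWinston = new LoggingWinston({
  maxRetries: 2,
  defaultCallback: err => {
    if (err) {
      console.error('Error occurred while writing to Cloud Logging:', err);
    }
  },
});

const logger = winston.createLogger({
  level: 'info',
  transports: [new winston.transports.Console(), loggingWinston],
});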

losalex avatar Aug 25 '22 19:08 losalex

@losalex `defaultCallback` is only called after the timeout and it doesn't contain the original error. You can see the example here, where I deliberately formatted `latency` wrongly in one of the logs so that formatting fails. Just run it and wait one minute for the timeout to happen. What happens is that each attempt to send that log fails, and retries keep happening until the timeout. Once it times out, `defaultCallback` is executed, but it doesn't contain the real reason for the error (which you can see if you set a breakpoint in the code that handles the retries in google-gax here).

You can also see in the same example that `maxRetries` doesn't have any effect - the number of retries in google-gax is still undefined.

Btw, for some reason the default callback is not even executed in my Replit example if I don't have an error handler registered on the winston logger, like:

logger.on('error', (error) => {
  // handle the transport error, e.g. log it to the console
  console.error(error);
});
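
For illustration, the failing entry in my example is essentially of this shape (simplified here; the exact latency value is just a stand-in for a deliberately wrong format):

// The latency below is deliberately not a {seconds, nanos} Duration, so sending
// the entry over gRPC fails, google-gax keeps retrying, and the call eventually
// ends with the "Total timeout ... exceeded" error.
logger.info('Fake request 2', {
  httpRequest: {
    requestMethod: 'GET',
    requestUrl: 'http://some-url.com',
    status: 200,
    latency: '1s' // wrong format on purpose
  }
});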

jbojcic1 avatar Aug 28 '22 15:08 jbojcic1

Thanks @jbojcic1 for the extra info! Indeed, it seems that `maxRetries` is not passed through properly - I will prepare a fix for it. I am still debugging the issue with `defaultCallback`, but I have seen that it is called properly for the `log.info` calls in the sample you shared that do not fail, so it seems there might be some case where it is missed.

losalex avatar Sep 01 '22 21:09 losalex

@jbojcic1, I believe the latest fix should resolve the issue. That said, I do not see the Cloud Logging service returning an appropriate status for malformed data, so I will reach out to the Cloud Logging team to raise this and ask them to return a proper error. For now, I will close this issue and track the problem with the Cloud Logging team separately - please let me know if I can assist you further or if you have more questions or concerns.

losalex avatar Oct 20 '22 02:10 losalex

@jbojcic1, I figured out the proper format for latency - for gRPC, it should be a Duration, which basically means an object with seconds and nanos. So the following code works:

logger.info('Fake request 2', {
  httpRequest: {
    requestMethod: 'GET',
    requestUrl: 'http://some-url.com',
    status: 200,
    latency: {
      seconds: 1,
      nanos: 0
    }
  }
});

That said, the callback with the appropriate error should still be called correctly - I filed an issue which will be solved separately.

losalex avatar Oct 24 '22 21:10 losalex

@jbojcic1, I figured out the proper format for latency - for gRPC, it should be a Duration, which basically means an object with seconds and nanos. So the following code works:

I know that. As the comment in the code sample says, I applied the wrong format there on purpose to demo the issue.

jbojcic1 avatar Oct 24 '22 21:10 jbojcic1

@jbojcic1, I figured out the proper format for latency - for gRPC, it should be a Duration, which basically means an object with seconds and nanos. So the following code works:

I know that. As the comment in the code sample says, I applied the wrong format there on purpose to demo the issue.

Great - I wanted to mention the solution for other people facing the same problem and searching for a solution :-)

losalex avatar Oct 25 '22 18:10 losalex

That said, I do not see the Cloud Logging service returning an appropriate status for malformed data, so I will reach out to the Cloud Logging team to raise this and ask them to return a proper error

Could you then please suggest some other way to force the request to fail, so I can test whether the default callback error on timeout now contains the original error?

jbojcic1 avatar Jul 10 '23 08:07 jbojcic1