Ability to see the reason for `Total timeout of API google.logging.v2.LoggingServiceV2 exceeded...`
Currently, if something goes wrong and a log cannot be sent, you just get an error with the message `Total timeout of API google.logging.v2.LoggingServiceV2 exceeded...`. It would be nice if this error also contained the original error returned by the endpoint, so we can easily see what the underlying problem is.
This currently happens occasionally on my deployed app, and I have no way at all to see what the actual problem is.
Also, not really related to this, but it seems that some options that are available, like `maxRetries`, are completely ignored.
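For illustration, assuming `logger` is a winston logger with the Cloud Logging transport attached, something along these lines is what I'd like to be able to do (the `cause` property is hypothetical - just one way the underlying error could be exposed):

logger.on('error', (err) => {
  // Today this only shows the generic message:
  // "Total timeout of API google.logging.v2.LoggingServiceV2 exceeded ..."
  console.error(err.message);
  // It would be great if the last per-attempt error were attached as well,
  // e.g. on a hypothetical property like err.cause:
  // console.error(err.cause);
});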
@jbojcic1 , thanks a lot for reaching out!
I wonder if our default callback feature described here could be useful to analyze the error?
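For reference, a minimal sketch of wiring it up could look like this (names and values are illustrative only):

const winston = require('winston');
const {LoggingWinston} = require('@google-cloud/logging-winston');

// defaultCallback is invoked for each write attempt; err is set when the write fails.
const loggingWinston = new LoggingWinston({
  defaultCallback: err => {
    if (err) {
      console.error('Error occurred: ' + err);
    }
  },
});

const logger = winston.createLogger({
  level: 'info',
  transports: [new winston.transports.Console(), loggingWinston],
});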
As for `maxRetries` - can you please share the code you use to set it up?
@losalex defaultCallback is only called after the timeout, and it doesn't contain the original error. You can see the example here, where I formatted latency incorrectly in one of the logs so that formatting fails. Just run it and wait about a minute for the timeout to happen. What happens is that each attempt to send that log fails, and retries keep happening until the timeout. Once it times out, the defaultCallback is executed, but it doesn't contain the real reason for the error (which can be seen if you set a breakpoint in the code that handles the retries in google-gax here).
You can also see in the same example that `maxRetries` doesn't have any effect - the number of retries in google-gax is still `undefined`.
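To answer your question about the setup - it's roughly this (simplified, values are placeholders; I'm assuming `maxRetries` is supposed to be picked up from the transport options):

const winston = require('winston');
const {LoggingWinston} = require('@google-cloud/logging-winston');

// Expectation: maxRetries caps the number of retry attempts google-gax makes
// before giving up with the "Total timeout ... exceeded" error.
const loggingWinston = new LoggingWinston({
  projectId: 'my-project', // placeholder
  maxRetries: 2, // appears to be ignored - retries stay undefined in google-gax
});

const logger = winston.createLogger({
  transports: [loggingWinston],
});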
Btw, for some reason the default callback is not even executed in my replit example if I don't have an error handler registered on the winston logger, like:
logger.on('error', (error) => {
  ...
});
Thanks @jbojcic1 for more info!
Indeed, it seems that `maxRetries` is not passed properly - I will prepare a fix for it.
I am still debugging an issue with defaultCallback, but I have seen that it is called properly for the log.info calls in the sample you shared that do not fail, so it seems we might have some case where it is missed.
@jbojcic1 , I believe the latest fix should resolve the issue. That said, I do not see the Cloud Logging service returning an appropriate status for malformed data, so I will be reaching out to the Cloud Logging team to raise this issue and ask them to return an appropriate error. For now, I will be closing this issue and will track the problem with the Cloud Logging team separately - please let me know if I can assist you more or if you have more questions or concerns.
@jbojcic1 , I figured out the proper format for latency - for gRPC, the format should be a Duration, which basically should include `seconds` and `nanos`. So the following code would work:
logger.info('Fake request 2', {
  httpRequest: {
    requestMethod: 'GET',
    requestUrl: 'http://some-url.com',
    status: 200,
    latency: {
      seconds: 1,
      nanos: 0
    }
  }
});
That said, the callback with the appropriate error should still be called correctly - I filed an issue which will be resolved separately.
> @jbojcic1 , I figured out the proper format for latency - for gRPC, the format should be a Duration, which basically should include `seconds` and `nanos`. So the following code would work:
I know that. As a comment in the code sample says, I applied the wrong format on purpose there to demo the issue.
> > @jbojcic1 , I figured out the proper format for latency - for gRPC, the format should be a Duration, which basically should include `seconds` and `nanos`. So the following code would work:
>
> I know that. As a comment in the code sample says, I applied the wrong format on purpose there to demo the issue.
Great - I wanted to mention the solution for other people facing the same problem and searching for a solution :-)
> That said, I do not see the Cloud Logging service returning an appropriate status for malformed data, so I will be reaching out to the Cloud Logging team to raise this issue and ask them to return an appropriate error
Could you then please suggest some other way to force the request to fail, so I can test whether the default callback error on timeout now contains the original error?