Richard Theis
@prameshj I was unable to collect any valuable logs at the time of the latest failure. I assume there is some type of lock contention that causes the pod to...
@prameshj That is correct. We've updated the termination grace period to 900 seconds and still see this problem. Although, the failure rate has been much lower than it has been...
We hit the problem again on Kubernetes version 1.20 with NodeLocal DNS version 1.17.3. Unfortunately, I don't have any additional debug data to provide.
We hit this problem again on Kubernetes version 1.22 with NodeLocal DNS version 1.21.1. We are collecting debug data now to determine if we can find the root cause.
We hit the problem again on Kubernetes version 1.22 with NodeLocal DNS version 1.21.1. Here is the end of the log captured during pod termination: ``` [INFO] SIGTERM: Shutting down servers...
@prameshj Unfortunately, I don't have any metrics captured when the failure occurred. What would you like us to collect?
Thanks, we'll update our test to collect metrics once we pull in the latest NodeLocal DNS cache version.
We were able to recreate the problem on NodeLocal DNS version 1.21.3. Here are the logs and metrics. ``` [INFO] SIGTERM: Shutting down servers then terminating ``` ``` # HELP...
@prameshj I'll fix our error collection to get metrics on port 9353.
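For anyone else trying to capture the same data, here is a minimal sketch of how the metrics on port 9353 could be collected and narrowed down to the node cache counters. It is not our actual test code. The host `127.0.0.1`, the `/metrics` path (the usual Prometheus convention), and the helper names `fetch_metrics` and `filter_metrics` are assumptions for illustration; only the port and the `coredns_nodecache_` metric prefix come from this thread.

```python
import urllib.request


def fetch_metrics(host="127.0.0.1", port=9353, path="/metrics", timeout=5.0):
    """Return the raw Prometheus text exposition from the endpoint.

    Assumption: the node cache metrics are reachable at http://host:port/metrics.
    Adjust host and path to match how your cluster exposes them.
    """
    url = f"http://{host}:{port}{path}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def filter_metrics(text, prefix="coredns_nodecache_"):
    """Keep HELP/TYPE comments and samples whose metric name starts with prefix."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            # Comment lines look like "# HELP <name> <text>" or "# TYPE <name> <type>".
            parts = stripped.split(None, 3)
            if (
                len(parts) >= 3
                and parts[1] in ("HELP", "TYPE")
                and parts[2].startswith(prefix)
            ):
                kept.append(stripped)
        elif stripped.startswith(prefix):
            # Sample lines begin with the metric name itself.
            kept.append(stripped)
    return kept


# Hypothetical usage from a test harness running on the node, right before
# the pod is terminated:
#   for line in filter_metrics(fetch_metrics()):
#       print(line)
```

Filtering by prefix keeps the attached output short; pass `prefix=""` to `filter_metrics` if the full sample set is wanted (comment lines are still limited to HELP and TYPE).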
Here is the recreate data for NodeLocal DNS version 1.21.3 on Kubernetes version 1.22: **Logs:** ``` [INFO] SIGTERM: Shutting down servers then terminating ``` **Metrics:** ``` # HELP coredns_nodecache_setup_errors_total The number of...