Richard Theis comments

Results: 64 comments by Richard Theis

NodeLocal DNS container hung on SIGTERM

@prameshj I was unable to collect any valuable logs at the time of the latest failure. I assume there is some type of lock contention that causes the pod to...

NodeLocal DNS container hung on SIGTERM

@prameshj That is correct. We've increased the termination grace period to 900 seconds and still see this problem. However, the failure rate has been much lower than it has been...

NodeLocal DNS container hung on SIGTERM

We hit the problem again on Kubernetes version 1.20 with NodeLocal DNS version 1.17.3. Unfortunately, I don't have any additional debug data to provide.

NodeLocal DNS container hung on SIGTERM

We hit this problem again on Kubernetes version 1.22 with NodeLocal DNS version 1.21.1. We are collecting debug data now to determine if we can find the root cause.

NodeLocal DNS container hung on SIGTERM

We hit the problem again on Kubernetes version 1.22 with NodeLocal DNS version 1.21.1. Here is the end of the log captured during pod termination:

```
[INFO] SIGTERM: Shutting down servers...
```

NodeLocal DNS container hung on SIGTERM

@prameshj unfortunately, I don't have any metrics captured when the failure occurred. What would you like us to collect?

NodeLocal DNS container hung on SIGTERM

Thanks, we'll update our test to collect metrics once we pull in the latest NodeLocal DNS cache version.

NodeLocal DNS container hung on SIGTERM

We were able to recreate the problem on NodeLocal DNS version 1.21.3. Here are the logs and metrics.

```
[INFO] SIGTERM: Shutting down servers then terminating
```

```
# HELP...
```

NodeLocal DNS container hung on SIGTERM

@prameshj I'll fix our error collection to get metrics on port 9353.

NodeLocal DNS container hung on SIGTERM

Here's recreate data for NodeLocal DNS version 1.21.3 on Kubernetes version 1.22:

**Logs:**

```
[INFO] SIGTERM: Shutting down servers then terminating
```

**Metrics:**

```
# HELP coredns_nodecache_setup_errors_total The number of...
```