USHIFT-266: Don't send second SIGTERM on context.Canceled.
If the context is canceled, the first context.Canceled error returned from a component results in a second SIGTERM that ends the process without waiting for other components to stop.
We need to check the order of shutdown:
ago 10 11:05:59 microshiftdevel.local microshift[285432]: I0810 11:05:59.109267 285432 scope.go:110] "RemoveContainer" containerID="171fa9921fdae89c0f5b6e3a34859438804183c8cca083957de80$
5a3d26d5a5"
ago 10 11:05:59 microshiftdevel.local microshift[285432]: I0810 11:05:59.109324 285432 scope.go:110] "RemoveContainer" containerID="8c682595d8a0568ad0eae83000afd4c1f33d797c267604d81683e$
b955c13fe2"
ago 10 11:05:59 microshiftdevel.local microshift[285432]: I0810 11:05:59.109337 285432 scope.go:110] "RemoveContainer" containerID="d8d67eeaacb86da1dec8d3a289cf5e07a8c12a00edaa32a5c6898$
0eb2ab6ae0"
ago 10 11:05:59 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:59.111-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying
of unary invoker failed","target":"etcd-endpoints://0xc003daf6c0/127.0.0.1:2379","attempt":92,"error":"rpc error: code = Unavailable desc = etcdserver: server stopped"}
ago 10 11:05:59 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:59.128-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying
of unary invoker failed","target":"etcd-endpoints://0xc00ebe6c40/127.0.0.1:2379","attempt":43,"error":"rpc error: code = Unavailable desc = etcdserver: server stopped"}
ago 10 11:05:59 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:59.130-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying
of unary invoker failed","target":"etcd-endpoints://0xc005936e00/127.0.0.1:2379","attempt":50,"error":"rpc error: code = Unavailable desc = etcdserver: server stopped"}
ago 10 11:05:59 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:59.141-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying
of unary invoker failed","target":"etcd-endpoints://0xc003daf6c0/127.0.0.1:2379","attempt":93,"error":"rpc error: code = Unavailable desc = etcdserver: server stopped"}
It ends with
ago 10 11:06:28 microshiftdevel.local microshift[285432]: I0810 11:06:28.312290 285432 run.go:189] Timed out waiting for services to stop
ago 10 11:06:28 microshiftdevel.local microshift[285432]: I0810 11:06:28.312336 285432 run.go:191] MicroShift stopped
ago 10 11:06:28 microshiftdevel.local systemd[1]: microshift.service: Succeeded.
/hold
With some more detail:
ago 10 11:05:28 microshiftdevel.local systemd[1]: Stopping MicroShift...
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.310918 285432 run.go:181] Interrupt received. Stopping services
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.312216 285432 manager.go:118] sysconfwatch-controller completed
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.312663 285432 configmap_cafile_content.go:222] "Shutting down controller" name="client-ca::kube-system::extensio>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.312699 285432 requestheader_controller.go:183] Shutting down RequestHeaderAuthRequestController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.312723 285432 configmap_cafile_content.go:222] "Shutting down controller" name="client-ca::kube-system::extensio>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.312898 285432 secure_serving.go:311] Stopped listening on 127.0.0.1:10257
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.312917 285432 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.313434 285432 secure_serving.go:311] Stopped listening on [::]:10259
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.313458 285432 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
ago 10 11:05:28 microshiftdevel.local microshift[285432]: {"level":"info","ts":"2022-08-10T11:05:28.313-0400","caller":"etcdserver/server.go:1462","msg":"skipped leadership transfer for >
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.313640 285432 controller.go:186] Shutting down kubernetes service endpoint reconciler
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.313854 285432 object_count_tracker.go:84] "StorageObjectCountTracker pruner is exiting"
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.314629 285432 manager.go:118] kubelet completed
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.314694 285432 dynamic_cafile_content.go:170] "Shutting down controller" name="client-ca-bundle::/var/lib/microsh>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.314730 285432 manager.go:118] microshift-mdns-controller completed
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.314792 285432 manager.go:118] openshift-controller-manager completed
ago 10 11:05:28 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:28.316-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: W0810 11:05:28.317022 285432 genericapiserver.go:815] failed to create event default/openshift-kube-apiserver.170a041b6308b2a4:>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:28.317-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: W0810 11:05:28.317953 285432 genericapiserver.go:815] failed to create event default/openshift-kube-apiserver.170a041b635bc97e:>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319232 285432 controller.go:122] Shutting down OpenAPI controller
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319276 285432 establishing_controller.go:87] Shutting down EstablishingController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319295 285432 naming_controller.go:302] Shutting down NamingConditionController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319314 285432 apiapproval_controller.go:198] Shutting down KubernetesAPIApprovalPolicyConformantConditionControl>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319332 285432 apiservice_controller.go:131] Shutting down APIServiceRegistrationController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319352 285432 nonstructuralschema_controller.go:204] Shutting down NonStructuralSchemaConditionController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319371 285432 crdregistration_controller.go:143] Shutting down crd-autoregister controller
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319419 285432 crd_finalizer.go:278] Shutting down CRDFinalizer
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319449 285432 apf_controller.go:326] Shutting down API Priority and Fairness config worker
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319469 285432 autoregister_controller.go:165] Shutting down autoregister controller
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319490 285432 customresource_discovery_controller.go:245] Shutting down DiscoveryController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319520 285432 cluster_authentication_trust_controller.go:463] Shutting down cluster_authentication_trust_control>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319539 285432 available_controller.go:525] Shutting down AvailableConditionController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319567 285432 storage_flowcontrol.go:150] APF bootstrap ensurer is exiting
ago 10 11:05:28 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:28.320-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: W0810 11:05:28.320167 285432 genericapiserver.go:815] failed to create event default/openshift-kube-apiserver.170a041b636a606f:>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.320269 285432 dynamic_serving_content.go:145] "Shutting down controller" name="aggregator-proxy-cert::/var/lib/m>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.320296 285432 controller.go:89] Shutting down OpenAPI AggregationController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.320323 285432 dynamic_cafile_content.go:170] "Shutting down controller" name="request-header::/var/lib/microshif>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.320511 285432 dynamic_cafile_content.go:170] "Shutting down controller" name="request-header::/var/lib/microshif>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.320555 285432 dynamic_cafile_content.go:170] "Shutting down controller" name="client-ca-bundle::/var/lib/microsh>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.320777 285432 secure_serving.go:311] Stopped listening on [::]:6443
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.321029 285432 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.321075 285432 dynamic_serving_content.go:145] "Shutting down controller" name="serving-cert::/var/lib/microshift>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.321122 285432 dynamic_cafile_content.go:170] "Shutting down controller" name="client-ca-bundle::/var/lib/microsh>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:28.321-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: W0810 11:05:28.321792 285432 genericapiserver.go:815] failed to create event default/openshift-kube-apiserver.170a041b63950c3f:>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.322497 285432 manager.go:118] etcd completed
ago 10 11:05:28 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:28.323-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:28.351-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying>
That looks like one or more independent issues to me. This bug would have masked any problems with graceful termination.
That looks like one or more independent issues to me. This bug would have masked any problems with graceful termination.
I agree, we need to merge this one, but I propose we have an order fix in place so we don't introduce much pain on other development processes.
Anyway, I will not be around for the next 15+ days, so I think it's more fair to let you guys decide what's best.
@oglok @copejon @ggiguash @benluddy @dhellmann ^
Feel free to remove the hold and proceed.
I think it's safe to go ahead with this and address the shutdown ordering separately.
What do you think, @oglok @copejon?
/approve
/kind bug
/hold cancel
/lgtm
This PR reveals some underlying shutdown issues in MicroShift. I've created a new JIRA ticket for it (https://issues.redhat.com/browse/USHIFT-316)
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: benluddy, dhellmann, oglok
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [benluddy,dhellmann,oglok]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/retest-required
Remaining retests: 2 against base HEAD eec8d9fb76647c451498517e44d22e5c607f26da and 8 for PR HEAD 88f75c573111bb9a345078ea813e55e8d4cee898 in total
/retest
@benluddy: all tests passed!
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.