microshift icon indicating copy to clipboard operation
microshift copied to clipboard

USHIFT-266: Don't send second SIGTERM on context.Canceled.

Open benluddy opened this issue 3 years ago • 5 comments

If the context is canceled, the first context.Canceled error returned from a component results in a second SIGTERM that ends the process without waiting for other components to stop.

benluddy avatar Aug 10 '22 13:08 benluddy

We need to check the order of shutdown:

ago 10 11:05:59 microshiftdevel.local microshift[285432]: I0810 11:05:59.109267  285432 scope.go:110] "RemoveContainer" containerID="171fa9921fdae89c0f5b6e3a34859438804183c8cca083957de80$
5a3d26d5a5"
ago 10 11:05:59 microshiftdevel.local microshift[285432]: I0810 11:05:59.109324  285432 scope.go:110] "RemoveContainer" containerID="8c682595d8a0568ad0eae83000afd4c1f33d797c267604d81683e$
b955c13fe2"
ago 10 11:05:59 microshiftdevel.local microshift[285432]: I0810 11:05:59.109337  285432 scope.go:110] "RemoveContainer" containerID="d8d67eeaacb86da1dec8d3a289cf5e07a8c12a00edaa32a5c6898$
0eb2ab6ae0"
ago 10 11:05:59 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:59.111-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying
of unary invoker failed","target":"etcd-endpoints://0xc003daf6c0/127.0.0.1:2379","attempt":92,"error":"rpc error: code = Unavailable desc = etcdserver: server stopped"}
ago 10 11:05:59 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:59.128-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying
of unary invoker failed","target":"etcd-endpoints://0xc00ebe6c40/127.0.0.1:2379","attempt":43,"error":"rpc error: code = Unavailable desc = etcdserver: server stopped"}
ago 10 11:05:59 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:59.130-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying
of unary invoker failed","target":"etcd-endpoints://0xc005936e00/127.0.0.1:2379","attempt":50,"error":"rpc error: code = Unavailable desc = etcdserver: server stopped"}
ago 10 11:05:59 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:59.141-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying
of unary invoker failed","target":"etcd-endpoints://0xc003daf6c0/127.0.0.1:2379","attempt":93,"error":"rpc error: code = Unavailable desc = etcdserver: server stopped"}

mangelajo avatar Aug 10 '22 15:08 mangelajo

It ends with

ago 10 11:06:28 microshiftdevel.local microshift[285432]: I0810 11:06:28.312290  285432 run.go:189] Timed out waiting for services to stop
ago 10 11:06:28 microshiftdevel.local microshift[285432]: I0810 11:06:28.312336  285432 run.go:191] MicroShift stopped
ago 10 11:06:28 microshiftdevel.local systemd[1]: microshift.service: Succeeded.

/hold

mangelajo avatar Aug 10 '22 15:08 mangelajo

With some more detail:

ago 10 11:05:28 microshiftdevel.local systemd[1]: Stopping MicroShift...
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.310918  285432 run.go:181] Interrupt received. Stopping services
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.312216  285432 manager.go:118] sysconfwatch-controller completed
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.312663  285432 configmap_cafile_content.go:222] "Shutting down controller" name="client-ca::kube-system::extensio>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.312699  285432 requestheader_controller.go:183] Shutting down RequestHeaderAuthRequestController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.312723  285432 configmap_cafile_content.go:222] "Shutting down controller" name="client-ca::kube-system::extensio>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.312898  285432 secure_serving.go:311] Stopped listening on 127.0.0.1:10257
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.312917  285432 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.313434  285432 secure_serving.go:311] Stopped listening on [::]:10259
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.313458  285432 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
ago 10 11:05:28 microshiftdevel.local microshift[285432]: {"level":"info","ts":"2022-08-10T11:05:28.313-0400","caller":"etcdserver/server.go:1462","msg":"skipped leadership transfer for >
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.313640  285432 controller.go:186] Shutting down kubernetes service endpoint reconciler
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.313854  285432 object_count_tracker.go:84] "StorageObjectCountTracker pruner is exiting"
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.314629  285432 manager.go:118] kubelet completed
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.314694  285432 dynamic_cafile_content.go:170] "Shutting down controller" name="client-ca-bundle::/var/lib/microsh>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.314730  285432 manager.go:118] microshift-mdns-controller completed
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.314792  285432 manager.go:118] openshift-controller-manager completed
ago 10 11:05:28 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:28.316-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: W0810 11:05:28.317022  285432 genericapiserver.go:815] failed to create event default/openshift-kube-apiserver.170a041b6308b2a4:>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:28.317-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: W0810 11:05:28.317953  285432 genericapiserver.go:815] failed to create event default/openshift-kube-apiserver.170a041b635bc97e:>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319232  285432 controller.go:122] Shutting down OpenAPI controller
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319276  285432 establishing_controller.go:87] Shutting down EstablishingController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319295  285432 naming_controller.go:302] Shutting down NamingConditionController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319314  285432 apiapproval_controller.go:198] Shutting down KubernetesAPIApprovalPolicyConformantConditionControl>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319332  285432 apiservice_controller.go:131] Shutting down APIServiceRegistrationController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319352  285432 nonstructuralschema_controller.go:204] Shutting down NonStructuralSchemaConditionController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319371  285432 crdregistration_controller.go:143] Shutting down crd-autoregister controller
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319419  285432 crd_finalizer.go:278] Shutting down CRDFinalizer
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319449  285432 apf_controller.go:326] Shutting down API Priority and Fairness config worker
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319469  285432 autoregister_controller.go:165] Shutting down autoregister controller
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319490  285432 customresource_discovery_controller.go:245] Shutting down DiscoveryController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319520  285432 cluster_authentication_trust_controller.go:463] Shutting down cluster_authentication_trust_control>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319539  285432 available_controller.go:525] Shutting down AvailableConditionController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.319567  285432 storage_flowcontrol.go:150] APF bootstrap ensurer is exiting
ago 10 11:05:28 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:28.320-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: W0810 11:05:28.320167  285432 genericapiserver.go:815] failed to create event default/openshift-kube-apiserver.170a041b636a606f:>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.320269  285432 dynamic_serving_content.go:145] "Shutting down controller" name="aggregator-proxy-cert::/var/lib/m>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.320296  285432 controller.go:89] Shutting down OpenAPI AggregationController
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.320323  285432 dynamic_cafile_content.go:170] "Shutting down controller" name="request-header::/var/lib/microshif>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.320511  285432 dynamic_cafile_content.go:170] "Shutting down controller" name="request-header::/var/lib/microshif>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.320555  285432 dynamic_cafile_content.go:170] "Shutting down controller" name="client-ca-bundle::/var/lib/microsh>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.320777  285432 secure_serving.go:311] Stopped listening on [::]:6443
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.321029  285432 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.321075  285432 dynamic_serving_content.go:145] "Shutting down controller" name="serving-cert::/var/lib/microshift>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.321122  285432 dynamic_cafile_content.go:170] "Shutting down controller" name="client-ca-bundle::/var/lib/microsh>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:28.321-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: W0810 11:05:28.321792  285432 genericapiserver.go:815] failed to create event default/openshift-kube-apiserver.170a041b63950c3f:>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: I0810 11:05:28.322497  285432 manager.go:118] etcd completed
ago 10 11:05:28 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:28.323-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying>
ago 10 11:05:28 microshiftdevel.local microshift[285432]: {"level":"warn","ts":"2022-08-10T11:05:28.351-0400","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying>

mangelajo avatar Aug 10 '22 15:08 mangelajo

That looks like one or more independent issues to me. This bug would have masked any problems with graceful termination.

benluddy avatar Aug 10 '22 19:08 benluddy

That looks like one or more independent issues to me. This bug would have masked any problems with graceful termination.

I agree, we need to merge this one, but I propose we have an order fix in place so we don't introduce much pain on other development processes.

Anyway, I will not be around for the next 15+ days, so I think it's more fair to let you guys decide what's best.

@oglok @copejon @ggiguash @benluddy @dhellmann ^

Feel free to remove the hold and proceed.

mangelajo avatar Aug 11 '22 09:08 mangelajo

I think it's safe to go ahead with this and address the shutdown ordering separately.

What do you think, @oglok @copejon?

/approve

dhellmann avatar Aug 16 '22 18:08 dhellmann

/kind bug

dhellmann avatar Aug 21 '22 17:08 dhellmann

/hold cancel

oglok avatar Aug 23 '22 10:08 oglok

/lgtm

This PR reveals some underlying shutdown issues in MicroShift. I've created a new JIRA ticket for it (https://issues.redhat.com/browse/USHIFT-316)

oglok avatar Aug 23 '22 10:08 oglok

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: benluddy, dhellmann, oglok

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [benluddy,dhellmann,oglok]

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar Aug 23 '22 10:08 openshift-ci[bot]

/retest-required

Remaining retests: 2 against base HEAD eec8d9fb76647c451498517e44d22e5c607f26da and 8 for PR HEAD 88f75c573111bb9a345078ea813e55e8d4cee898 in total

openshift-ci-robot avatar Aug 23 '22 12:08 openshift-ci-robot

/retest

oglok avatar Aug 23 '22 12:08 oglok

@benluddy: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Aug 23 '22 13:08 openshift-ci[bot]