[BUG] the server doesn't have a resource type "route"
What happened:
I encountered a reproducible issue. MicroShift AIO in a container (podman) works fine. However, whenever I reboot the host machine, start the MicroShift container with podman start microshift, and run oc get routes after a while, I get the error: the server doesn't have a resource type "route", and I see health-check failure errors in the router-default container logs.
What you expected to happen:
oc get routes should not fail with an error saying the resource type does not exist.
How to reproduce it (as minimally and precisely as possible):
- Launch MicroShift AIO in a container (podman)
- Run oc get route to verify it is working
- Reboot the host machine that is running the microshift container
- Wait for the host and MicroShift to come back up
- Run oc get route <--- this throws the error (see the command sketch below)
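For clarity, a rough command-level sketch of the steps above; the podman run flags, volume name, and image tag are assumptions based on the AIO instructions, not a verbatim copy of my setup:
$ sudo podman run -d --name microshift --privileged \
    -v microshift-data:/var/lib \
    -p 80:80 -p 6443:6443 -p 8080:8080 \
    quay.io/microshift/microshift-aio:latest        # assumed run flags
$ sudo podman exec microshift oc get route -A       # works before the reboot
$ sudo reboot
  ... wait for the host to come back up ...
$ sudo podman start microshift
$ sudo podman exec microshift oc get route -A       # now fails: the server doesn't have a resource type "route"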
Anything else we need to know?:
Here is the pod listing and the description of the router-default pod:
[root@cd3611b48b4e /]# oc get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system kube-flannel-ds-jbsd2 1/1 Running 0 57s
kube-system openshift-console-deployment-7c8785cc5c-sf6x8 1/1 Running 0 37s
kubevirt-hostpath-provisioner kubevirt-hostpath-provisioner-884lz 1/1 Running 0 75s
openshift-dns dns-default-mxllr 2/2 Running 0 102s
openshift-dns node-resolver-7lbt6 1/1 Running 0 92s
openshift-ingress router-default-584549f645-xx6zp 0/1 Running 1 17s
openshift-service-ca service-ca-7bffb6f6bf-dwsxw 1/1 Running 0 2m5s
[root@cd3611b48b4e /]#
[root@cd3611b48b4e /]# oc describe po router-default-584549f645-xx6zp -n openshift-ingress
Name: router-default-584549f645-xx6zp
Namespace: openshift-ingress
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: cd3611b48b4e/10.88.0.2
Start Time: Tue, 12 Apr 2022 09:26:44 +0000
Labels: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
pod-template-hash=584549f645
Annotations: target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"}
unsupported.do-not-use.openshift.io/override-liveness-grace-period-seconds: 10
Status: Running
IP: 10.88.0.2
IPs:
IP: 10.88.0.2
Controlled By: ReplicaSet/router-default-584549f645
Containers:
router:
Container ID: cri-o://5dbb8fe62029a505625b27edb07fbe27de4d114e94fbcda96c67bd34bbf20d63
Image: quay.io/openshift/okd-content@sha256:01cfbbfdc11e2cbb8856f31a65c83acc7cfbd1986c1309f58c255840efcc0b64
Image ID: quay.io/openshift/okd-content@sha256:01cfbbfdc11e2cbb8856f31a65c83acc7cfbd1986c1309f58c255840efcc0b64
Ports: 80/TCP, 443/TCP, 1936/TCP
Host Ports: 80/TCP, 443/TCP, 1936/TCP
State: Running
Started: Tue, 12 Apr 2022 09:26:46 +0000
Last State: Terminated
Reason: Error
Message: metrics "msg"="listening on the metrics port failed" "error"="listen tcp 0.0.0.0:1936: bind: address already in use"
I0412 09:26:44.915365 1 metrics.go:155] metrics "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x1904632]
goroutine 112 [running]:
github.com/cockroachdb/cmux.(*muxListener).Close(0xc00000f0f8, 0x4723a7, 0x413496)
<autogenerated>:1 +0x32
net/http.(*onceCloseListener).close(...)
/usr/lib/golang/src/net/http/server.go:3395
sync.(*Once).doSlow(0xc00017b450, 0xc0007dbe18)
/usr/lib/golang/src/sync/once.go:68 +0xec
sync.(*Once).Do(...)
/usr/lib/golang/src/sync/once.go:59
net/http.(*onceCloseListener).Close(0xc00017b440, 0xc000336420, 0xc000646000)
/usr/lib/golang/src/net/http/server.go:3391 +0x78
net/http.(*Server).Serve(0xc0002a2380, 0x207d310, 0xc00000f0f8, 0x2042540, 0xc000646000)
/usr/lib/golang/src/net/http/server.go:2981 +0x5f6
github.com/openshift/router/pkg/router/metrics.Listener.Listen.func1(0x2041b60, 0xc0000cdcc0, 0x207d310, 0xc00000f0f8)
/go/src/github.com/openshift/router/pkg/router/metrics/metrics.go:147 +0x72
created by github.com/openshift/router/pkg/router/metrics.Listener.Listen
/go/src/github.com/openshift/router/pkg/router/metrics/metrics.go:143 +0x1b9
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1902fec]
goroutine 114 [running]:
github.com/cockroachdb/cmux.(*cMux).Serve(0xc0000cdd80, 0x0, 0x0)
/go/src/github.com/openshift/router/vendor/github.com/cockroachdb/cmux/cmux.go:124 +0x8c
github.com/openshift/router/pkg/router/metrics.Listener.Listen.func3(0x206ee50, 0xc0000cdd80)
/go/src/github.com/openshift/router/pkg/router/metrics/metrics.go:172 +0x35
created by github.com/openshift/router/pkg/router/metrics.Listener.Listen
/go/src/github.com/openshift/router/pkg/router/metrics/metrics.go:171 +0x417
Exit Code: 2
Started: Tue, 12 Apr 2022 09:26:44 +0000
Finished: Tue, 12 Apr 2022 09:26:44 +0000
Ready: False
Restart Count: 1
Requests:
cpu: 100m
memory: 256Mi
Liveness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://localhost:1936/healthz/ready delay=10s timeout=1s period=10s #success=1 #failure=3
Startup: http-get http://:1936/healthz/ready delay=0s timeout=1s period=1s #success=1 #failure=120
Environment:
STATS_PORT: 1936
ROUTER_SERVICE_NAMESPACE: openshift-ingress
DEFAULT_CERTIFICATE_DIR: /etc/pki/tls/private
DEFAULT_DESTINATION_CA_PATH: /var/run/configmaps/service-ca/service-ca.crt
ROUTER_CIPHERS: TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
ROUTER_DISABLE_HTTP2: true
ROUTER_DISABLE_NAMESPACE_OWNERSHIP_CHECK: false
ROUTER_METRICS_TLS_CERT_FILE: /etc/pki/tls/private/tls.crt
ROUTER_METRICS_TLS_KEY_FILE: /etc/pki/tls/private/tls.key
ROUTER_METRICS_TYPE: haproxy
ROUTER_SERVICE_NAME: default
ROUTER_SET_FORWARDED_HEADERS: append
ROUTER_THREADS: 4
SSL_MIN_VERSION: TLSv1.2
ROUTER_SUBDOMAIN: ${name}-${namespace}.apps.127.0.0.1.nip.io
ROUTER_ALLOW_WILDCARD_ROUTES: true
ROUTER_OVERRIDE_HOSTNAME: true
Mounts:
/etc/pki/tls/private from default-certificate (ro)
/var/run/configmaps/service-ca from service-ca-bundle (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7jxjf (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-certificate:
Type: Secret (a volume populated by a Secret)
SecretName: router-certs-default
Optional: false
service-ca-bundle:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: service-ca-bundle
Optional: false
kube-api-access-7jxjf:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 29s default-scheduler Successfully assigned openshift-ingress/router-default-584549f645-xx6zp to cd3611b48b4e
Normal Pulled 28s (x2 over 29s) kubelet Container image "quay.io/openshift/okd-content@sha256:01cfbbfdc11e2cbb8856f31a65c83acc7cfbd1986c1309f58c255840efcc0b64" already present on machine
Normal Created 27s (x2 over 29s) kubelet Created container router
Normal Started 27s (x2 over 29s) kubelet Started container router
Warning Unhealthy 18s (x9 over 26s) kubelet Startup probe failed: HTTP probe failed with statuscode: 500
Warning ProbeError 17s (x10 over 26s) kubelet Startup probe error: HTTP probe failed with statuscode: 500
body: [-]backend-http failed: reason withheld
[-]has-synced failed: reason withheld
[+]process-running ok
healthz check failed
[root@cd3611b48b4e /]#
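In case it is useful, one way to check what is still holding the metrics port when the router container crash-loops on "bind: address already in use" (a diagnostic sketch, not output from my environment):
$ sudo podman exec microshift ss -ltnp | grep 1936       # shows which PID owns port 1936
$ sudo podman exec microshift crictl ps -a | grep router  # assumes crictl is present in the AIO image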
Environment:
- Microshift version (use microshift version): Microshift-AIO:latest
- Hardware configuration:
- OS (e.g: cat /etc/os-release): Fedora
- Kernel (e.g. uname -a): Linux fedora 5.14.10-300.fc35.x86_64 #1 SMP Thu Oct 7 20:48:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Others:
Relevant Logs
I also tried deleting all the MicroShift pods, but that did not work as expected. The temporary workaround is to destroy the microshift-aio container and re-create it (see the sketch below).
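Roughly, the destroy-and-recreate workaround (container and volume names are from my setup; adjust as needed):
$ sudo podman stop microshift
$ sudo podman rm microshift
$ sudo podman volume rm microshift-data     # optional: also drops the stale cluster state
  ... then re-run the original podman run command to re-create the AIO container ...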
More logs
$ podman stop microshift
microshift
$podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
$ podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
275cb547d39f quay.io/microshift/microshift-aio:latest /sbin/init 47 minutes ago Exited (137) 10 seconds ago 0.0.0.0:80->80/tcp, 0.0.0.0:6443->6443/tcp, 0.0.0.0:8080->8080/tcp microshift
$
$ podman start microshift
microshift
$ podman exec microshift oc get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system kube-flannel-ds-gkkhk 1/1 Running 0 46m
kubevirt-hostpath-provisioner kubevirt-hostpath-provisioner-vq6lx 1/1 Running 0 46m
openshift-dns dns-default-v4wg2 2/2 Running 0 46m
openshift-dns node-resolver-chpjm 1/1 Running 0 46m
openshift-ingress router-default-6c96f6bc66-knmp5 1/1 Running 0 46m
openshift-service-ca service-ca-7bffb6f6bf-gx7qr 1/1 Running 0 46m
$ podman exec microshift oc get route -A
error: the server doesn't have a resource type "route"
$
@oglok per our email conversation, this seems to be a big blocker.
As a MicroShift user, I would first set up the cluster and install the apps I need. Later, if I stop the cluster, I would expect that when I bring it back up (start), all my apps are running again and I can access them just as I did before rebooting the MicroShift instance.
Any help on this would be appreciated.
Hi @ksingh7
This issue has the same root cause as #556. Since you are using the AIO MicroShift image, the container acquires a new IP whenever it is restarted, so the endpoints must be updated.
The following PR #650 should fix this too.
Let us build a new AIO image, so you can test it.
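A quick way to confirm the IP change after a restart (a sketch; it compares the container's current IP with the endpoint addresses the cluster still has recorded):
$ sudo podman inspect microshift --format '{{.NetworkSettings.IPAddress}}'          # IP after the restart
$ sudo podman exec microshift oc get endpoints kubernetes openshift-apiserver -n default   # addresses the cluster still points at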
@ksingh7 a new image has been built. Could you please test it? Thanks!
https://quay.io/repository/microshift/microshift-aio?tab=tags
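Testing the rebuilt image should just be a matter of pulling it and re-creating the container (sketch):
$ sudo podman pull quay.io/microshift/microshift-aio:latest
$ sudo podman rm -f microshift
  ... re-run the original podman run command against the freshly pulled image ...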
@ksingh7 have you tried podman pause/unpause? Is there a reason not to pause the container?
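For reference, the pause/unpause flow being suggested (standard podman commands; the container is frozen in place rather than stopped, so it keeps its network namespace and IP while the host stays up):
$ sudo podman pause microshift
  ... later ...
$ sudo podman unpause microshift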
@oglok thanks a lot for the new image. I have not tested it yet, but I will test it soon and share feedback.
I also have not tried podman pause/unpause; I will test that as well and keep you posted.
I'm having a similar issue, and podman pause/unpause does not seem to resolve it.
$ sudo podman exec -ti microshift oc get all -A
NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-ingress pod/router-default-6c96f6bc66-gm2k7 0/1 Pending 0 65m
openshift-service-ca pod/service-ca-7bffb6f6bf-x2r87 0/1 Pending 0 65m
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 66m
default service/openshift-apiserver ClusterIP None <none> 443/TCP 65m
default service/openshift-oauth-apiserver ClusterIP None <none> 443/TCP 65m
openshift-dns service/dns-default ClusterIP 10.43.0.10 <none> 53/UDP,53/TCP,9154/TCP 65m
openshift-ingress service/router-external-default NodePort 10.43.209.199 <none> 80:30001/TCP,443:30002/TCP 65m
openshift-ingress service/router-internal-default ClusterIP 10.43.82.233 <none> 80/TCP,443/TCP,1936/TCP 65m
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-flannel-ds 0 0 0 0 0 <none> 65m
kubevirt-hostpath-provisioner daemonset.apps/kubevirt-hostpath-provisioner 0 0 0 0 0 <none> 65m
openshift-dns daemonset.apps/dns-default 0 0 0 0 0 kubernetes.io/os=linux 65m
openshift-dns daemonset.apps/node-resolver 0 0 0 0 0 kubernetes.io/os=linux 65m
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
openshift-ingress deployment.apps/router-default 0/1 1 0 65m
openshift-service-ca deployment.apps/service-ca 0/1 1 0 65m
NAMESPACE NAME DESIRED CURRENT READY AGE
openshift-ingress replicaset.apps/router-default-6c96f6bc66 1 1 0 65m
openshift-service-ca replicaset.apps/service-ca-7bffb6f6bf 1 1 0 65m
Notably, the ingress and service-ca pods are both stuck in Pending:
NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-ingress pod/router-default-6c96f6bc66-gm2k7 0/1 Pending 0 65m
openshift-service-ca pod/service-ca-7bffb6f6bf-x2r87 0/1 Pending 0 65m
And, as mentioned, the route resource is not available:
$ sudo podman exec -ti microshift oc get routes -A
error: the server doesn't have a resource type "routes"
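A sketch of how to dig into why those pods stay Pending (the pod name is taken from the listing above):
$ sudo podman exec -ti microshift oc get nodes
$ sudo podman exec -ti microshift oc describe pod router-default-6c96f6bc66-gm2k7 -n openshift-ingress
  # the Events section at the bottom usually states the scheduling reason (e.g. no Ready nodes)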
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
The team has decided to drop support for running MicroShift in a container and focus on running via systemd. If you're still having trouble with this issue using that configuration, please create a new issue with the details of that configuration.
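For anyone landing here later, the systemd-based flow looks roughly like this (assuming MicroShift has been installed as an RPM; package and repo names vary by distribution and version):
$ sudo dnf install -y microshift           # assumes a repo providing the microshift RPM is enabled
$ sudo systemctl enable --now microshift
$ sudo journalctl -u microshift -f         # follow the service logs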