[BUG] the server doesn't have a resource type "route"
What happened:
I encountered a reproducible issue. MicroShift AIO in a container (podman) works fine. However, whenever I reboot the host machine, start the MicroShift container with podman start microshift, and run oc get routes after a while, I get the error: the server doesn't have a resource type "route", and I see health-check failure errors in the router-default container logs.
What you expected to happen:
oc get routes should not fail with an error saying the resource type does not exist.
How to reproduce it (as minimally and precisely as possible):
- Launch MicroShift AIO in a container (podman)
- Run oc get route to verify it is working
- Reboot the host machine that is running the microshift container
- Wait for the host and MicroShift to come back up
- Run oc get route <--- this throws the error (see the command sketch below)
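For clarity, a rough command-level sketch of the steps above; the podman run flags, volume name, and image tag are assumptions based on the AIO instructions, not a verbatim copy of my setup:
$ sudo podman run -d --name microshift --privileged \
    -v microshift-data:/var/lib \
    -p 80:80 -p 6443:6443 -p 8080:8080 \
    quay.io/microshift/microshift-aio:latest        # assumed run flags
$ sudo podman exec microshift oc get route -A       # works before the reboot
$ sudo reboot
  ... wait for the host to come back up ...
$ sudo podman start microshift
$ sudo podman exec microshift oc get route -A       # now fails: the server doesn't have a resource type "route"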
Anything else we need to know?:
Here is the pod listing and the description of the router-default pod:
[root@cd3611b48b4e /]# oc get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system kube-flannel-ds-jbsd2 1/1 Running 0 57s
kube-system openshift-console-deployment-7c8785cc5c-sf6x8 1/1 Running 0 37s
kubevirt-hostpath-provisioner kubevirt-hostpath-provisioner-884lz 1/1 Running 0 75s
openshift-dns dns-default-mxllr 2/2 Running 0 102s
openshift-dns node-resolver-7lbt6 1/1 Running 0 92s
openshift-ingress router-default-584549f645-xx6zp 0/1 Running 1 17s
openshift-service-ca service-ca-7bffb6f6bf-dwsxw 1/1 Running 0 2m5s
[root@cd3611b48b4e /]#
[root@cd3611b48b4e /]# oc describe po router-default-584549f645-xx6zp -n openshift-ingress
Name: router-default-584549f645-xx6zp
Namespace: openshift-ingress
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: cd3611b48b4e/10.88.0.2
Start Time: Tue, 12 Apr 2022 09:26:44 +0000
Labels: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
pod-template-hash=584549f645
Annotations: target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"}
unsupported.do-not-use.openshift.io/override-liveness-grace-period-seconds: 10
Status: Running
IP: 10.88.0.2
IPs:
IP: 10.88.0.2
Controlled By: ReplicaSet/router-default-584549f645
Containers:
router:
Container ID: cri-o://5dbb8fe62029a505625b27edb07fbe27de4d114e94fbcda96c67bd34bbf20d63
Image: quay.io/openshift/okd-content@sha256:01cfbbfdc11e2cbb8856f31a65c83acc7cfbd1986c1309f58c255840efcc0b64
Image ID: quay.io/openshift/okd-content@sha256:01cfbbfdc11e2cbb8856f31a65c83acc7cfbd1986c1309f58c255840efcc0b64
Ports: 80/TCP, 443/TCP, 1936/TCP
Host Ports: 80/TCP, 443/TCP, 1936/TCP
State: Running
Started: Tue, 12 Apr 2022 09:26:46 +0000
Last State: Terminated
Reason: Error
Message: metrics "msg"="listening on the metrics port failed" "error"="listen tcp 0.0.0.0:1936: bind: address already in use"
I0412 09:26:44.915365 1 metrics.go:155] metrics "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x1904632]
goroutine 112 [running]:
github.com/cockroachdb/cmux.(*muxListener).Close(0xc00000f0f8, 0x4723a7, 0x413496)
<autogenerated>:1 +0x32
net/http.(*onceCloseListener).close(...)
/usr/lib/golang/src/net/http/server.go:3395
sync.(*Once).doSlow(0xc00017b450, 0xc0007dbe18)
/usr/lib/golang/src/sync/once.go:68 +0xec
sync.(*Once).Do(...)
/usr/lib/golang/src/sync/once.go:59
net/http.(*onceCloseListener).Close(0xc00017b440, 0xc000336420, 0xc000646000)
/usr/lib/golang/src/net/http/server.go:3391 +0x78
net/http.(*Server).Serve(0xc0002a2380, 0x207d310, 0xc00000f0f8, 0x2042540, 0xc000646000)
/usr/lib/golang/src/net/http/server.go:2981 +0x5f6
github.com/openshift/router/pkg/router/metrics.Listener.Listen.func1(0x2041b60, 0xc0000cdcc0, 0x207d310, 0xc00000f0f8)
/go/src/github.com/openshift/router/pkg/router/metrics/metrics.go:147 +0x72
created by github.com/openshift/router/pkg/router/metrics.Listener.Listen
/go/src/github.com/openshift/router/pkg/router/metrics/metrics.go:143 +0x1b9
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1902fec]
goroutine 114 [running]:
github.com/cockroachdb/cmux.(*cMux).Serve(0xc0000cdd80, 0x0, 0x0)
/go/src/github.com/openshift/router/vendor/github.com/cockroachdb/cmux/cmux.go:124 +0x8c
github.com/openshift/router/pkg/router/metrics.Listener.Listen.func3(0x206ee50, 0xc0000cdd80)
/go/src/github.com/openshift/router/pkg/router/metrics/metrics.go:172 +0x35
created by github.com/openshift/router/pkg/router/metrics.Listener.Listen
/go/src/github.com/openshift/router/pkg/router/metrics/metrics.go:171 +0x417
Exit Code: 2
Started: Tue, 12 Apr 2022 09:26:44 +0000
Finished: Tue, 12 Apr 2022 09:26:44 +0000
Ready: False
Restart Count: 1
Requests:
cpu: 100m
memory: 256Mi
Liveness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://localhost:1936/healthz/ready delay=10s timeout=1s period=10s #success=1 #failure=3
Startup: http-get http://:1936/healthz/ready delay=0s timeout=1s period=1s #success=1 #failure=120
Environment:
STATS_PORT: 1936
ROUTER_SERVICE_NAMESPACE: openshift-ingress
DEFAULT_CERTIFICATE_DIR: /etc/pki/tls/private
DEFAULT_DESTINATION_CA_PATH: /var/run/configmaps/service-ca/service-ca.crt
ROUTER_CIPHERS: TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
ROUTER_DISABLE_HTTP2: true
ROUTER_DISABLE_NAMESPACE_OWNERSHIP_CHECK: false
ROUTER_METRICS_TLS_CERT_FILE: /etc/pki/tls/private/tls.crt
ROUTER_METRICS_TLS_KEY_FILE: /etc/pki/tls/private/tls.key
ROUTER_METRICS_TYPE: haproxy
ROUTER_SERVICE_NAME: default
ROUTER_SET_FORWARDED_HEADERS: append
ROUTER_THREADS: 4
SSL_MIN_VERSION: TLSv1.2
ROUTER_SUBDOMAIN: ${name}-${namespace}.apps.127.0.0.1.nip.io
ROUTER_ALLOW_WILDCARD_ROUTES: true
ROUTER_OVERRIDE_HOSTNAME: true
Mounts:
/etc/pki/tls/private from default-certificate (ro)
/var/run/configmaps/service-ca from service-ca-bundle (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7jxjf (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-certificate:
Type: Secret (a volume populated by a Secret)
SecretName: router-certs-default
Optional: false
service-ca-bundle:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: service-ca-bundle
Optional: false
kube-api-access-7jxjf:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 29s default-scheduler Successfully assigned openshift-ingress/router-default-584549f645-xx6zp to cd3611b48b4e
Normal Pulled 28s (x2 over 29s) kubelet Container image "quay.io/openshift/okd-content@sha256:01cfbbfdc11e2cbb8856f31a65c83acc7cfbd1986c1309f58c255840efcc0b64" already present on machine
Normal Created 27s (x2 over 29s) kubelet Created container router
Normal Started 27s (x2 over 29s) kubelet Started container router
Warning Unhealthy 18s (x9 over 26s) kubelet Startup probe failed: HTTP probe failed with statuscode: 500
Warning ProbeError 17s (x10 over 26s) kubelet Startup probe error: HTTP probe failed with statuscode: 500
body: [-]backend-http failed: reason withheld
[-]has-synced failed: reason withheld
[+]process-running ok
healthz check failed
[root@cd3611b48b4e /]#
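In case it is useful, one way to check what is still holding the metrics port when the router container crash-loops on "bind: address already in use" (a diagnostic sketch, not output from my environment):
$ sudo podman exec microshift ss -ltnp | grep 1936       # shows which PID owns port 1936
$ sudo podman exec microshift crictl ps -a | grep router  # assumes crictl is present in the AIO image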
Environment:
- Microshift version (use microshift version): Microshift-AIO:latest
- Hardware configuration:
- OS (e.g: cat /etc/os-release): Fedora
- Kernel (e.g. uname -a): Linux fedora 5.14.10-300.fc35.x86_64 #1 SMP Thu Oct 7 20:48:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Others:
Relevant Logs
I also tried deleting all the MicroShift pods, but that did not work as expected. The temporary workaround is to destroy the microshift-aio container and re-create it (see the sketch below).
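Roughly, the destroy-and-recreate workaround (container and volume names are from my setup; adjust as needed):
$ sudo podman stop microshift
$ sudo podman rm microshift
$ sudo podman volume rm microshift-data     # optional: also drops the stale cluster state
  ... then re-run the original podman run command to re-create the AIO container ...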
More logs
$ podman stop microshift
microshift
$podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
$ podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
275cb547d39f quay.io/microshift/microshift-aio:latest /sbin/init 47 minutes ago Exited (137) 10 seconds ago 0.0.0.0:80->80/tcp, 0.0.0.0:6443->6443/tcp, 0.0.0.0:8080->8080/tcp microshift
$
$ podman start microshift
microshift
$ podman exec microshift oc get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system kube-flannel-ds-gkkhk 1/1 Running 0 46m
kubevirt-hostpath-provisioner kubevirt-hostpath-provisioner-vq6lx 1/1 Running 0 46m
openshift-dns dns-default-v4wg2 2/2 Running 0 46m
openshift-dns node-resolver-chpjm 1/1 Running 0 46m
openshift-ingress router-default-6c96f6bc66-knmp5 1/1 Running 0 46m
openshift-service-ca service-ca-7bffb6f6bf-gx7qr 1/1 Running 0 46m
$ podman exec microshift oc get route -A
error: the server doesn't have a resource type "route"
$
@oglok per our email conversation, this seems to be a big blocker.
As a MicroShift user, I would first set up the cluster and install the apps I need. Later, if I stop the cluster, I would expect that when I bring it back up (start), all my apps are running again and I can access them just as I did before rebooting the MicroShift instance.
Any help on this would be appreciated.
Hi @ksingh7
This issue has the same root cause as #556. Since you are using the AIO MicroShift image, the container acquires a new IP whenever it is restarted, so the endpoints must be updated.
The following PR #650 should fix this too.
Let us build a new AIO image, so you can test it.
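A quick way to confirm the IP change after a restart (a sketch; it compares the container's current IP with the endpoint addresses the cluster still has recorded):
$ sudo podman inspect microshift --format '{{.NetworkSettings.IPAddress}}'          # IP after the restart
$ sudo podman exec microshift oc get endpoints kubernetes openshift-apiserver -n default   # addresses the cluster still points at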
@ksingh7 a new image has been built. Could you please test it? Thanks!
https://quay.io/repository/microshift/microshift-aio?tab=tags
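Testing the rebuilt image should just be a matter of pulling it and re-creating the container (sketch):
$ sudo podman pull quay.io/microshift/microshift-aio:latest
$ sudo podman rm -f microshift
  ... re-run the original podman run command against the freshly pulled image ...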
@ksingh7 have you tried podman pause/unpause? Is there a reason not to pause the container?
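For reference, the pause/unpause flow being suggested (standard podman commands; the container is frozen in place rather than stopped, so it keeps its network namespace and IP while the host stays up):
$ sudo podman pause microshift
  ... later ...
$ sudo podman unpause microshift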
@oglok thanks a lot for the new image. I have not tested it yet, but I will test it soon and share feedback.
I also have not tried podman pause/unpause; I will test that as well and keep you posted.
I'm having a similar issue, and podman pause/unpause does not seem to resolve it.
$ sudo podman exec -ti microshift oc get all -A
NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-ingress pod/router-default-6c96f6bc66-gm2k7 0/1 Pending 0 65m
openshift-service-ca pod/service-ca-7bffb6f6bf-x2r87 0/1 Pending 0 65m
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 66m
default service/openshift-apiserver ClusterIP None <none> 443/TCP 65m
default service/openshift-oauth-apiserver ClusterIP None <none> 443/TCP 65m
openshift-dns service/dns-default ClusterIP 10.43.0.10 <none> 53/UDP,53/TCP,9154/TCP 65m
openshift-ingress service/router-external-default NodePort 10.43.209.199 <none> 80:30001/TCP,443:30002/TCP 65m
openshift-ingress service/router-internal-default ClusterIP 10.43.82.233 <none> 80/TCP,443/TCP,1936/TCP 65m
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-flannel-ds 0 0 0 0 0 <none> 65m
kubevirt-hostpath-provisioner daemonset.apps/kubevirt-hostpath-provisioner 0 0 0 0 0 <none> 65m
openshift-dns daemonset.apps/dns-default 0 0 0 0 0 kubernetes.io/os=linux 65m
openshift-dns daemonset.apps/node-resolver 0 0 0 0 0 kubernetes.io/os=linux 65m
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
openshift-ingress deployment.apps/router-default 0/1 1 0 65m
openshift-service-ca deployment.apps/service-ca 0/1 1 0 65m
NAMESPACE NAME DESIRED CURRENT READY AGE
openshift-ingress replicaset.apps/router-default-6c96f6bc66 1 1 0 65m
openshift-service-ca replicaset.apps/service-ca-7bffb6f6bf 1 1 0 65m
Notably, the ingress and service-ca pods are both stuck in Pending:
NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-ingress pod/router-default-6c96f6bc66-gm2k7 0/1 Pending 0 65m
openshift-service-ca pod/service-ca-7bffb6f6bf-x2r87 0/1 Pending 0 65m
And, as mentioned, the route resource is not available:
$ sudo podman exec -ti microshift oc get routes -A
error: the server doesn't have a resource type "routes"
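A sketch of how to dig into why those pods stay Pending (the pod name is taken from the listing above):
$ sudo podman exec -ti microshift oc get nodes
$ sudo podman exec -ti microshift oc describe pod router-default-6c96f6bc66-gm2k7 -n openshift-ingress
  # the Events section at the bottom usually states the scheduling reason (e.g. no Ready nodes)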
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
The team has decided to drop support for running MicroShift in a container and focus on running via systemd. If you're still having trouble with this issue using that configuration, please create a new issue with the details of that configuration.
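For anyone landing here later, the systemd-based flow looks roughly like this (assuming MicroShift has been installed as an RPM; package and repo names vary by distribution and version):
$ sudo dnf install -y microshift           # assumes a repo providing the microshift RPM is enabled
$ sudo systemctl enable --now microshift
$ sudo journalctl -u microshift -f         # follow the service logs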