Health check failing on default installation
After following the given documentation I am seeing the following issue:
Name:           keepalived-cloud-provider-1765620686-d9nkt
Namespace:      kube-system
Node:           <elided/>
Start Time:     Tue, 04 Apr 2017 16:25:17 -0400
Labels:         app=keepalived-cloud-provider
                pod-template-hash=1765620686
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"keepalived-cloud-provider-1765620686","uid":"ef9618e5-1973-1...
                scheduler.alpha.kubernetes.io/critical-pod=
                scheduler.alpha.kubernetes.io/tolerations=[{"key":"CriticalAddonsOnly", "operator":"Exists"}]
Status:         Running
IP:             192.168.241.247
Controllers:    ReplicaSet/keepalived-cloud-provider-1765620686
Containers:
  keepalived-cloud-provider:
    Container ID:   docker://e3240e5d78a382155c902ee6b5cca8294b3be393959c5c3cfa1eba4d303bc66c
    Image:          quay.io/munnerz/keepalived-cloud-provider
    Image ID:       docker-pullable://quay.io/munnerz/keepalived-cloud-provider@sha256:170351533b23126b8f4eeeeb4293ec417607b762b7ae07d5c018a9cb792d1032
    Port:
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Tue, 04 Apr 2017 16:29:08 -0400
    Ready:          False
    Restart Count:  5
    Requests:
      cpu:          200m
    Liveness:       http-get http://127.0.0.1:10252/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
    Environment:
      KEEPALIVED_NAMESPACE:    kube-system
      KEEPALIVED_CONFIG_MAP:   vip-configmap
      KEEPALIVED_SERVICE_CIDR: <elided/>
    Mounts:
      /etc/ssl/certs from certs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jl9jx (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  certs:
    Type:       HostPath (bare host directory volume)
    Path:       /etc/ssl/certs
  default-token-jl9jx:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-jl9jx
    Optional:   false
QoS Class:      Burstable
Node-Selectors: <none>
Tolerations:    node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
                node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
<snipped/>
41s 41s 1 kubelet, <elided/> spec.containers{keepalived-cloud-provider} Normal Killing Killing container with id docker://e3240e5d78a382155c902ee6b5cca8294b3be393959c5c3cfa1eba4d303bc66c:pod "keepalived-cloud-provider-1765620686-d9nkt_kube-system(d16932c2-1974-11e7-a2c7-0025905ca872)" container "keepalived-cloud-provider" is unhealthy, it will be killed and re-created.
1m 1s 9 kubelet, <elided/> spec.containers{keepalived-cloud-provider} Warning BackOff Back-off restarting failed container
41s 1s 5 kubelet, <elided/> Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "keepalived-cloud-provider" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=keepalived-cloud-provider pod=keepalived-cloud-provider-1765620686-d9nkt_kube-system(d16932c2-1974-11e7-a2c7-0025905ca872)"
It appears to be failing a health check. Removing the health check stops the CrashLoopBackOff, but obviously that's not optimal. I can't find the health check on the first pass through the code, so I'm not entirely sure what's wrong.
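For anyone else hitting this, the probe in the describe output above corresponds roughly to a livenessProbe block like the following in the deployment manifest (a reconstruction from the output, not copied from the repository); deleting this block is what stops the crash loop:

livenessProbe:
  httpGet:
    host: 127.0.0.1          # taken from the http-get line in the describe output
    path: /healthz
    port: 10252
  initialDelaySeconds: 15    # delay=15s
  timeoutSeconds: 15         # timeout=15s
  periodSeconds: 10          # period=10s
  failureThreshold: 8        # #failure=8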
Hi @IronhandedLayman, we're hitting nearly the same issue here. Did you manage to find a workaround and/or make it work properly?
I'll try to have a look at this at some point today; for now I'd advise leaving it out. It was a very last-minute addition at the time!
Same problem for me. I started a shell inside the container, and one thing I noticed is that netstat shows the process listening on port 10253:
/ # netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 :::10253 :::* LISTEN 1/keepalived-cloud-
However, even after changing the livenessProbe's port to 10253, it's still failing. When testing inside the container:
/ # wget -q http://127.0.0.1:10253/healthz
/ # cat healthz
ok
/ #
So the livenessProbe should be working... (but somehow it doesn't).
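One possible explanation (an assumption on my part, not confirmed anywhere in this thread): the kubelet runs httpGet probes from the node, not from inside the container's network namespace, and because the probe sets an explicit host of 127.0.0.1 it ends up dialing the node's own loopback interface. That would explain why wget succeeds inside the container while the probe still fails, unless the pod runs with hostNetwork. A minimal sketch of a probe that omits the host so the kubelet targets the pod IP instead:

livenessProbe:
  httpGet:
    # no host: field, so the kubelet probes the pod IP rather than 127.0.0.1 on the node
    path: /healthz
    port: 10253              # the port the process actually listens on, per the netstat output above
  initialDelaySeconds: 15
  timeoutSeconds: 15
  periodSeconds: 10
  failureThreshold: 8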
Events from kubectl describe pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m default-scheduler Successfully assigned keepalived-cloud-provider-664464fc97-bvpbj to k8s-fed-wk1
Normal SuccessfulMountVolume 3m kubelet, k8s-fed-wk1 MountVolume.SetUp succeeded for volume "certs"
Normal SuccessfulMountVolume 3m kubelet, k8s-fed-wk1 MountVolume.SetUp succeeded for volume "keepalived-token-dl7qw"
Normal Pulled 2m (x2 over 3m) kubelet, k8s-fed-wk1 Container image "quay.io/munnerz/keepalived-cloud-provider:0.0.1" already present on machine
Normal Created 2m (x2 over 3m) kubelet, k8s-fed-wk1 Created container
Normal Started 2m (x2 over 3m) kubelet, k8s-fed-wk1 Started container
Warning Unhealthy 53s (x15 over 3m) kubelet, k8s-fed-wk1 Liveness probe failed: Get http://127.0.0.1:10253/healthz: dial tcp 127.0.0.1:10253: getsockopt: connection refused
Normal Killing 53s (x2 over 2m) kubelet, k8s-fed-wk1 Killing container with id docker://keepalived-cloud-provider:Container failed liveness probe.. Container will be killed and recreated.
Note: It is the exact same error as I get with port 10252...
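The connection-refused events above do show the kubelet dialing 127.0.0.1:10253, which fits that theory. Another option that keeps a health check but runs it entirely inside the container is an exec probe using the same wget as above (a sketch; the image's busybox-style shell and wget are assumed from the / # prompt shown earlier):

livenessProbe:
  exec:
    command:
      - wget
      - -q
      - -O
      - "-"
      - http://127.0.0.1:10253/healthz   # runs inside the container, so localhost is the pod's loopback
  initialDelaySeconds: 15
  timeoutSeconds: 15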