
Health check failing on default installation

Open IronhandedLayman opened this issue 9 years ago • 3 comments

After following the given documentation I am seeing the following issue:

Name:		keepalived-cloud-provider-1765620686-d9nkt
Namespace:	kube-system
Node:		<elided/>
Start Time:	Tue, 04 Apr 2017 16:25:17 -0400
Labels:		app=keepalived-cloud-provider
		pod-template-hash=1765620686
Annotations:	kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"keepalived-cloud-provider-1765620686","uid":"ef9618e5-1973-1...
		scheduler.alpha.kubernetes.io/critical-pod=
		scheduler.alpha.kubernetes.io/tolerations=[{"key":"CriticalAddonsOnly", "operator":"Exists"}]
Status:		Running
IP:		192.168.241.247
Controllers:	ReplicaSet/keepalived-cloud-provider-1765620686
Containers:
  keepalived-cloud-provider:
    Container ID:	docker://e3240e5d78a382155c902ee6b5cca8294b3be393959c5c3cfa1eba4d303bc66c
    Image:		quay.io/munnerz/keepalived-cloud-provider
    Image ID:		docker-pullable://quay.io/munnerz/keepalived-cloud-provider@sha256:170351533b23126b8f4eeeeb4293ec417607b762b7ae07d5c018a9cb792d1032
    Port:		
    State:		Waiting
      Reason:		CrashLoopBackOff
    Last State:		Terminated
      Reason:		Error
      Exit Code:	2
      Started:		Mon, 01 Jan 0001 00:00:00 +0000
      Finished:		Tue, 04 Apr 2017 16:29:08 -0400
    Ready:		False
    Restart Count:	5
    Requests:
      cpu:	200m
    Liveness:	http-get http://127.0.0.1:10252/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
    Environment:
      KEEPALIVED_NAMESPACE:	kube-system
      KEEPALIVED_CONFIG_MAP:	vip-configmap
      KEEPALIVED_SERVICE_CIDR:	<elided/>
    Mounts:
      /etc/ssl/certs from certs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jl9jx (ro)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  certs:
    Type:	HostPath (bare host directory volume)
    Path:	/etc/ssl/certs
  default-token-jl9jx:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-jl9jx
    Optional:	false
QoS Class:	Burstable
Node-Selectors:	<none>
Tolerations:	node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
		node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath					Type		Reason		Message
  ---------	--------	-----	----			-------------					--------	------		-------
<snipped/>
  41s	41s	1	kubelet, <elided/>	spec.containers{keepalived-cloud-provider}	Normal	Killing		Killing container with id docker://e3240e5d78a382155c902ee6b5cca8294b3be393959c5c3cfa1eba4d303bc66c:pod "keepalived-cloud-provider-1765620686-d9nkt_kube-system(d16932c2-1974-11e7-a2c7-0025905ca872)" container "keepalived-cloud-provider" is unhealthy, it will be killed and re-created.
  1m	1s	9	kubelet, <elided/>		spec.containers{keepalived-cloud-provider}	Warning	BackOff		Back-off restarting failed container
  41s	1s	5	kubelet, <elided/>							Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "keepalived-cloud-provider" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=keepalived-cloud-provider pod=keepalived-cloud-provider-1765620686-d9nkt_kube-system(d16932c2-1974-11e7-a2c7-0025905ca872)"

It appears to be failing its health check. Removing the health check stops the CrashLoopBackOff, but obviously that's not optimal. I can't find where the health check is defined on a first pass through the code, so I'm not entirely sure what's wrong.
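For reference, the probe the thread is discussing can be reconstructed from the Liveness line in the kubectl describe output above. The exact Deployment manifest isn't shown here, so the YAML below is a sketch of what that probe stanza would look like; removing this block from the container spec is the workaround mentioned above.

```yaml
# Reconstructed from: http-get http://127.0.0.1:10252/healthz
#   delay=15s timeout=15s period=10s #failure=8
# Deleting this stanza from the Deployment stops the CrashLoopBackOff.
livenessProbe:
  httpGet:
    host: 127.0.0.1
    path: /healthz
    port: 10252
  initialDelaySeconds: 15
  timeoutSeconds: 15
  periodSeconds: 10
  failureThreshold: 8
```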

IronhandedLayman avatar Apr 04 '17 20:04 IronhandedLayman

Hi @IronhandedLayman, we're hitting nearly the same issue here. Did you manage to find a workaround and/or make it work properly?

antoineserrano avatar Aug 04 '17 07:08 antoineserrano

I'll try to have a look at this at some point today; for now I'd advise leaving it out. It was a very last-minute addition at the time!


munnerz avatar Aug 04 '17 07:08 munnerz

Same problem for me. I started a shell inside the container, and one thing I noticed is that netstat shows the process listening on port 10253:

/ # netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 :::10253                :::*                    LISTEN      1/keepalived-cloud-

However, even after changing the livenessProbe's port to 10253, it is still failing. Testing inside the container:

/ # wget -q http://127.0.0.1:10253/healthz
/ # cat healthz 
ok/ # 

So the livenessProbe should be working... (but somehow it doesn't). Events from kubectl describe pod:

Events:
  Type     Reason                 Age                From                  Message
  ----     ------                 ----               ----                  -------
  Normal   Scheduled              3m                 default-scheduler     Successfully assigned keepalived-cloud-provider-664464fc97-bvpbj to k8s-fed-wk1
  Normal   SuccessfulMountVolume  3m                 kubelet, k8s-fed-wk1  MountVolume.SetUp succeeded for volume "certs"
  Normal   SuccessfulMountVolume  3m                 kubelet, k8s-fed-wk1  MountVolume.SetUp succeeded for volume "keepalived-token-dl7qw"
  Normal   Pulled                 2m (x2 over 3m)    kubelet, k8s-fed-wk1  Container image "quay.io/munnerz/keepalived-cloud-provider:0.0.1" already present on machine
  Normal   Created                2m (x2 over 3m)    kubelet, k8s-fed-wk1  Created container
  Normal   Started                2m (x2 over 3m)    kubelet, k8s-fed-wk1  Started container
  Warning  Unhealthy              53s (x15 over 3m)  kubelet, k8s-fed-wk1  Liveness probe failed: Get http://127.0.0.1:10253/healthz: dial tcp 127.0.0.1:10253: getsockopt: connection refused
  Normal   Killing                53s (x2 over 2m)   kubelet, k8s-fed-wk1  Killing container with id docker://keepalived-cloud-provider:Container failed liveness probe.. Container will be killed and recreated.

Note: it is the exact same error as I get with port 10252...
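One plausible explanation for the discrepancy (wget succeeds inside the container, but the probe gets "connection refused"): the kubelet performs httpGet probes from the node, so a probe with host: 127.0.0.1 connects to the node's loopback interface, not the container's, unless the pod runs with hostNetwork: true. A sketch of a probe that instead targets the pod IP is below; this is a guess based on that behavior, not a confirmed fix for this deployment, and the port is taken from the netstat output above.

```yaml
# Hypothetical corrected probe: omitting `host` makes the kubelet
# connect to the pod IP rather than the node's loopback address.
# Port 10253 matches what netstat shows the process listening on.
livenessProbe:
  httpGet:
    path: /healthz
    port: 10253
  initialDelaySeconds: 15
  timeoutSeconds: 15
  periodSeconds: 10
  failureThreshold: 8
```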

tommyknows avatar Dec 21 '17 12:12 tommyknows