Pilosa k8s pod error "bind: cannot assign requested address"
What's going wrong?
The Pilosa pod in a k8s cluster enters a CrashLoopBackOff state.
What was expected?
The Pilosa pod should remain in the Running state without any issues.
Steps to reproduce the behavior
Create a k8s Deployment and Service with the following YAML files (these are the relevant files from my Helm chart, with values rendered):
# Source: pilosa/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pilosa
  labels:
    helm.sh/chart: pilosa-0.1.0
    app.kubernetes.io/name: pilosa
    app.kubernetes.io/instance: RELEASE-NAME
    app.kubernetes.io/version: "1.16.0"
    app.kubernetes.io/managed-by: Helm
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: pilosa
      app.kubernetes.io/instance: RELEASE-NAME
  template:
    metadata:
      labels:
        app.kubernetes.io/name: pilosa
        app.kubernetes.io/instance: RELEASE-NAME
    spec:
      serviceAccountName: default
      securityContext:
        {}
      initContainers:
        - command:
            - /bin/sh
            - -c
            - |
              sysctl -w net.ipv4.tcp_keepalive_time=600
              sysctl -w net.ipv4.tcp_keepalive_intvl=60
              sysctl -w net.ipv4.tcp_keepalive_probes=3
          image: busybox
          name: init-sysctl
          securityContext:
            privileged: true
      containers:
        - name: pilosa
          securityContext:
            {}
          image: "pilosa/pilosa:v1.4.0"
          imagePullPolicy: IfNotPresent
          args:
            - server
            - --data-dir
            - /data
            - --max-writes-per-request
            - "20000"
            - --bind
            - http://pilosa:10101
            - --cluster.coordinator=true
            - --gossip.seeds=pilosa:14000
            - --handler.allowed-origins="*"
          ports:
            - name: http
              containerPort: 10101
              protocol: TCP
          livenessProbe:
            tcpSocket:
              port: http
          readinessProbe:
            tcpSocket:
              port: http
          volumeMounts:
            - name: "pilosa-pv-storage"
              mountPath: /data
          resources:
            {}
      volumes:
        - name: pilosa-pv-storage
          persistentVolumeClaim:
            claimName: pilosa-pv-claim
Service YAML:
# Source: pilosa/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: pilosa
  labels:
    helm.sh/chart: pilosa-0.1.0
    app.kubernetes.io/name: pilosa
    app.kubernetes.io/instance: RELEASE-NAME
    app.kubernetes.io/version: "1.16.0"
    app.kubernetes.io/managed-by: Helm
spec:
  type: ClusterIP
  ports:
    - port: 10101
      targetPort: 10101
      protocol: TCP
      name: http
  selector:
    app.kubernetes.io/name: pilosa
    app.kubernetes.io/instance: RELEASE-NAME
Check the pod status:
$ kubectl get pods
NAME                      READY   STATUS             RESTARTS   AGE
pilosa-69574564bc-5f25l   0/1     CrashLoopBackOff   2          71s
Check the service:
$ kubectl get svc
NAME     TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)     AGE
pilosa   ClusterIP   10.96.141.79   <none>        10101/TCP   9m14s
Check the pilosa logs:
$ kubectl logs pilosa-69574564bc-5f25l
2021/02/24 14:19:45 Pilosa v1.4.0, build time 2019-09-17T23:29:35+0000
Error: running server: setting up server: getting listener: net.Listen: listen tcp 10.96.141.79:10101: bind: cannot assign requested address
Usage:
pilosa server [flags]
Flags:
--advertise string Address to advertise externally.
--anti-entropy.interval duration Interval at which to run anti-entropy routine. (default 10m0s)
-b, --bind string Default URI on which pilosa should listen. (default ":10101")
--cluster.coordinator Host that will act as cluster coordinator during startup and resizing.
--cluster.disabled Disabled multi-node cluster communication (used for testing)
--cluster.hosts strings Comma separated list of hosts in cluster. Only used for testing.
--cluster.long-query-time duration Duration that will trigger log and stat messages for slow queries. (default 1m0s)
--cluster.replicas int Number of hosts each piece of data should be stored on. (default 1)
-d, --data-dir string Directory to store pilosa data files. (default "~/.pilosa")
--gossip.advertise-host string Host on which memberlist should advertise.
--gossip.advertise-port string Port on which memberlist should advertise.
--gossip.interval duration Interval between sending messages that need to be gossiped that haven't piggybacked on probing messages. (default 200ms)
running server: setting up server: getting listener: net.Listen: listen tcp 10.96.141.79:10101: bind: cannot assign requested address
--gossip.key string The path to file of the encryption key for gossip. The contents of the file should be either 16, 24, or 32 bytes to select AES-128, AES-192, or AES-256.
--gossip.nodes int Number of random nodes to send gossip messages to per GossipInterval. (default 3)
--gossip.port string Port to which pilosa should bind for internal state sharing. (default "14000")
--gossip.probe-interval duration Interval between random node probes. (default 1s)
--gossip.probe-timeout duration Timeout to wait for an ack from a probed node before assuming it is unhealthy. (default 500ms)
--gossip.push-pull-interval duration Interval between complete state syncs. (default 30s)
--gossip.seeds strings Host with which to seed the gossip membership.
--gossip.stream-timeout duration Timeout for establishing a stream connection with a remote node for a full state sync. (default 10s)
--gossip.suspicion-mult int Multiplier for determining the time an inaccessible node is considered suspect before declaring it dead. (default 4)
--gossip.to-the-dead-time duration Interval after which a node has died that we will still try to gossip to it. (default 30s)
--handler.allowed-origins strings Comma separated list of allowed origin URIs (for CORS/WebUI).
-h, --help help for server
--log-path string Log path
--max-file-count uint Soft limit on the maximum number of fragment files Pilosa keeps open simultaneously. (default 1000000)
--max-map-count uint Limits the maximum number of active mmaps. Pilosa will fall back to reading files once this is exhausted. Set below your system's vm.max_map_count. (default 1000000)
--max-writes-per-request int Number of write commands per request. (default 5000)
--metric.diagnostics Enabled diagnostics reporting. (default true)
--metric.host string URI to send metrics when metric.service is statsd.
--metric.poll-interval duration Polling interval metrics.
--metric.service string Where to send stats: can be expvar (in-memory served at /debug/vars), statsd or none. (default "none")
--profile.block-rate int Sampling rate for goroutine blocking profiler. One sample per <rate> ns. (default 10000000)
--profile.mutex-fraction int Sampling fraction for mutex contention profiling. Sample 1/<rate> of events. (default 100)
--tls.certificate string TLS certificate path (usually has the .crt or .pem extension
--tls.key string TLS certificate key path (usually has the .key extension
--tls.skip-verify Skip TLS certificate verification (not secure)
--tracing.agent-host-port string Jaeger agent host:port.
--tracing.sampler-param float Jaeger sampler parameter. (default 0.001)
--tracing.sampler-type string Jaeger sampler type or 'off' to disable tracing completely. (default "remote")
--translation.map-size int Size in bytes of mmap to allocate for key translation.
--translation.primary-url string DEPRECATED: URL for primary translation node for replication.
--verbose Enable verbose logging
Global Flags:
-c, --config string Configuration file to read from.
In the logs, you can see the error:
Error: running server: setting up server: getting listener: net.Listen: listen tcp 10.96.141.79:10101: bind: cannot assign requested address
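For context, this message comes from Go's `net.Listen`, and "cannot assign requested address" is the EADDRNOTAVAIL errno. A ClusterIP such as 10.96.141.79 is a virtual address implemented by kube-proxy on the nodes; as far as I understand, it is not assigned to any interface inside the pod's network namespace, so the kernel refuses to bind to it. The minimal Go sketch below reproduces the same class of error outside Kubernetes, assuming a Linux host where 192.0.2.1 (a documentation-only TEST-NET address used here as a stand-in for the ClusterIP) is not configured on any interface. The helper `tryListen` is hypothetical, not part of Pilosa.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// tryListen attempts to bind a TCP listener on addr. It reports whether the
// bind succeeded and, if not, whether the failure was EADDRNOTAVAIL, the errno
// behind "bind: cannot assign requested address".
func tryListen(addr string) (bool, bool, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		// net.OpError and os.SyscallError both unwrap down to the syscall.Errno.
		return false, errors.Is(err, syscall.EADDRNOTAVAIL), err
	}
	ln.Close()
	return true, false, nil
}

func main() {
	// 192.0.2.1 stands in for the ClusterIP: an address no local interface owns.
	// Port 0 lets the kernel pick a free port, so the demo cannot collide with
	// anything already listening.
	for _, addr := range []string{"192.0.2.1:0", "0.0.0.0:0"} {
		ok, notAvail, err := tryListen(addr)
		fmt.Printf("%-12s ok=%v addrNotAvail=%v err=%v\n", addr, ok, notAvail, err)
	}
}
```

On a typical Linux host the first address should fail with EADDRNOTAVAIL and the wildcard should succeed, which mirrors what the pod shows for 10.96.141.79 versus 0.0.0.0.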
Information about your environment (OS/architecture, CPU, RAM, cluster/solo, configuration, etc.)
It is a k8s cluster in OCI (Oracle Cloud Infrastructure), created with the 'Quick Create' option.
Kubernetes Version: v1.18.10
Shape: VM.Standard1.4
Image Name: Oracle-Linux-7.9-2020.11.10-1
Total Worker Nodes: 3
Extra Info
If I change the bind argument to --bind http://0.0.0.0:10101, the pod reaches the Running state, but there is still an error in the log:
$ kubectl logs pilosa-555589659f-tms9c
2021/02/24 14:27:51 Pilosa v1.4.0, build time 2019-09-17T23:29:35+0000
2021/02/24 14:27:51 load NodeID: /data/.id
2021/02/24 14:28:03 retrying after error: 1 error occurred:
* Failed to join 10.96.141.79: dial tcp 10.96.141.79:14000: i/o timeout
2021/02/24 14:28:15 retrying after error: 1 error occurred:
* Failed to join 10.96.141.79: dial tcp 10.96.141.79:14000: i/o timeout
If I then also change the gossip.seeds argument to --gossip.seeds=0.0.0.0:14000, this error disappears too.
Now the log is:
$ kubectl logs pilosa-75b95f695-q6sh6
2021/02/24 14:43:07 Pilosa v1.4.0, build time 2019-09-17T23:29:35+0000
2021/02/24 14:43:07 load NodeID: /data/.id
2021/02/24 14:43:07 open server
2021/02/24 14:43:07 open holder path: /data
2021/02/24 14:43:07 opening index: lost+found
2021/02/24 14:43:07 ERROR opening index: lost+found, err=validating name: 'lost+found': invalid index or field name, must match [a-z][a-z0-9_-]* and contain at most 64 characters
2021/02/24 14:43:07 open holder: complete
2021/02/24 14:43:07 received state READY (4a53d79d-cd0d-45cf-8bc8-94a4d7e4aca8)
2021/02/24 14:43:07 change cluster state from STARTING to NORMAL on 4a53d79d-cd0d-45cf-8bc8-94a4d7e4aca8
2021/02/24 14:43:07 listening as http://0.0.0.0:10101
2021/02/24 14:43:07 diagnostics disabled
Why doesn't binding to the matching Service's IP work? Is there any problem in a k8s cluster if I specify 0.0.0.0, which tells Pilosa to listen on all available interfaces?
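One way to see why 0.0.0.0 works while the Service IP does not: a listen address is only valid if it is a wildcard address or an address that a local interface actually owns. The hypothetical helper `canBindHost` below (a diagnostic sketch, not part of Pilosa) checks a `--bind`-style URI against `net.InterfaceAddrs()`. Run inside the pod, I would expect it to report false for `http://pilosa:10101` (which resolves to the ClusterIP, per the log above) and true for `http://0.0.0.0:10101`; 192.0.2.1 is again an assumed stand-in for a non-local address.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// localIPs returns every IP address assigned to an interface on this host
// (or, inside a pod, in the pod's network namespace).
func localIPs() ([]net.IP, error) {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return nil, err
	}
	var ips []net.IP
	for _, a := range addrs {
		switch v := a.(type) {
		case *net.IPNet:
			ips = append(ips, v.IP)
		case *net.IPAddr:
			ips = append(ips, v.IP)
		}
	}
	return ips, nil
}

// canBindHost reports whether the host in a --bind style URI could be bound
// locally: it must be empty, a wildcard address, or resolve to an IP that a
// local interface owns. Hypothetical helper for diagnosis only.
func canBindHost(bindURI string) (bool, error) {
	u, err := url.Parse(bindURI)
	if err != nil {
		return false, err
	}
	host := u.Hostname()
	if host == "" {
		return true, nil // e.g. "http://:10101" listens on all interfaces
	}
	resolved, err := net.LookupIP(host) // IP literals are returned without a DNS query
	if err != nil {
		return false, err
	}
	local, err := localIPs()
	if err != nil {
		return false, err
	}
	for _, ip := range resolved {
		if ip.IsUnspecified() {
			return true, nil // 0.0.0.0 or ::
		}
		for _, l := range local {
			if l.Equal(ip) {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	for _, uri := range []string{"http://0.0.0.0:10101", "http://127.0.0.1:10101", "http://192.0.2.1:10101"} {
		ok, err := canBindHost(uri)
		fmt.Printf("%-24s bindable=%v err=%v\n", uri, ok, err)
	}
}
```

A related observation, hedged: the gossip i/o timeout on 10.96.141.79:14000 also appears consistent with the Service manifest above, which only lists port 10101, so nothing would forward port 14000 on the ClusterIP.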
I can also see that --bind http://localhost:10101 and --gossip.seeds=localhost:14000 (which is the default value) work fine.
However, the configured liveness and readiness probes then fail with the following error, and the pod restarts repeatedly:
Normal   Created    57s (x2 over 89s)  kubelet  Created container pilosa
Normal   Started    57s (x2 over 88s)  kubelet  Started container pilosa
Warning  Unhealthy  29s (x6 over 79s)  kubelet  Liveness probe failed: dial tcp 10.1.0.32:10101: connect: connection refused
Normal   Killing    29s (x2 over 59s)  kubelet  Container pilosa failed liveness probe, will be restarted
Normal   Pulled     28s (x3 over 90s)  kubelet  Container image "pilosa/pilosa:v1.4.0" already present on machine
Warning  Unhealthy  28s (x6 over 78s)  kubelet  Readiness probe failed: dial tcp 10.1.0.32:10101: connect: connection refused
This probe connection problem doesn't appear if bind and gossip.seeds are configured to 0.0.0.0.
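The probe failures look consistent with how loopback binds behave: the kubelet's tcpSocket probe dials the pod IP (10.1.0.32 in the events above), not 127.0.0.1, so a listener bound only to localhost refuses it, while a 0.0.0.0 listener accepts connections on every local address. The Go sketch below imitates this on a single host. It assumes Linux, where all of 127.0.0.0/8 is routed to the loopback device, so 127.0.0.2 can stand in for the pod IP (macOS does not configure 127.0.0.2 by default). `probeAfterBind` is a hypothetical helper, not a kubelet or Pilosa API.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// reachable reports whether a TCP connection to host:port succeeds, which is
// essentially what a tcpSocket probe checks.
func reachable(host, port string) bool {
	conn, err := net.DialTimeout("tcp", net.JoinHostPort(host, port), time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// probeAfterBind listens on bindHost with a kernel-chosen port, then checks
// whether a "probe" dialing probeHost on that same port can connect. The
// kernel completes the handshake from the listen backlog, so no Accept call
// is needed for the dial to succeed.
func probeAfterBind(bindHost, probeHost string) (bool, error) {
	ln, err := net.Listen("tcp", net.JoinHostPort(bindHost, "0"))
	if err != nil {
		return false, err
	}
	defer ln.Close()
	_, port, err := net.SplitHostPort(ln.Addr().String())
	if err != nil {
		return false, err
	}
	return reachable(probeHost, port), nil
}

func main() {
	// 127.0.0.1 plays the role of --bind http://localhost:10101,
	// 0.0.0.0 plays the role of --bind http://0.0.0.0:10101.
	for _, bind := range []string{"127.0.0.1", "0.0.0.0"} {
		ok, err := probeAfterBind(bind, "127.0.0.2")
		fmt.Printf("bind %-9s -> probe via 127.0.0.2 reachable=%v err=%v\n", bind, ok, err)
	}
}
```

On Linux I would expect the loopback-only bind to be unreachable through 127.0.0.2 (connection refused, as in the probe events) and the wildcard bind to be reachable.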