Can't connect peer to bootstrap as in tests
Hello, I've created 4 ipfs-cluster nodes in Kubernetes using the examples from here.
```yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: ipfs-cluster-bootstrapper
  labels:
    name: ipfs-cluster
    app: ipfs
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "5001"
    prometheus.io/path: "debug/metrics/prometheus"
spec:
  replicas: 1
  serviceName: ipfs-cluster-svc
  template:
    metadata:
      labels:
        name: ipfs-cluster
        role: bootstrapper
        app: ipfs
    spec:
      containers:
      - name: ipfs-cluster-bootstrapper
        image: "ipfs/ipfs-cluster:latest"
        command: ["/usr/local/bin/start-daemons.sh"]
        args:
        - --loglevel
        - debug
        - --debug
        ports:
        - containerPort: 4001
          name: "swarm"
          protocol: "TCP"
        - containerPort: 5001
          name: "api"
          protocol: "TCP"
        - containerPort: 9094
          name: "clusterapi"
          protocol: "TCP"
        - containerPort: 9095
          name: "clusterproxy"
          protocol: "TCP"
        - containerPort: 9096
          name: "cluster"
          protocol: "TCP"
        volumeMounts:
        - mountPath: /data
          name: data
  volumeClaimTemplates:
  - metadata:
      annotations:
        volume.alpha.kubernetes.io/storage-class: default
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
```
```yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: ipfs-cluster-peers
  labels:
    name: ipfs-cluster
    # app: ipfs
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "5001"
    prometheus.io/path: "debug/metrics/prometheus"
spec:
  replicas: 3
  serviceName: ipfs-cluster-svc
  template:
    metadata:
      labels:
        name: ipfs-cluster
        # app: ipfs
        role: peer
    spec:
      containers:
      - name: ipfs-cluster
        image: "ipfs/ipfs-cluster:latest"
        imagePullPolicy: IfNotPresent
        command: ["/usr/local/bin/start-daemons.sh"]
        args:
        - --loglevel
        - debug
        ports:
        - containerPort: 4001
          name: "swarm"
          protocol: "TCP"
        - containerPort: 5001
          name: "api"
          protocol: "TCP"
        - containerPort: 9094
          name: "clusterapi"
          protocol: "TCP"
        - containerPort: 9095
          name: "clusterproxy"
          protocol: "TCP"
        - containerPort: 9096
          name: "cluster"
          protocol: "TCP"
        volumeMounts:
        - mountPath: /data
          name: data
  volumeClaimTemplates:
  - metadata:
      annotations:
        volume.alpha.kubernetes.io/storage-class: default
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
```
I also created a headless service to interconnect the nodes:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: ipfs-cluster-svc
  namespace: default
spec:
  clusterIP: None
  ports:
  - name: cluster
    targetPort: 9096
    port: 9096
    protocol: TCP
  - name: clusterapi
    targetPort: 9094
    port: 9094
    protocol: TCP
  - name: clusterproxy
    targetPort: 9095
    port: 9095
    protocol: TCP
  selector:
    name: ipfs-cluster
```
Then I run a script to add the peers to the bootstrap peer (as in init.sh):
```
+ kubectl get pods -l name=ipfs-cluster,role=peer -o jsonpath={.items[*].metadata.name}
+ xargs -n1
+ pods=ipfs-cluster-peers-0
ipfs-cluster-peers-1
ipfs-cluster-peers-2
+ kubectl get pods -l name=ipfs-cluster,role=bootstrapper -o jsonpath={.items[*].metadata.name}
+ bootstrapper=ipfs-cluster-bootstrapper-0
+ kubectl get pods ipfs-cluster-peers-0 -o jsonpath={.status.podIP}
+ + jq -r .id
kubectl exec ipfs-cluster-peers-0 -- ipfs-cluster-ctl --enc json id
+ echo /ip4/10.244.0.141/tcp/9096/ipfs/QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww
+ addr=/ip4/10.244.0.141/tcp/9096/ipfs/QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww
+ kubectl exec ipfs-cluster-bootstrapper-0 -- ipfs-cluster-ctl peers add /ip4/10.244.0.141/tcp/9096/ipfs/QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww
```
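For context, the trace above corresponds roughly to the sketch below. This is a hedged reconstruction, not the actual init.sh: the port (9096), the multiaddr shape, and the kubectl/ipfs-cluster-ctl commands are taken from the trace, while the helper function name is my own invention.

```shell
#!/bin/sh
# Hypothetical reconstruction of the peer-add step traced above (not the
# real init.sh). Composes the cluster multiaddr for a peer pod; the real
# script then hands it to the bootstrapper.
build_cluster_addr() {
    pod_ip="$1"   # from: kubectl get pods <pod> -o jsonpath={.status.podIP}
    peer_id="$2"  # from: kubectl exec <pod> -- ipfs-cluster-ctl --enc json id | jq -r .id
    printf '/ip4/%s/tcp/9096/ipfs/%s\n' "$pod_ip" "$peer_id"
}

# With the values from the trace above:
addr=$(build_cluster_addr 10.244.0.141 QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww)
echo "$addr"
# The real script would then run something like:
#   kubectl exec "$bootstrapper" -- ipfs-cluster-ctl peers add "$addr"
```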
An error occurred:
**Code: 500
Message: dial attempt failed: <peer.ID dYrcex> --> <peer.ID ZwQwbA> dial attempt failed: incoming message was too large**
This is the log of the bootstrap pod:
```
14:30:56.453 DEBUG p2p-gorpc: makeCall: Cluster.PeerAdd client.go:106
14:30:56.453 DEBUG p2p-gorpc: local call: Cluster.PeerAdd client.go:112
14:30:56.453 DEBUG cluster: peerAdd called with /ip4/10.244.0.141/tcp/9096/ipfs/QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww cluster.go:512
14:30:56.453 DEBUG cluster: adding peer /ip4/10.244.0.141/tcp/9096/ipfs/QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww peer_manager.go:33
14:30:56.453 INFO cluster: new Cluster peer /ip4/10.244.0.141/tcp/9096/ipfs/QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww peer_manager.go:41
14:30:56.453 DEBUG p2p-gorpc: makeCall: Cluster.RemoteMultiaddrForPeer client.go:106
14:30:56.453 DEBUG p2p-gorpc: sending remote call client.go:144
14:30:58.287 DEBUG monitor: monitoring tick peer_monitor.go:264
14:30:58.287 DEBUG p2p-gorpc: makeCall: Cluster.PeerManagerPeers client.go:106
14:30:58.287 DEBUG p2p-gorpc: local call: Cluster.PeerManagerPeers client.go:112
14:30:58.287 DEBUG monitor: check metrics ping peer_monitor.go:278
14:30:58.287 DEBUG monitor: check metrics disk-freespace peer_monitor.go:278
14:30:59.892 DEBUG cluster: Leader <peer.ID dYrcex> about to broadcast metric ping to [<peer.ID ZwQwbA> <peer.ID dYrcex>]. Expires: 2017-10-12T14:31:29.892463518Z cluster.go:229
14:30:59.892 DEBUG p2p-gorpc: makeCall: Cluster.PeerMonitorLogMetric client.go:106
14:30:59.892 DEBUG p2p-gorpc: local call: Cluster.PeerMonitorLogMetric client.go:112
14:30:59.892 DEBUG monitor: logged 'ping' metric from '<peer.ID dYrcex>'. Expires on 2017-10-12T14:31:29.892463518Z peer_monitor.go:181
14:30:59.892 DEBUG p2p-gorpc: makeCall: Cluster.PeerMonitorLogMetric client.go:106
14:30:59.892 DEBUG p2p-gorpc: sending remote call client.go:144
14:31:00.870 DEBUG p2p-gorpc: makeCall: Cluster.IPFSFreeSpace client.go:106
14:31:00.870 DEBUG p2p-gorpc: local call: Cluster.IPFSFreeSpace client.go:112
14:31:00.870 DEBUG ipfshttp: getting repo/stat ipfshttp.go:697
14:31:00.872 DEBUG cluster: Leader <peer.ID dYrcex> about to broadcast metric disk-freespace to [<peer.ID dYrcex> <peer.ID ZwQwbA>]. Expires: 2017-10-12T14:31:30.872385036Z cluster.go:229
14:31:00.872 DEBUG p2p-gorpc: makeCall: Cluster.PeerMonitorLogMetric client.go:106
14:31:00.872 DEBUG p2p-gorpc: sending remote call client.go:144
14:31:00.872 DEBUG p2p-gorpc: makeCall: Cluster.PeerMonitorLogMetric client.go:106
14:31:00.872 DEBUG p2p-gorpc: local call: Cluster.PeerMonitorLogMetric client.go:112
14:31:00.872 DEBUG monitor: logged 'disk-freespace' metric from '<peer.ID dYrcex>'. Expires on 2017-10-12T14:31:30.872385036Z peer_monitor.go:181
14:31:06.454 ERROR cluster: dial attempt failed: context deadline exceeded cluster.go:537
14:31:06.454 DEBUG cluster: removing peer QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww peer_manager.go:52
14:31:06.454 ERROR cluster: error pushing metric to QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww: dial attempt failed: context deadline exceeded cluster.go:238
14:31:06.454 ERROR cluster: error pushing metric to QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww: dial attempt failed: context deadline exceeded cluster.go:238
14:31:06.454 INFO cluster: removing Cluster peer QmZwQwbAiKA6zNgmN6sJyua3gfqbxCRESKR3qCSzTLn1ww peer_manager.go:55
14:31:06.454 DEBUG cluster: Leader <peer.ID dYrcex> broadcasted metric disk-freespace to [<peer.ID dYrcex> <peer.ID ZwQwbA>]. Expires: 2017-10-12T14:31:30.872385036Z cluster.go:241
14:31:06.454 ERROR p2p-gorpc: dial attempt failed: context deadline exceeded client.go:125
14:31:06.454 DEBUG cluster: Leader <peer.ID dYrcex> broadcasted metric ping to [<peer.ID ZwQwbA> <peer.ID dYrcex>]. Expires: 2017-10-12T14:31:29.892463518Z cluster.go:241
14:31:06.454 ERROR restapi: sending error response: 500: dial attempt failed: context deadline exceeded restapi.go:519
```
tcpdump shows a normal TCP/IP flow:
```
# tcpdump host 10.244.4.120 and port not 4001
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
13:24:51.683723 IP ipfs-cluster-bootstrapper-0.ipfs-cluster-svc.default.svc.cluster.local.9096 > ipfs-cluster-peers-0.ipfs-cluster-svc.default.svc.cluster.local.9096: Flags [S], seq 3040182720, win 29200, options [mss 1418,sackOK,TS val 947970455 ecr 0,nop,wscale 7], length 0
13:24:51.683767 IP ipfs-cluster-peers-0.ipfs-cluster-svc.default.svc.cluster.local.9096 > ipfs-cluster-bootstrapper-0.ipfs-cluster-svc.default.svc.cluster.local.9096: Flags [S.], seq 1428266664, ack 3040182721, win 28960, options [mss 1460,sackOK,TS val 1040225904 ecr 947970455,nop,wscale 7], length 0
13:24:51.684452 IP ipfs-cluster-bootstrapper-0.ipfs-cluster-svc.default.svc.cluster.local.9096 > ipfs-cluster-peers-0.ipfs-cluster-svc.default.svc.cluster.local.9096: Flags [.], ack 1, win 229, options [nop,nop,TS val 947970455 ecr 1040225904], length 0
13:24:51.684734 IP ipfs-cluster-peers-0.ipfs-cluster-svc.default.svc.cluster.local.9096 > ipfs-cluster-bootstrapper-0.ipfs-cluster-svc.default.svc.cluster.local.9096: Flags [P.], seq 1:25, ack 1, win 227, options [nop,nop,TS val 1040225904 ecr 947970455], length 24
13:24:51.684775 IP ipfs-cluster-peers-0.ipfs-cluster-svc.default.svc.cluster.local.9096 > ipfs-cluster-bootstrapper-0.ipfs-cluster-svc.default.svc.cluster.local.9096: Flags [P.], seq 25:45, ack 1, win 227, options [nop,nop,TS val 1040225904 ecr 947970455], length 20
13:24:51.685393 IP ipfs-cluster-bootstrapper-0.ipfs-cluster-svc.default.svc.cluster.local.9096 > ipfs-cluster-peers-0.ipfs-cluster-svc.default.svc.cluster.local.9096: Flags [.], ack 25, win 229, options [nop,nop,TS val 947970455 ecr 1040225904], length 0
13:24:51.685410 IP ipfs-cluster-bootstrapper-0.ipfs-cluster-svc.default.svc.cluster.local.9096 > ipfs-cluster-peers-0.ipfs-cluster-svc.default.svc.cluster.local.9096: Flags [.], ack 45, win 229, options [nop,nop,TS val 947970455 ecr 1040225904], length 0
```
Probably a problem with the cluster secret used by each peer.
Note that this project is solely for running a number of automated tests on ipfs/ipfs-cluster, not for deploying any of them for real-world use within Kubernetes.
I understand Kubernetes here is purely for testing purposes, but I just want to clarify a few things if possible.
Why don't the tests require the same secret across all peers?
I thought about using the same secret, but ran into a strange issue: the folders /data/ipfs and /data/ipfs-cluster are recreated each time the pod dies (and starts again). So I can't change the secret in service.json and restart. I looked into /usr/local/bin/start-daemons.sh; maybe it's purely an Azure problem, but files outside these two directories are not changed.
I don't have this issue in Docker with two local volumes, one for each daemon.
> Why don't the tests require the same secret across all peers?
They do; AFAIK they just run a custom container which ensures that.
Other than that, I am not sure why your /data folders are not persistent.
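For reference, ipfs-cluster protects its libp2p connections with a shared 32-byte hex secret stored in service.json, and every peer must hold the same value; a mismatch shows up as dial failures much like the ones above. A minimal sketch for generating one secret to reuse across all peers (the generation command is the commonly documented approach, not something taken from this thread):

```shell
# Generate one 32-byte secret as a 64-character hex string. Every peer must
# be initialized with the same value, or dials between peers fail during the
# handshake (errors such as "incoming message was too large" are consistent
# with a secret mismatch).
secret=$(od -vN 32 -An -tx1 /dev/urandom | tr -d ' \n')
echo "$secret"
```

The value could then be injected into each pod, for example via a `CLUSTER_SECRET` environment variable in the StatefulSet spec (assumption: the ipfs/ipfs-cluster image honors that variable at first initialization).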
It seems I've found the root of my problem: the VOLUME directives in the ipfs/go-ipfs and ipfs/ipfs-cluster Dockerfiles. It seems I need multiple volumes to run the pods, which is not expected behaviour at all.
I don't fully understand how VOLUME directives affect Kubernetes; maybe you want to open an issue and explain? We can fix the Dockerfiles if there's a way to improve them...
```yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: ipfs-cluster-bootstrapper
  labels:
    name: ipfs-cluster
    app: ipfs
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "5001"
    prometheus.io/path: "debug/metrics/prometheus"
spec:
  replicas: 1
  serviceName: ipfs-cluster-svc
  template:
    metadata:
      labels:
        name: ipfs-cluster
        role: bootstrapper
        app: ipfs
    spec:
      containers:
      - name: ipfs-cluster-bootstrapper
        image: "ipfs/ipfs-cluster:latest"
        command: ["/usr/local/bin/start-daemons.sh"]
        args:
        - --loglevel
        - debug
        - --debug
        ports:
        - containerPort: 4001
          name: "swarm"
          protocol: "TCP"
        - containerPort: 5001
          name: "api"
          protocol: "TCP"
        - containerPort: 9094
          name: "clusterapi"
          protocol: "TCP"
        - containerPort: 9095
          name: "clusterproxy"
          protocol: "TCP"
        - containerPort: 9096
          name: "cluster"
          protocol: "TCP"
        volumeMounts:
        - mountPath: /data/ipfs
          name: data-ipfs
        - mountPath: /data/ipfs-cluster
          name: data-ipfs-cluster
  volumeClaimTemplates:
  - metadata:
      annotations:
        volume.alpha.kubernetes.io/storage-class: default
      name: data-ipfs
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
  - metadata:
      annotations:
        volume.alpha.kubernetes.io/storage-class: default
      name: data-ipfs-cluster
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
```
This is how to fix this behavior: it NEEDS two volumes.
@mikhail-manuilov, would you want to send in a pull request with the changes you're proposing? I'll be happy to look it over and approve it once I confirm it meets our requirements.
These Kubernetes definition files are for testing purposes only, and the ones posted above were tested only in the Azure cloud. Also, I suppose having two volumes for one container is no good; maybe the Dockerfiles for ipfs/go-ipfs and ipfs/ipfs-cluster should be changed instead. Since ipfs/ipfs-cluster uses FROM ipfs/go-ipfs, I suppose creating one VOLUME for /data in ipfs/go-ipfs and deleting VOLUME $IPFS_CLUSTER_PATH from the ipfs/ipfs-cluster Dockerfile would do the job.
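A sketch of what that proposal might look like. The actual contents of the upstream Dockerfiles are not shown in this thread, so the surrounding lines are illustrative assumptions; only the VOLUME changes reflect the suggestion above:

```dockerfile
# In ipfs/go-ipfs (the base image): declare a single volume covering the
# state of both daemons, instead of one volume per path.
VOLUME /data

# In ipfs/ipfs-cluster (the derived image): inherit /data from the base
# image and drop the declaration that forced a second anonymous volume.
FROM ipfs/go-ipfs
ENV IPFS_CLUSTER_PATH /data/ipfs-cluster
# (removed) VOLUME $IPFS_CLUSTER_PATH
```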