Kubernetes network debugging guide
- [x] toolbox and attaching to shell-less containers (#803, #810)
- [x] k8s-dns troubles (#810)
- [ ] iptables troubleshooting
- [ ] NAT troubleshooting
A common issue when getting started with Kubernetes is debugging networking. We need to provide a debugging guide for Kubernetes networking that covers the following topics:
DNS Debugging
Untested:
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o template --template='{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' | xargs -I{} kubectl port-forward --namespace=kube-system {} 5300:53
dig something something
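One thing to watch with the untested command above: `kubectl port-forward` forwards TCP only, so a plain UDP `dig` against the local port will not get an answer. A minimal sketch of a TCP query through the forward, assuming the forward on local port 5300 is already running in another terminal. The function name `dns_check` and the example name `kubernetes.default.svc.cluster.local` (which presumes the default `cluster.local` domain) are illustrative assumptions, not something tested in this issue:

```shell
# Query a DNS server that is reachable through a local TCP port-forward.
# Assumes `kubectl port-forward ... 5300:53` is already running elsewhere.
dns_check() {
    name="$1"          # name to resolve
    port="${2:-5300}"  # local end of the port-forward
    # +tcp forces TCP, since the forward does not carry UDP.
    dig +tcp +short -p "$port" @127.0.0.1 "$name"
}

# Example (hypothetical name, assumes the cluster.local domain):
# dns_check kubernetes.default.svc.cluster.local
```

An empty result with exit status 0 usually means the server answered but had no record for the name, which is a different failure from a timeout.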
Bridge Debugging
ssh into a host and use toolbox + tcpdump to dump the flannel0/cbr0 bridge
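Which interface exists depends on the network setup (flannel0, cbr0, or plain docker0), so a small helper can pick one before starting the capture. A rough sketch, assuming `iproute` and `tcpdump` are installed inside toolbox; the function names `pick_bridge` and `dump_bridge` are made up for illustration:

```shell
# Print the first known bridge name found in `ip -o link` output read from stdin.
pick_bridge() {
    awk -F': ' '$2 ~ /^(flannel0|cbr0|docker0)$/ { print $2; exit }'
}

# Capture on whichever bridge is present (run as root inside toolbox).
dump_bridge() {
    br="$(ip -o link show | pick_bridge)"
    [ -n "$br" ] || { echo "no known bridge interface found" >&2; return 1; }
    # -n skips reverse DNS lookups, which add noise while debugging DNS itself.
    tcpdump -n -i "$br" "$@"
}
```

Extra arguments are passed through as a tcpdump filter, for example `dump_bridge port 53` to watch only DNS traffic.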
Pod Debugging
Bash in the container
cd k8s.io/kubernetes/examples/guestbook
for i in *.yaml; do kubectl create -f "${i}"; done
Find a pod you want to debug
kubectl get pods
$ kubectl exec -ti frontend-r1lq4 /bin/bash
root@frontend-r1lq4:/var/www/html# ping yahoo.com
PING yahoo.com (98.138.253.109): 56 data bytes
64 bytes from 98.138.253.109: icmp_seq=0 ttl=127 time=50.670 ms
64 bytes from 98.138.253.109: icmp_seq=1 ttl=127 time=43.191 ms
^C--- yahoo.com ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 43.191/46.931/50.670/3.740 ms
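For the "find a pod you want to debug" step, the `kubectl get pods` listing can be filtered for anything that is not `Running`. A small sketch; `unhealthy_pods` is a hypothetical name, and it assumes the default column layout (NAME READY STATUS ...):

```shell
# Read `kubectl get pods` output on stdin and print "NAME STATUS"
# for every pod whose STATUS column is not "Running".
unhealthy_pods() {
    awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}

# Usage: kubectl get pods | unhealthy_pods
```

This only looks at the STATUS column, so a pod that is `Running` but not ready (READY 0/1) would not show up.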
No bash
The tutorial will need to show a busybox sleep container being added to the pod. This is not exactly that, but you get the idea:
$ kubectl run --image busybox tester -- /bin/sleep 5000
Error from server: deployments.extensions "tester" already exists
$ kubectl delete deployment tester
deployment "tester" deleted
$ kubectl run --image busybox tester -- /bin/sleep 5000
deployment "tester" created
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
bash-3819658126-4r2r3 0/1 CrashLoopBackOff 5 4m
frontend-du0jv 1/1 Running 0 5m
frontend-k0ykp 1/1 Running 0 5m
frontend-r1lq4 1/1 Running 0 5m
redis-master-3djov 1/1 Running 0 5m
redis-slave-aran5 0/1 Pending 0 5m
redis-slave-u2hdj 1/1 Running 0 5m
tester-3286786242-7xidb 1/1 Running 0 4s
$ kubectl exec -ti tester-3286786242-7xidb /bin/bash
exec: "/bin/bash": stat /bin/bash: no such file or directory
error: error executing remote command: Error executing command in container: Error executing in Docker Container: -1
$ kubectl exec -ti tester-3286786242-7xidb /bin/sh
/ # ping yahoo.com
PING yahoo.com (206.190.36.45): 56 data bytes
64 bytes from 206.190.36.45: seq=0 ttl=127 time=35.360 ms
^C
--- yahoo.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 35.360/35.360/35.360 ms
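The bash-then-sh retry in the transcript above can be wrapped in a small helper. A sketch only: `pod_shell` is a made-up name, and the probe assumes `kubectl exec` returns non-zero when `/bin/bash` is missing, as the error above suggests:

```shell
# Open an interactive shell in a pod, preferring bash and falling back to sh.
pod_shell() {
    pod="$1"
    # Probe first, so that a non-zero exit from an interactive bash session
    # is not mistaken for "bash is missing".
    if kubectl exec "$pod" -- /bin/bash -c 'exit 0' >/dev/null 2>&1; then
        kubectl exec -ti "$pod" -- /bin/bash
    else
        kubectl exec -ti "$pod" -- /bin/sh
    fi
}

# Usage: pod_shell tester-3286786242-7xidb
```

If the container has neither shell, the fallback fails too, which is the case the busybox approach is meant to cover.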
@omkensey could you take a look at writing this doc?
DNS debugging: does the stock way of verifying working DNS in the Kubernetes docs suffice? (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns, subtopic "How do I test if it is working?") See also the note below about your "no bash in a pod" case.
Bridge debugging: something like this?
CoreOS stable (899.15.0)
core@test-inst ~ $ toolbox
latest: Pulling from library/fedora
6888fc827a3f: Pull complete
9bdb5101e5fc: Pull complete
Digest: sha256:1fa98be10c550ffabde65246ed2df16be28dc896d6e370dab56b98460bd27823
Status: Downloaded newer image for fedora:latest
core-fedora-latest
Spawning container core-fedora-latest on /var/lib/toolbox/core-fedora-latest.
Press ^] three times within 1s to kill container.
[root@test-inst ~]# dnf install iproute tcpdump
Fedora 23 - x86_64 15 MB/s | 43 MB 00:02
Fedora 23 - x86_64 - Updates 20 MB/s | 22 MB 00:01
Last metadata expiration check performed 0:00:08 ago on Wed Apr 27 18:52:42 2016.
Dependencies resolved.
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
iproute x86_64 4.1.1-3.fc23 updates 598 k
libpcap x86_64 14:1.7.4-1.fc23 fedora 146 k
linux-atm-libs x86_64 2.5.1-13.fc23 fedora 40 k
tcpdump x86_64 14:4.7.4-3.fc23 fedora 437 k
Transaction Summary
================================================================================
Install 4 Packages
Total download size: 1.2 M
Installed size: 3.0 M
Is this ok [y/N]: y
Downloading Packages:
(1/4): iproute-4.1.1-3.fc23.x86_64.rpm 1.6 MB/s | 598 kB 00:00
(2/4): libpcap-1.7.4-1.fc23.x86_64.rpm 130 kB/s | 146 kB 00:01
(3/4): tcpdump-4.7.4-3.fc23.x86_64.rpm 298 kB/s | 437 kB 00:01
(4/4): linux-atm-libs-2.5.1-13.fc23.x86_64.rpm 35 kB/s | 40 kB 00:01
--------------------------------------------------------------------------------
Total 442 kB/s | 1.2 MB 00:02
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Installing : linux-atm-libs-2.5.1-13.fc23.x86_64 1/4
Installing : libpcap-14:1.7.4-1.fc23.x86_64 2/4
Installing : tcpdump-14:4.7.4-3.fc23.x86_64 3/4
Installing : iproute-4.1.1-3.fc23.x86_64 4/4
Verifying : tcpdump-14:4.7.4-3.fc23.x86_64 1/4
Verifying : libpcap-14:1.7.4-1.fc23.x86_64 2/4
Verifying : iproute-4.1.1-3.fc23.x86_64 3/4
Verifying : linux-atm-libs-2.5.1-13.fc23.x86_64 4/4
Installed:
iproute.x86_64 4.1.1-3.fc23 libpcap.x86_64 14:1.7.4-1.fc23
linux-atm-libs.x86_64 2.5.1-13.fc23 tcpdump.x86_64 14:4.7.4-3.fc23
Complete!
[root@test-inst ~]# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens4v1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 42:01:0a:f0:00:02 brd ff:ff:ff:ff:ff:ff
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:97:5b:f0:47 brd ff:ff:ff:ff:ff:ff
[root@test-inst ~]# tcpdump -i docker0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on docker0, link-type EN10MB (Ethernet), capture size 262144 bytes
[...]
Pod debugging (no bash): Still working on attaching a busybox pod to an existing pod for debugging. In fact, we might want to use that for all pod-based debugging, including the DNS debugging case. It would make the decision process simpler to convey: on the host, use toolbox for network debugging; in a pod, attach busybox (or fedora/debian?) for everything.
I'll also write a bit about using nsenter on the host to get inside an existing container's network namespace.
For debugging shell-less pods using busybox containers, this seems to be a valid flow:
$ kubectl describe pod [pod name]
- note container ID
On host, using docker to link to docker container:
$ docker run --link [container ID] -it busybox /bin/sh
On host, using rkt to link to rkt container:
$ rkt status [container ID]
- note PID of stage1
$ nsenter -t [pid] -n rkt run --net=host [busybox container]
On host, using rkt to link to docker container, only the method of getting the PID to attach to is different:
$ docker inspect -f '{{.State.Pid}}' [container ID]
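The docker variant of the flow above could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: the function names are hypothetical, it needs root on the host, and `netns_busybox` swaps in docker's `--net=container:` mode (which shares the target's network namespace directly) in place of `--link`, so treat that part as something to verify:

```shell
# Run a command inside the network namespace of a Docker container.
# Needs root on the host, plus docker and nsenter (util-linux).
netns_exec() {
    cid="$1"; shift
    pid="$(docker inspect -f '{{.State.Pid}}' "$cid")" || return 1
    # -t selects the target process, -n enters only its network namespace.
    nsenter -t "$pid" -n "$@"
}

# Alternative that stays within docker: start busybox sharing the
# target container's network namespace.
netns_busybox() {
    docker run --rm -it --net="container:$1" busybox /bin/sh
}
```

For example, `netns_exec [container ID] ip addr` uses the host's own `ip` binary, so nothing has to be installed in the shell-less container.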
@omkensey @joshix Can we convert the doc to something to be merged here now? https://docs.google.com/document/d/16k7ISf9CKlYHqXwLGzPhBqwL-JU8B2cnpPTFqkS339M/edit#heading=h.pzbewqsubu0m
@philips Yes
@joshix What is the state of this document? #810 claims we have a doc, but comments on that claim it isn't complete.
@philips I would like to contribute to this
add iptables 💯