kubernetes network debugging guide

Open philips opened this issue 9 years ago • 8 comments

  • [x] toolbox and attaching to shell-less containers (#803, #810)
  • [x] k8s-dns troubles (#810)
  • [ ] iptables troubleshooting
  • [ ] NAT troubleshooting

A common issue when getting started with Kubernetes is debugging networking. We need to provide a debugging guide for Kubernetes networking that covers the following topics:

DNS Debugging

Untested:

kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o template --template="{{range.items}}{{.metadata.name}}{{end}}" | xargs -I{} kubectl port-forward --namespace=kube-system {} 5300:53

dig something something
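
A complementary check, sketched here as an assumption rather than as part of the flow above, is to confirm which nameserver a pod is actually configured to use before querying it. The helper name `list_nameservers` and the pod name in the usage comment are illustrative only.

```shell
#!/bin/sh
# Print the nameserver addresses found in resolv.conf-style text on stdin.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}

# Hypothetical usage against a live cluster (pod name is a placeholder):
#   kubectl exec frontend-r1lq4 -- cat /etc/resolv.conf | list_nameservers
```

If the address printed does not match the cluster DNS service IP, the problem is likely pod configuration rather than kube-dns itself. Note also that kubectl port-forward carries TCP only, so a query through the forwarded port above would probably need a form like `dig +tcp -p 5300 @127.0.0.1 [name]`.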

Bridge Debugging

SSH into a host, then use toolbox and tcpdump to capture traffic on the flannel0/cbr0 bridge.

Pod Debugging

Bash in the container

cd k8s.io/kubernetes/examples/guestbook
for i in *.yaml; do kubectl create -f "${i}"; done

Find a pod you want to debug

$ kubectl get pods
$ kubectl exec -ti frontend-r1lq4 /bin/bash
root@frontend-r1lq4:/var/www/html# ping yahoo.com
PING yahoo.com (98.138.253.109): 56 data bytes
64 bytes from 98.138.253.109: icmp_seq=0 ttl=127 time=50.670 ms
64 bytes from 98.138.253.109: icmp_seq=1 ttl=127 time=43.191 ms
^C--- yahoo.com ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 43.191/46.931/50.670/3.740 ms

No bash

You will need to add a busybox sleep container to the pod. This is not exactly that, but you get the idea:

$ kubectl run --image busybox tester -- /bin/sleep 5000
Error from server: deployments.extensions "tester" already exists
$ kubectl delete deployment tester
deployment "tester" deleted
$ kubectl run --image busybox tester -- /bin/sleep 5000
deployment "tester" created
$ kubectl get pod
NAME                      READY     STATUS             RESTARTS   AGE
bash-3819658126-4r2r3     0/1       CrashLoopBackOff   5          4m
frontend-du0jv            1/1       Running            0          5m
frontend-k0ykp            1/1       Running            0          5m
frontend-r1lq4            1/1       Running            0          5m
redis-master-3djov        1/1       Running            0          5m
redis-slave-aran5         0/1       Pending            0          5m
redis-slave-u2hdj         1/1       Running            0          5m
tester-3286786242-7xidb   1/1       Running            0          4s
$ kubectl exec -ti tester-3286786242-7xidb /bin/bash
exec: "/bin/bash": stat /bin/bash: no such file or directory
error: error executing remote command: Error executing command in container: Error executing in Docker Container: -1
$ kubectl exec -ti tester-3286786242-7xidb /bin/sh
/ # ping yahoo.com
PING yahoo.com (206.190.36.45): 56 data bytes
64 bytes from 206.190.36.45: seq=0 ttl=127 time=35.360 ms
^C
--- yahoo.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 35.360/35.360/35.360 ms
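
The trial and error above (try /bin/bash, fall back to /bin/sh) could be wrapped in a small helper. This is a minimal sketch, not a tested tool: `pod_shell` is a hypothetical name, and it assumes a kubectl that accepts the `--` separator before the command.

```shell
#!/bin/sh
# Open an interactive shell in a pod, preferring bash and falling back to sh.
pod_shell() {
    pod="$1"
    # Probe non-interactively first so a missing bash does not end the session.
    if kubectl exec "$pod" -- /bin/bash -c true >/dev/null 2>&1; then
        shell=/bin/bash
    else
        shell=/bin/sh
    fi
    kubectl exec -ti "$pod" -- "$shell"
}

# Hypothetical usage:
#   pod_shell tester-3286786242-7xidb
```

A container with no shell at all will still fail here; that case needs a separate debugging container.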

philips • Apr 26 '16 18:04

@omkensey could you take a look at writing this doc?

philips • Apr 26 '16 18:04

DNS debugging: does the stock method for verifying DNS in the Kubernetes docs suffice? (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns, subtopic "How do I test if it is working?") See also the note below about your "no bash in a pod" case.

Bridge debugging: something like this?

CoreOS stable (899.15.0)
core@test-inst ~ $ toolbox
latest: Pulling from library/fedora

6888fc827a3f: Pull complete 
9bdb5101e5fc: Pull complete 
Digest: sha256:1fa98be10c550ffabde65246ed2df16be28dc896d6e370dab56b98460bd27823
Status: Downloaded newer image for fedora:latest
core-fedora-latest
Spawning container core-fedora-latest on /var/lib/toolbox/core-fedora-latest.
Press ^] three times within 1s to kill container.
[root@test-inst ~]# dnf install iproute tcpdump
Fedora 23 - x86_64                               15 MB/s |  43 MB     00:02    
Fedora 23 - x86_64 - Updates                     20 MB/s |  22 MB     00:01    
Last metadata expiration check performed 0:00:08 ago on Wed Apr 27 18:52:42 2016.
Dependencies resolved.
================================================================================
 Package              Arch         Version                  Repository     Size
================================================================================
Installing:
 iproute              x86_64       4.1.1-3.fc23             updates       598 k
 libpcap              x86_64       14:1.7.4-1.fc23          fedora        146 k
 linux-atm-libs       x86_64       2.5.1-13.fc23            fedora         40 k
 tcpdump              x86_64       14:4.7.4-3.fc23          fedora        437 k

Transaction Summary
================================================================================
Install  4 Packages

Total download size: 1.2 M
Installed size: 3.0 M
Is this ok [y/N]: y
Downloading Packages:
(1/4): iproute-4.1.1-3.fc23.x86_64.rpm          1.6 MB/s | 598 kB     00:00    
(2/4): libpcap-1.7.4-1.fc23.x86_64.rpm          130 kB/s | 146 kB     00:01    
(3/4): tcpdump-4.7.4-3.fc23.x86_64.rpm          298 kB/s | 437 kB     00:01    
(4/4): linux-atm-libs-2.5.1-13.fc23.x86_64.rpm   35 kB/s |  40 kB     00:01    
--------------------------------------------------------------------------------
Total                                           442 kB/s | 1.2 MB     00:02     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Installing  : linux-atm-libs-2.5.1-13.fc23.x86_64                         1/4 
  Installing  : libpcap-14:1.7.4-1.fc23.x86_64                              2/4 
  Installing  : tcpdump-14:4.7.4-3.fc23.x86_64                              3/4 
  Installing  : iproute-4.1.1-3.fc23.x86_64                                 4/4 
  Verifying   : tcpdump-14:4.7.4-3.fc23.x86_64                              1/4 
  Verifying   : libpcap-14:1.7.4-1.fc23.x86_64                              2/4 
  Verifying   : iproute-4.1.1-3.fc23.x86_64                                 3/4 
  Verifying   : linux-atm-libs-2.5.1-13.fc23.x86_64                         4/4 

Installed:
  iproute.x86_64 4.1.1-3.fc23               libpcap.x86_64 14:1.7.4-1.fc23      
  linux-atm-libs.x86_64 2.5.1-13.fc23       tcpdump.x86_64 14:4.7.4-3.fc23      

Complete!
[root@test-inst ~]# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens4v1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 42:01:0a:f0:00:02 brd ff:ff:ff:ff:ff:ff
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:97:5b:f0:47 brd ff:ff:ff:ff:ff:ff
[root@test-inst ~]# tcpdump -i docker0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on docker0, link-type EN10MB (Ethernet), capture size 262144 bytes
[...]
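
Picking the interface to capture on, as done by eye in the session above, could be scripted. The sketch below assumes the one-line-per-interface format of `ip -o link` and a fixed candidate list (flannel0, cbr0, docker0); `pick_bridge` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `ip -o link` output on stdin and print the first interface, in listing
# order, whose name is one of the bridges discussed in this thread.
pick_bridge() {
    awk -F': ' '$2 ~ /^(flannel0|cbr0|docker0)$/ { print $2; exit }'
}

# Hypothetical usage inside toolbox (needs root for tcpdump):
#   iface=$(ip -o link | pick_bridge)
#   [ -n "$iface" ] && tcpdump -n -i "$iface"
```

The helper prints nothing when none of the candidate names is present, so the capture step is skipped rather than run against the wrong interface.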

Pod debugging, no bash: I'm still working on attaching a busybox pod to an existing pod for debugging. In fact, we might want to use that for all pod-based debugging, and for the DNS debugging case as well; it would make the decision process simpler to convey (on the host, use toolbox for network debugging; in a pod, attach busybox (or fedora/debian?) for everything).

I'll also write a bit about using nsenter on the host to get inside an existing container's network namespace.
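
As a rough illustration of the nsenter approach, the sketch below runs a command inside a container's network namespace from the host. It assumes the Docker runtime, the util-linux `nsenter` binary, and root privileges; `netns_exec` is a hypothetical helper name.

```shell
#!/bin/sh
# Run a command in the network namespace of a Docker container.
# $1 is the container ID; the remaining arguments are the command to run.
netns_exec() {
    cid="$1"
    shift
    # Look up the PID of the container's main process on the host.
    pid=$(docker inspect -f '{{.State.Pid}}' "$cid") || return 1
    # -t selects the target PID, -n enters only its network namespace.
    nsenter -t "$pid" -n "$@"
}

# Hypothetical usage:
#   netns_exec [container ID] ip addr
```

Because only the network namespace is entered, the command itself comes from the host (or toolbox) filesystem, which is what makes this usable for shell-less containers.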

omkensey • Apr 27 '16 19:04

For debugging shell-less pods using busybox containers, this seems to be a valid flow:

$ kubectl describe pod [pod name]

  • note container ID

On host, using docker to link to a docker container: $ docker run --link [container ID] -it busybox /bin/sh

On host, using rkt to link to rkt container: $ rkt status [container ID]

  • note PID of stage1

$ nsenter -t [pid] -n rkt run --net=host [busybox container]

On host, using rkt to link to docker container, only the method of getting the PID to attach to is different: $ docker inspect -f '{{.State.Pid}}' [container ID]
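
The "note container ID" step could be automated by filtering the describe output. This is a sketch under the assumption that `kubectl describe pod` prints lines of the form `Container ID: <runtime>://<id>`; `container_ids` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `kubectl describe pod` output on stdin and print each container ID
# with its runtime prefix (for example docker:// or rkt://) removed.
container_ids() {
    sed -n 's|^[[:space:]]*Container ID:[[:space:]]*[a-z-]*://||p'
}

# Hypothetical usage:
#   kubectl describe pod [pod name] | container_ids
```

A pod with several containers yields one ID per line, so pick the one that matches the container you want to attach to.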

omkensey • Apr 27 '16 20:04

@omkensey @joshix Can we convert the doc to something to be merged here now? https://docs.google.com/document/d/16k7ISf9CKlYHqXwLGzPhBqwL-JU8B2cnpPTFqkS339M/edit#heading=h.pzbewqsubu0m

philips • May 03 '16 23:05

@philips Yes

joshix • May 04 '16 17:05

@joshix What is the state of this document? #810 claims we have a doc, but comments on that claim it isn't complete.

pop • Jan 13 '17 23:01

@philips I would like to contribute to this.

rahulkrishnanfs • Oct 14 '17 04:10

add iptables 💯

dogopupper • Feb 15 '18 16:02