Troubleshooting Load Balancers in IBM Cloud Kubernetes Service Using tcpdump


This series of blog posts will describe how to use tcpdump to troubleshoot tough load balancer and application network issues.

These posts will focus specifically on the classic Load Balancer v1 that can be created in IBM Cloud Kubernetes Service and Red Hat OpenShift on IBM Cloud clusters. However, much of what is described here can be used in other Kubernetes and OpenShift clusters.

When should you use tcpdump for troubleshooting?

tcpdump is a powerful tool, but it isn't always the best option. In most cases, it generates a LOT of data, and even for experienced developers, it can be hard to filter and interpret the generated data. When doing initial troubleshooting, I recommend using the Kubernetes service troubleshooting guide. Also, make sure to carefully examine the logs of the application to which that load balancer is sending data. In many cases, it is not a problem with the load balancer or cluster networking, but instead a problem with the application itself.

I have found that tcpdump is most useful after exhausting all other options. Specifically, it has helped me identify problems that are:

  • Not easily recreatable (for instance, where requests fail less than 5% of the time)
  • Only happening at heavy load or at seemingly random times
  • Performance related (e.g., at certain times, requests take 10x longer than at other times)

Overview of IBM Cloud Kubernetes Service LoadBalancer v1

Before we use tcpdump, we need to understand the path that packets take as they travel from the client to the Kubernetes application that is using the LoadBalancer v1. Here is a diagram that shows a public classic LoadBalancer v1 with two endpoint application pods:

The LoadBalancer v1 implementation is a virtual (floating) IP address (VIP) that is added to one of the worker nodes in the cluster. That worker node becomes the load balancer, handling all the traffic and balancing it to the backend pods. Note that if externalTrafficPolicy: Local is specified for the load balancer, then the traffic is only sent to the pod(s) that are on the same worker node as the VIP. 
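For reference, a classic LoadBalancer v1 service with externalTrafficPolicy: Local might look like the following sketch (the service name, selector and ports here are hypothetical placeholders, not from a real cluster):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-lb-service              # hypothetical service name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local     # traffic only goes to pods on the node holding the VIP
  selector:
    app: my-app                    # hypothetical pod selector
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
```

A side effect of externalTrafficPolicy: Local is that the client source IP is preserved, at the cost of only balancing across the pods on the VIP node.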

This VIP is managed by two host network pods named ibm-cloud-provider-ip-XXX-XXX-XXX_XXX-... that run keepalived to ensure that the VIP is always on exactly one of the worker nodes. If the node that currently holds the VIP is deleted, crashes, or loses network connectivity, these keepalived pods move the VIP to a healthy node so that the LoadBalancer continues to function. 

A few important things to note:

  • The ibm-cloud-provider-ip-XXX-XXX-XXX_XXX-... pods do NOT "handle" any of the packets. They are host network pods that just exist to ensure the VIP is assigned to exactly one worker node at all times.
  • The load balancing is handled by the iptables rules in the nat table that are set by kube-proxy (the same ones that handle clusterIP and NodePort traffic for the service). They do load balancing randomly between the pods/endpoints that implement the service (NOT round-robin or any more sophisticated balancing).
  • LoadBalancer v2 is similar to this, but uses IPVS tunneling to send packets to the endpoint pods and uses Direct Server Return (DSR) to send packets from the endpoint pods directly to the client (bypassing the LoadBalancer node). It can also use several load-balancing algorithms.
  • For IBM Cloud Kubernetes Service VPC clusters, a VPC LoadBalancer that is external to the cluster is used, so this blog post does not apply to VPC clusters.
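You can inspect the kube-proxy rules mentioned above directly on a worker node. As a sketch (the KUBE-SVC-... and KUBE-SEP-... chain names are hashed per service and endpoint, so the names shown here are illustrative), the random balancing for a service with two endpoint pods looks roughly like this:

```shell
# On the worker node (requires root), list the nat-table rules for the service chains:
iptables -t nat -S | grep KUBE-SVC

# Illustrative output for a service with two endpoints: the first rule matches
# 50% of new connections at random; the rest fall through to the second endpoint.
# -A KUBE-SVC-XXXXXXXXXXXXXXXX -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-AAAAAAAAAAAAAAAA
# -A KUBE-SVC-XXXXXXXXXXXXXXXX -j KUBE-SEP-BBBBBBBBBBBBBBBB
```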

Using tcpdump to capture packet traces

If your classic LoadBalancer v1 service isn't working properly, and you have not been able to determine the problem using the basic service and application troubleshooting, you might want to capture the packets going to and from the LoadBalancer.

Access the worker node

The first thing to do is to find a way to run the tcpdump command on the worker node itself. Sometimes it might be useful to run tcpdump inside the pod that has the problem, but this is often difficult: many pods don't allow you to exec into a shell, might not have tcpdump or any package manager installed, and might not be running with enough authority to run tcpdump.

Here are two methods to access the worker node. Both start a pod with host networking and then exec into it to run tcpdump:

  • OpenShift clusters: This command starts a "debug" pod in the default namespace and then execs you into it: oc debug node/<nodename>
  • Kubernetes clusters: Manually create an Alpine pod with host networking. Run the following command to create the pod on the ${NODE} worker node and install tcpdump:
    kubectl apply -f - << EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: debug-${NODE}
      namespace: default
    spec:
      containers:
      - args: ["-c", "apk add tcpdump; sleep 1d"]
        command: ["/bin/sh"]
        image: alpine:latest
        imagePullPolicy: IfNotPresent
        name: debug
        resources: {}
        securityContext:
          privileged: true
          runAsUser: 0
      dnsPolicy: ClusterFirst
      hostNetwork: true
      hostPID: true
      nodeSelector:
        kubernetes.io/hostname: ${NODE}
      restartPolicy: Never
      securityContext: {}
    EOF
  • Exec into the pod: kubectl exec -it debug-${NODE} -- sh

Choose an interface on which to capture packets

Running ip addr show will list the interfaces on the node itself. eth0 is typically the private interface, eth1 is the public interface (if the node has one) and the cali... interfaces are for the non-host-network pods running on that node. Running ip route show | grep cali... will show you the IP of the pod that is using that interface.
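If you need to map a pod's IP address to its cali... interface, the route table gives you the association. Here is a minimal sketch; the routes variable below holds sample ip route show output with made-up pod IPs and interface names, and on a real node you would pipe the live command output instead:

```shell
# Sample `ip route show` output from a worker node (illustrative IPs/interfaces):
routes='172.30.10.5 dev cali1a2b3c4d5e6 scope link
172.30.10.6 dev cali9f8e7d6c5b4 scope link'

POD_IP="172.30.10.5"   # the pod whose traffic you want to capture
# Pick the interface from the route whose destination is the pod IP:
IFACE=$(echo "$routes" | awk -v ip="$POD_IP" '$1 == ip {print $3}')
echo "$IFACE"   # prints cali1a2b3c4d5e6
```

On a live node, the same one-liner is IFACE=$(ip route show | awk -v ip="$POD_IP" '$1 == ip {print $3}'), and you can then capture with tcpdump -lni "$IFACE".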

In this example, we will capture packets coming in to the public LoadBalancer VIP 169.1.2.3 on port 80 on an OpenShift version 4 cluster. The first thing to do is to look at the worker nodes that the two ibm-system/ibm-cloud-provider-ip-<VIP>-... pods are on. One of these is the active LoadBalancer worker node, and the other is the passive one (waiting to take over if the active node goes down). 

You will use one of the methods above to access these hosts and run ip addr show to see which node's eth1 (public) interface has the VIP. That is the node on which we need to run tcpdump. If we wanted to instead capture packets going to/from a specific pod, we would find which cali... interface was associated with that pod and use that interface instead.
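To confirm which of the two nodes currently holds the VIP, you can check eth1 from the debug pod on each node. A small sketch, using the example VIP from this post:

```shell
VIP="169.1.2.3"   # the example LoadBalancer VIP
if ip addr show eth1 2>/dev/null | grep -qF " ${VIP}/"; then
  echo "VIP is on this node -- run tcpdump here"
else
  echo "VIP is not on this node"
fi
```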

Capture the packets

On the worker node that the VIP is on, run: tcpdump -lnei eth1 -C 100 -W 5 -w /tmp/pub_LB_169_1_2_3_port_80.pcap host 169.1.2.3 and port 80 (note that the tcpdump options must come before the capture filter expression, or tcpdump will try to parse them as part of the filter)

There are many good tutorials on tcpdump parameters, so I'm not going to get into all of that, but to summarize this command:

  • Captures packets only on interface eth1 that are to/from 169.1.2.3 on port 80
  • Puts the packets in binary format into up to five files of 100 MB each, named:
    • pub_LB_169_1_2_3_port_80.pcap0
    • pub_LB_169_1_2_3_port_80.pcap1
    • ...

If tcpdump fills up all five files with 100MB of data each, it will start overwriting the oldest data (in this case, in .pcap0). So, you will always have the most recent 500MB of packets captured.

When you have captured what you think you need, use <ctrl>-c to end the tcpdump. Then use ls -ltr /tmp to see what pcap files were captured. It is important NOT to exit out of this pod (especially if you are using oc debug) until AFTER you have downloaded these pcap files, otherwise the files will be lost. So for the next step, leave this prompt open, and open a new command prompt with access to your cluster to complete the next step.

Download the pcap files

You can always just run tcpdump without the -w parameter so the packets are shown in text format on the screen. However, if you want to analyze a lot of data or data over a long period of time, you will want to use -w to put the data in a file as we did above and then download that file to your laptop and use something like Wireshark to analyze it. 

To get those files off of the worker node, you will first need to find the pod you created (or that was created by oc debug) using: kubectl get pods -o wide. Then you can use kubectl cp default/<debug-pod>:/tmp/pub_LB_169_1_2_3_port_80.pcap0 ./pub_LB_169_1_2_3_port_80.pcap0 to download the file (and any other pcap# files) to your laptop.
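If tcpdump rotated through several files, you can download them all in a loop. A sketch, assuming the debug pod is named debug-mynode (substitute the actual pod name from kubectl get pods):

```shell
DEBUG_POD="debug-mynode"                 # hypothetical; use your actual debug pod name
PREFIX="pub_LB_169_1_2_3_port_80.pcap"
# tcpdump -W 5 rotates through suffixes 0-4; copy whichever files exist
for i in 0 1 2 3 4; do
  kubectl cp "default/${DEBUG_POD}:/tmp/${PREFIX}${i}" "./${PREFIX}${i}" 2>/dev/null || true
done
```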

Once you have successfully downloaded the files, you can exit out of your original prompt where you captured the pcap files. If you created the debug pod manually, you will need to run a kubectl delete pod... command to clean it up. The oc debug ... command takes care of this for you.

Analyze the pcap files

I use Wireshark to analyze pcap files. If they are small and simple, you can also just use tcpdump -r <pcap_file> to turn the binary format into readable text. 

Stay tuned for more

In future posts, I will show some techniques I use to make sense of these pcap files, and I'll also show how to get packet captures for traffic going directly to/from a specific pod.
