March 5, 2021
By Kean Kuiper, Saju Mathew, and Rei Odaira
4 min read

In this post, we will demonstrate a solution to the concurrency bug described in Part 4 by patching a running kernel.

This is the final post in our blog series, which is intended for network administrators and developers interested in diagnosing packet loss in the Linux network virtualization layers.

Solving the problem at the source code level

Our analysis in Part 4 narrowed the problem to the statement shown below, an early check for an already-full queue that is performed without holding the queue's producer lock:

rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
{
...
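        /* Unsafe early check: reads the queue state without the producer
         * lock, so it can race with a concurrent queue resize. */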
        if (__skb_array_full(&q->skb_array))
                goto drop;
...
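                /* Safe check: skb_array_produce() takes the producer
                 * lock internally before testing for a full queue. */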
                if (skb_array_produce(&q->skb_array, skb))
                        goto drop;
...
}

We discovered that more recent kernels had removed this check entirely, due to concerns about its correctness when the queue size changes concurrently. Since a full queue is an edge case with little performance benefit from the early check, we chose the same approach for our repair.

Dynamic Linux kernel bug repair

In the environment under study, rebooting a machine to load a modified kernel affects the rest of the cluster. While a reboot is possible, it is less intrusive to patch the kernel in place.

Fortunately, our kernel supports Kernel Live Patching, an interface that can safely apply patches built by other means. We used kpatch to generate a suitable patch.

Building a patch with kpatch

At a high level, kpatch builds two kernels and generates a patch based on the differences in the executables they produce. The first kernel is built from a baseline source tree, which should match the running kernel. A source code patch is then applied, and a modified kernel is built for comparison.

To avoid unnecessary changes to symbolic information influenced by line numbers, we created a patch that simply comments out the early queue-full test, leaving the line numbering intact:

--- from/drivers/net/tap.c
+++ to/drivers/net/tap.c
@@ -330,9 +330,9 @@
	if (!q)
		return RX_HANDLER_PASS;

-       if (__skb_array_full(&q->skb_array))
+/*     if (__skb_array_full(&q->skb_array))
		goto drop;
-
+*/
	skb_push(skb, ETH_HLEN);

We started with the source originally used to build the kernel we were patching, used the same GCC version, and ensured an exact version match with the target kernel via the LOCALVERSION configuration option.

kpatch-build produced the following output:

Building original source
Building patched source
Extracting new and modified ELF sections
tap.o: changed function: tap_handle_frame
Patched objects: drivers/net/tap.ko
Building patch module: livepatch-no-tap-early-queue-test.ko

A patch module contains one or more objects to patch, each having one or more functions to replace. In our case, the only patched object is the tap.ko module, and the replaced function is tap_handle_frame().
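
kpatch generates all of this bookkeeping automatically, but it is instructive to see what a live-patch module declares. Below is a minimal sketch modeled on the kernel's livepatch sample code; livepatch_tap_handle_frame is our illustrative name for the replacement function, and the exact registration details vary across kernel versions:

#include <linux/livepatch.h>
#include <linux/netdevice.h>

/* The replacement implementation; its body is omitted in this sketch. */
static rx_handler_result_t livepatch_tap_handle_frame(struct sk_buff **pskb);

/* Each klp_func pairs an original function name with its replacement. */
static struct klp_func funcs[] = {
	{
		.old_name = "tap_handle_frame",
		.new_func = livepatch_tap_handle_frame,
	}, { }
};

/* Each klp_object names the object to patch; here, the tap.ko module. */
static struct klp_object objs[] = {
	{
		.name = "tap",
		.funcs = funcs,
	}, { }
};

static struct klp_patch patch = {
	.mod = THIS_MODULE,
	.objs = objs,
};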

Kernel Live Patch support

Live patching leverages ftrace support, which allows the first instruction of every function to be patched. For a patched function, the first instruction of the original is altered to immediately call klp_ftrace_handler().

klp_ftrace_handler() looks up the replacement and rearranges the saved register state to simulate a call directly to the new function. The patched function later returns directly to the caller of the unpatched version.
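
The following is a condensed sketch of that redirection, based on klp_ftrace_handler() in kernel/livepatch/patch.c; RCU protection, error handling, and the per-thread transition logic are omitted for brevity:

static void notrace klp_ftrace_handler(unsigned long ip,
				       unsigned long parent_ip,
				       struct ftrace_ops *fops,
				       struct pt_regs *regs)
{
	struct klp_ops *ops = container_of(fops, struct klp_ops, fops);
	struct klp_func *func;

	/* Find the most recently enabled replacement for this function. */
	func = list_first_or_null_rcu(&ops->func_stack,
				      struct klp_func, stack_node);
	if (!func)
		return;

	/*
	 * Rewrite the saved instruction pointer so that, when this handler
	 * returns, execution resumes in the new function. The new function
	 * then returns directly to the original caller.
	 */
	klp_arch_set_pc(regs, (unsigned long)func->new_func);
}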

This instruction-patching approach handles the redirection itself, but more is required to guarantee the coherency of the patch process at runtime. Once a function is patched, new invocations begin using the replacement immediately. However, the patch process is only complete when all threads in the system are guaranteed to use the replacement functions the next time they call them.

Patch completion is tracked per thread: once all functions are patched, every thread is marked TIF_PATCH_PENDING. When a thread returns from kernel execution, it clears its TIF_PATCH_PENDING flag, since it cannot be inside a patched kernel function at that point. To handle blocked or idle threads, the kernel periodically scans for threads that are still marked TIF_PATCH_PENDING. For each one it finds, the thread's call stack is scanned for unpatched functions, and TIF_PATCH_PENDING is cleared if none are found.
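
A simplified sketch of this per-thread check, modeled on klp_try_switch_task() in kernel/livepatch/transition.c, looks like the following; the locking needed to safely walk another thread's stack is omitted, and the klp_check_stack() call is shown in simplified form:

static bool klp_try_switch_task_sketch(struct task_struct *task)
{
	/* Nothing to do if this thread has already transitioned. */
	if (!test_tsk_thread_flag(task, TIF_PATCH_PENDING))
		return true;

	/*
	 * Walk the thread's kernel stack. If any frame lies within a
	 * function that is being patched, the thread cannot switch yet.
	 */
	if (klp_check_stack(task))
		return false;

	/* The stack is clean; this thread will only see patched functions. */
	clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
	return true;
}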

These two mechanisms work together to confirm patch completion, usually in a matter of seconds:

loading patch module: livepatch-no-tap-early-queue-test.ko
waiting (up to 15 seconds) for patch transition to complete...
transition complete (3 seconds)

Confirming the solution

After developing and applying the dynamic patch to an environment, we ran a follow-up test to determine whether the packet loss was resolved: the Netperf TCP_RR test described in Part 1 of this blog series. The results are presented in Figure 1, where the orange and dark blue bars represent the histograms of the communication latencies without and with our patch, respectively. With the patch in place, the test ran successfully with zero packet drops. Note that toward the right end of the figure, the 200 ms+ latency tails are gone with the patch.

Figure 1. Histograms of Netperf TCP_RR communication latencies without (orange) and with (dark blue) the patch.

Conclusion

In this blog series, we shared our experience of diagnosing packet loss in the infrastructure of IBM Cloud. The root cause of the packet loss in the discussed case was a concurrency bug in the Linux macvtap driver.

To narrow the scope of the analysis and identify the root cause, we leveraged several tools and methodologies. Using SystemTap, we instrumented the running host Linux kernel to identify control paths and to probe data structures. We applied an intrusive analysis that varied the number of queues and observed the effect of this variation on the packet loss. A careful and thorough source code analysis was also a critical part of our diagnosis. Finally, we utilized the kpatch framework to apply a hot patch to the running Linux kernel. We have already deployed our solution in most regions.

Packet loss is one of the most critical problems in a network. However, its diagnosis becomes difficult in a cloud environment, where the network stack comprises multiple virtualization layers connected by queues. We hope that this blog series will help all those who are interested in diagnosing packet loss in network virtualization layers.
