February 25, 2021
By Kean Kuiper, Saju Mathew and Rei Odaira

In the third part of this five-part series, we will explain how we examined data structures in the Linux kernel to diagnose the packet loss issue described in Part 1.

We will show how we used SystemTap to probe the status of the queues, and then how we experimented with different queue configurations to see how they affected the packet loss. This series is intended for network administrators and developers who are interested in diagnosing packet loss in the Linux network virtualization layers.

Revisiting the source code

The following is a simplified version of the tap_handle_frame() function in Linux 4.15.0, the function where packets were dropped in our case (see Part 2 for details). Execution can jump to the drop label at line 16 from lines 5, 9, 12 and 14. In Part 2, using SystemTap, we confirmed that execution never jumped to it from lines 9 or 12:

 1: rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
 2: {
    ...
 3:         q = tap_get_queue(tap, skb);
    ...
 4:         if (__skb_array_full(&q->skb_array))
 5:                 goto drop;
    ...
 6:         if (netif_needs_gso(skb, features)) {
 7:                 struct sk_buff *segs = __skb_gso_segment(skb, features, false);
 8:                 if (IS_ERR(segs))
 9:                         goto drop;
    ...
10:         } else {
    ...
11:                 if (skb_checksum_help(skb))
12:                         goto drop;
13:         if (skb_array_produce(&q->skb_array, skb))
14:                         goto drop;
15:         }
    ...
16: drop:
17:         if (tap->count_rx_dropped)
18:                 tap->count_rx_dropped(tap);
    ...
19: }

Checking the status of the macvtap queues

The only remaining possibilities were lines 5 and 14. Both conditions become true when the macvtap queue is full. As described in Part 2, there are three macvtap queues in our system. In line 3, tap_handle_frame() selects one of the macvtap queues based on the hash value of the received packet. Line 4 performs an early check on whether that queue is full. Later, in line 13, the function inserts the packet into the queue. In both places, if the queue is found to be full, the packet is dropped.

SystemTap allows users to check not only whether a kernel function is called, but also the state of kernel data structures. We wrote a script to dump the status of all three macvtap queues every time macvtap_count_rx_dropped() was called, that is, every time a packet was dropped. We omit the script's output here, but it indicated that even when a packet was dropped, none of the queues was full; each contained at most one packet. This observation contradicted our analysis. Why were packets dropped at queues that were not full?

One caveat is that our SystemTap script did not acquire the locks needed to inspect the queue status safely. Because multiple host ksoftirqd threads can access the macvtap queues, the correct protocol for accessing a queue is to acquire its lock first. However, writing such code in SystemTap would have been prohibitively tedious, so we did not go down that path. As a result, there was always a chance of a race condition in which the dumped queue status was not a consistent snapshot.

To fully understand what was going on, we needed to approach the problem from another angle.

Changing the number of queues

Figure 1 shows the Linux network virtualization layers that we explained in Part 2. Because packets could take many different paths through these layers, we thought that simplifying the setup might help us understand the problem:

Figure 1

One way to simplify the environment was to reduce the number of VF queues feeding a macvtap device. Reducing from four to a single VF queue simplified the topology significantly, as shown in Figure 2.

Quick benchmarks in this configuration produced an interesting result: with the four VF queues reduced to one, the macvtap packet loss disappeared. This was worth further experimentation. What about other configurations?

We then changed the number of producer VF queues from 1 to 2 and 4. For each setting, we varied the number of consumer virtqueues (and, hence, the number of macvtap queues) in a series of further experiments, as shown in Figure 3. We found that no packet loss occurred when the number of virtqueues was a multiple of the number of VF queues:

Figure 2

Figure 3

We can surmise from the data that loss can occur when multiple VF queues feed into the same macvtap queue. An example is indicated by the red arrows leading into the first macvtap queue in Figure 1. Combined with the earlier SystemTap observation, this strongly indicated a multi-threading problem in macvtap, specifically in how the producers write into the queues.

Summary

In this post, we examined the packet loss, first by instrumenting the macvtap queues using SystemTap and then by changing the number of VF queues, macvtap queues and virtqueues. Based on our observations, we suspected a concurrency bug in the Linux macvtap driver. In the next post, we will explain the root cause of the packet loss and show how we confirmed our hypothesis using SystemTap.
