Network interface settings

This topic lists the changes that were made to all the network interfaces used for the network measurements.

This includes the interfaces used in the KVM guests, the KVM guest interfaces in the KVM hosts (that is, macvtap# or vnet#), and the KVM host OSA interfaces, including any configured software bridges, for the settings that apply.

Maximum Transmission Unit

Maximum transmission unit (MTU) is the maximum size (in bytes) of one packet of data that can be transferred in a network. Every hop (a portion of the packet's journey from source to receiver) that is part of the communication must be able to receive and transmit at the same MTU. However, if one of the hops in the network path uses a lower MTU size than the originating host, the packet is retransmitted in a smaller size.

The default MTU size for Ethernet devices is 1500 bytes. In a distributed environment, all Linux® servers and switches/routers must be configured to communicate at the same MTU size to take advantage of a larger MTU. In a Linux on System z® environment, it is a relatively low effort to set up all internal communication with an MTU size of 8992 bytes.

By adjusting the MTU sizes, you can improve the performance of your application and database servers.

For example, typically, the larger the packet size, the better the performance, because fewer packets are needed to communicate between the driver and the database. Fewer packets mean fewer network round trips to and from the application. Fewer packets also require less disassembly and reassembly and, ultimately, use less CPU.

The testing performed here compares a normal and a large MTU size. The normal or default MTU size is typically 1500 bytes, and 9000 bytes tends to be the common choice for a larger MTU size. However, 9000 bytes spans more than two physical 4 KB pages of memory. The larger MTU size tested here was therefore adjusted down to fit exactly into two 4 KB pages, which equals 8192 bytes (2 x 4096 bytes).

The degree of improvement varies depending on whether the KVM guests are running on a single KVM host or on separate KVM hosts.

Figure 1. Throughput, latency, and CPU efficiency using a large MTU size vs the default MTU size with KVM guests running on a single KVM host LPAR

Figure 1 shows the improvements when a large MTU size is used between KVM guests running on the same KVM host. In this configuration, improvements are seen when the datagram or payload size is larger than the default MTU size. Three workload test types have payload sizes of 30 KB. Each of these workload test types obtains substantial (up to 120%) improvements in throughput, latency, and CPU efficiency when the larger MTU size is used.

Figure 2. Throughput, latency, and CPU consumption using normal and large MTU sizes with KVM guests running on separate LPARs

Figure 2 shows the improvements from a larger MTU size when KVM guests are running on separate KVM hosts. In this configuration, the maximum throughput is limited by the speed of the physical network fabric. Here, the larger MTU size provides a 5% to 20% improvement in throughput and transaction time in all but two workload test types. From a CPU perspective, efficiency improved by 25% to 75% across all test types.

These results clearly indicate that using a larger MTU size provides improvements in almost all use case scenarios, and configuring your environments with a larger MTU size is recommended. However, to realize the potential gains, the entire network path must be configured to use the larger MTU size. This may not always be possible because of external factors such as hardware limitations or restrictions enforced by other shared network infrastructure systems and components.
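One way to verify that the entire path handles the larger MTU size is to send a non-fragmentable ICMP packet sized to match it. With an MTU of 8192 bytes, the ICMP payload is 8164 bytes (8192 minus 20 bytes of IP header and 8 bytes of ICMP header); the target host name below is illustrative:
[root@kvm(host|guest) ~] # ping -c 3 -M do -s 8164 target.example.com

If a hop in the path uses a smaller MTU, the ping fails with a “message too long” error instead of silently fragmenting.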

The MTU size can be set dynamically using the command:
[root@kvm(host|guest) ~] # ip link set dev {interface} mtu NEWMTUSIZE
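For example, to set an MTU size of 8192 on an interface named eth0 (the interface name is illustrative) and verify the result:
[root@kvm(host|guest) ~] # ip link set dev eth0 mtu 8192
[root@kvm(host|guest) ~] # ip link show dev eth0 | grep -o 'mtu [0-9]*'
mtu 8192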

However, this method of setting the MTU size is not persistent across reboots.

To make the change persistent, it is recommended to include the MTU= parameter in the “ifcfg-{interface}” file for the target interface.

MTU=8192
After making this change, the interface must be restarted for the change to take effect. To stop and start the interface, use these commands:
[root@kvm(host|guest) ~] # ifdown {interface}; ifup {interface}
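For context, a complete ifcfg file with the persistent MTU setting might look like the following sketch. The device name and addressing are illustrative, and the file location (for example, /etc/sysconfig/network-scripts/ on Red Hat-based distributions) depends on the distribution:

DEVICE=eth0
BOOTPROTO=static
IPADDR=192.0.2.10
NETMASK=255.255.255.0
ONBOOT=yes
MTU=8192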
Note: The MTU size will typically default to the smallest MTU size in the transmission path between the source and target endpoints. When KVM guests communicate externally to the host, the host may or may not be an active participant in determining the MTU size of the communication path, depending on the network model used by the guest.
Warning: When using macvtap, which uses a direct connection to a host interface, the host is not an active participant in determining the MTU size of the transmission path. In this case, if the MTU size in the host is smaller than the MTU size of the KVM guest and the KVM guest attempts to send packets that are larger than the MTU size in the host, those packets stall in the host. The host does not fragment the larger packets to fit into its MTU, and the connection hangs until it times out. It is therefore necessary to set the host MTU size equal to or greater than the MTU size set in the KVM guests.
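A quick sanity check is to compare the guest MTU with the MTU of the host interface that the macvtap device is attached to. The interface names below are illustrative:
[root@kvmhost ~] # ip link show dev enccw0.0.f500 | grep -o 'mtu [0-9]*'
mtu 8192
[root@kvmguest ~] # ip link show dev eth0 | grep -o 'mtu [0-9]*'
mtu 8192

The value reported on the host must be equal to or greater than the value reported in the guest.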

buffer_count

  • The buffer count parameter is a network parameter for QDIO devices on Linux on IBM Z. This parameter allows Linux servers to receive more network packets in order to increase throughput. The default buffer count value for Linux on IBM Z is 16. The parameter must be defined for each network device; because each buffer occupies 64 KB, the default of 16 buffers allocates 1 MB of memory per device, and a buffer count of 128 leads to 8 MB of memory consumption.
  • You can check the actual buffer count by using the lsqeth -p <interface> command.
  • The configuration of the buffer count must be done with the virtual interface in an offline state, as shown in the sketch after this list.
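
As a sketch of the offline procedure, the buffer count can be set through the qeth sysfs attribute while the device is offline; the device bus-ID 0.0.f500 is illustrative:
[root@kvm(host|guest) ~] # echo 0 > /sys/bus/ccwgroup/drivers/qeth/0.0.f500/online
[root@kvm(host|guest) ~] # echo 128 > /sys/bus/ccwgroup/drivers/qeth/0.0.f500/buffer_count
[root@kvm(host|guest) ~] # echo 1 > /sys/bus/ccwgroup/drivers/qeth/0.0.f500/online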

It is recommended to add this setting in the “ifcfg-{interface}” file for the target interface. The OPTIONS= parameter needs to be added or edited to include “buffer_count=128”:

OPTIONS="buffer_count=128"

The interface must be restarted for the change to take effect.

Network checksumming

The OSA network cards support two checksumming options. The first option is to use software checksumming and the second option is to perform checksumming in the hardware (on the OSA card).

To reduce the load on the CPUs and perform checksumming faster, the checksumming setting was changed from the default of software checksumming to hardware checksumming.

The option is changed in the “ifcfg-{interface}” file for the target interface. The OPTIONS= parameter should be added or edited to include “checksumming=hw_checksumming”:
OPTIONS="checksumming=hw_checksumming"

After making this change, the interface must be restarted for the change to take effect.
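
Which checksum offloads are in effect can also be inspected from the Linux side with the generic ethtool offload view. The interface name eth0 is illustrative, and the exact field names can vary by kernel version:
[root@kvm(host|guest) ~] # ethtool -k eth0 | grep -i checksum
rx-checksumming: on
tx-checksumming: on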

Transmit (TX) Queue length

There is a setting available to adjust the size of the queue between the kernel network subsystem and the driver for the network interface card. As with other buffers, it is recommended to set the queue size appropriately to prevent losses resulting from buffer overflows. Careful tuning is therefore required to ensure that the queue sizes are optimal for your network connection.

These settings are especially important for TCP, because losses on local queues cause TCP to fall into congestion control, which limits the TCP sending rates. For UDP, full queues directly cause packet losses.

There are two queues to consider: the netdev backlog (see net.core.netdev_max_backlog in Network stack settings), which relates to the receive queue size, and txqueuelen, which determines the transmit queue size.

The default transmit queue size (the txqueuelen setting) for a network device on IBM Z is 1000. This value is adequate for Gigabit network devices. However, for devices running at 10 Gb/s or greater, the txqueuelen setting should be increased to avoid overflows that drop packets.

Conversely, choosing a value that is too large can cause added overhead, resulting in higher network latencies.

To query the transmit queue size of a device on your system, use either of the following commands:
[root@kvm(host|guest) ~] # ifconfig <interface-name> 
eth0 Link encap:Ethernet HWaddr e4:1f:13:ba:c7:04 
     inet addr:9.53.92.168 Bcast:9.53.92.255 Mask:255.255.255.0 
     inet6 addr: fe80::e61f:13ff:feba:c704/64 Scope:Link 
     UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 
     RX packets:165275988 errors:0 dropped:0 overruns:0 frame:0 
     TX packets:169557966 errors:0 dropped:0 overruns:0 carrier:0 
     collisions:0 txqueuelen:1000 
     RX bytes:87361147022 (87.3 GB) TX bytes:117748544954 (117.7 GB)

The transmit queue size is reported in the txqueuelen field.

or
[root@kvm(host|guest) ~] # ip link show dev <interface-name> 
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group 
default qlen 1000  
link/ether e4:1f:13:ba:c7:04 brd ff:ff:ff:ff:ff:ff

In the ip link output, the transmit queue size is reported in the qlen field.

To change the default txqueuelen value, use either of the following commands:
[root@kvm(host|guest) ~] # ifconfig <interface-name> txqueuelen <new-value>
or
[root@kvm(host|guest) ~]# ip link set txqueuelen <new-value> dev <interface-name>

Some experimentation may be required to determine the optimal setting for your environment and workload. The txqueuelen value that produced the best results for our tests was 2500; this value is a good starting point.
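
Like the dynamic MTU change, a txqueuelen value set with these commands does not persist across reboots. One possible way to make the setting persistent is a udev rule that writes the tx_queue_len device attribute when the interface is registered; the file name and interface name below are illustrative:

# /etc/udev/rules.d/60-txqueuelen.rules
ACTION=="add", SUBSYSTEM=="net", KERNEL=="eth0", ATTR{tx_queue_len}="2500"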