Taking advantage of networking large-send and large-receive
This article is a working draft with many key and knowledgeable contributors, including Tom Phillips, Bernard King-Smith, Fabrice Moyen, Brian King, Xiaohan Qin, Tom Falcon, Nathan Fontenot, and others. My job (Bill Buros) is consolidating the various experiments, approaches, and recommendations into the article while learning more about this realm. Errors are possible as we continue the review/update process; most of the errors transcribed here are likely due to my limited current knowledge, so please attribute them to me, and we'll work to get the article fully reviewed, vetted, and ready for the many scenarios possible.
For news, updates, and discussions, see the forums topic entry.
The goal: Linux customers and Linux implementations should see comparable bandwidth and throughput in most of the scenarios detailed below.
Introduction
In this article we'll work to clarify some lingering questions and confusion around enabling Linux LPARs for what's called "large-send" and "large-receive" networking traffic, both between LPARs on a single system and across LPARs on separate systems.
Much of the confusion, and the varying performance results, comes from the surprising number of permutations that are possible: the Linux versions, the AIX versions, whether the communication traffic stays within a single system or crosses systems, the tuning parameters, and, a big one, the default overrides that the pieces can apply implicitly, leading to unexpected performance drops.
There are now three flavors of this support:
- The original AIX-only Shared Ethernet Adapter approach, generally known as AIX TCP large-send offload.
- Modifications to the Linux operating systems that allow Linux to leverage that AIX model, generally known as Virtual Ethernet large send offload.
- In the future: an improved, longer-term platform enabling approach that is being developed.
Consistent Terminology
"old_large_send"
Used on the ibmveth modprobe command.
# rmmod ibmveth
# modprobe ibmveth old_large_send=1
Offload parameters
# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: off
ntuple-filters: off
receive-hashing: off
tx-nocache-copy: off
"TSO"
TCP Segmentation Offload. This is the ethtool feature (tcp-segmentation-offload, toggled with "ethtool -K <dev> tso on") that lets the TCP stack hand large buffers to the virtual Ethernet driver; on the Linux side this is what "large send" corresponds to.
Platform large send/receive
tbd - see "5. Platform large-send support" below.
1. AIX-only Shared Ethernet Adapter (SEA) approach
TCP large send offload
The TCP large send offload option allows the AIX TCP layer to build a TCP message up to 64 KB long and send it in one call down the stack through IP and the Ethernet device driver; the adapter then re-segments the message for transmission. For more details see this article.
This all started with PowerVM LPARs running AIX in the partitions, with a special adapter running in the VIOS partition. Enhancements to the Shared Ethernet Adapter (SEA), VIOS, and AIX enabled a proprietary and unique protocol for setting up and leveraging larger packets between the LPARs.
The AIX setup utilized an AIX-unique TCP 3-way handshake which enabled discovery of the receiver's capability (can it support this special mode?), followed by a specific negotiation for each connection. These connections could be made between AIX instances running in separate LPARs, or with AIX connecting to the VIOS/SEA adapter.
An AIX VETH driver was implemented to be capable of receiving the large packets.
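For reference, on the AIX side large send is typically enabled per interface; the commands below are a sketch based on standard AIX interface attributes (en0 is an assumption, and the attribute names should be verified against your AIX level):

ifconfig en0 largesend               # enable large send on the running interface
chdev -l en0 -a mtu_bypass=on        # make the large send setting persistent across reboots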
AIX TCP large send offload not available to Linux LPARs
Due to the unique and specific implementation done with AIX, VIOS, and the adapter, the code necessary for Linux support was not general enough to be accepted into the Linux open-source communities.

2. Linux Virtual Ethernet large send offload
Over time, an approach emerged that worked for the Linux operating systems and LPARs, given that some assumptions and system constraints were met. With these constraints satisfied, the Linux LPAR would be enabled for large receive offload (LRO) and large send offload (LSO) if a module parameter was set correctly.
The Linux feature
- Modified the Linux PowerVM virtual Ethernet driver (ibmveth) to support receiving large packets by default. This would allow for enabling LRO on the SEA in a mixed AIX / Linux CEC.
- Added a module parameter to the Linux virtual Ethernet driver (ibmveth) to allow Linux to send large packets. This should only be set if:
- you are sure that all Linux LPARs on the CEC (the whole system) support receiving large packets, and
- you have LSO enabled on the physical NIC backing the SEA.
- If this is not true, the customer may see dropped packets, so this option would not be recommended.
Essentially this mode implements the support without the TCP 3-way negotiation, which means the burden falls on the administrator to be sure *all* of the Linux LPARs are properly enabled.
These features have been implemented in the following Linux on Power operating systems:
Software Requirements
VIOS 2.2.5.20 is required.
- RHEL 7.2 update: 3.10.0-327.49.2, or later
- RHEL 7.3 update: 3.10.0-514.10.2, or later
- SLES 11 SP4 update: 3.0.101-94.1, or later
- SLES 12 SP1 update: 3.12.69-60.64.29.1, or later
- SLES 12 SP2 update: 4.4.49-92.11.1, or later
- Ubuntu 16.04.1 update: 4.4.0-63.84, or later
- Ubuntu 16.04.2 update: 4.8.0-38.41, or later
In addition to the Linux OS enabling, the PowerVM system needs the VIOS partition to be loaded with a 2.2.4 level (or newer).
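To confirm you are at the required levels, the running kernel on each Linux LPAR and the installed VIOS level can be checked as a quick sketch; compare the output against the versions listed above:

uname -r          # on each Linux LPAR; compare with the kernel versions listed above
ioslevel          # on the VIOS (padmin shell)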
Each virtual Ethernet connection in the Linux LPARs will need the following commands:
rmmod ibmveth                        # unload the veth driver
modprobe ibmveth old_large_send=1    # reload the veth driver
ethtool -K eth0 tso on               # turn on largesend on veth
Surviving a reboot: details to be added (Tom P).
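Until those details are written up, a minimal sketch of one common approach is to set the module option in /etc/modprobe.d so it is applied whenever ibmveth loads at boot (assuming your distribution honors modprobe.d option files for this driver; the ethtool TSO setting still needs to be reapplied by your network configuration scripts):

echo "options ibmveth old_large_send=1" > /etc/modprobe.d/ibmveth.conf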
Large send / large receive packet counts can be shown with ethtool -S.
Note that RX packets may not be counted (Tom F).
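For example (the interface name is illustrative, and the exact counter names vary by driver level, so treat this as a sketch):

ethtool -S eth0 | grep -i -e large -e tso     # look for large send / large receive related counters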
3. Performance scenarios - MTU sizes
As a general rule, the larger the MTU, the higher the bandwidth, up to the limit of the "network". When discussing SEA, there are MTU limitations based on the MTU of the Ethernet network. With veth, the MTU is limited by IPv4, which has a maximum packet size of 64 KB. Using larger MTU sizes, you minimize the number of times a packet traverses the TCP stack and maximize bandwidth.
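As an illustration (the interface name and MTU value are assumptions; the MTU must be supported end to end by the virtual or physical network in between):

ip link set dev eth0 mtu 9000     # raise the MTU on the Linux veth interface
ip link show dev eth0             # verify the new MTU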
In many of our experiments, it appears that SLES uses the large-send size as the buffer size, while AIX does not.
4. Performance scenarios - No VIO Server
When using the virtual Ethernet network provided by the virtual Ethernet adapters within a single system, network traffic can be tuned between AIX and Linux LPARs.
The permutations of operating systems "should" show about the same performance characteristics, for example:
- AIX to AIX
- AIX to SLES
- SLES to AIX
- SLES to SLES
5. Platform large-send support
In the future, the platform large send support will have the following requirements in the PowerVM based system:
- FW 840
- VIOS 2.2.5.20
- Linux OS support - RHEL 6.8, RHEL 7.2 BE and LE, SLES 11SP4, SLES 12 SP1, Ubuntu 14.04.4, Ubuntu 16.04
- AIX 7.2 TL1
- IBM i 7.1 TR11, IBM i 7.2 TR3
6. Scenarios that should work
AIX and SLES comparisons
In this scenario, iperf was used. After tuning up the pieces, the AIX<->SLES iperf numbers were close to AIX<->AIX.
We did NOT use MTU 9000. With MTU 1500, AIX and SLES performance was close.
The first thing is to ensure old_large_send is enabled. This is a requirement.
Then ensure TSO is enabled for the correct adapter.
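A quick way to double-check both on the Linux LPAR (eth0 is an assumption; substitute your interface, and the module parameter is only readable if your driver level exposes it in sysfs):

cat /sys/module/ibmveth/parameters/old_large_send    # expect Y or 1 if the driver exposes the parameter
ethtool -k eth0 | grep tcp-segmentation-offload      # expect: on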
The AIX/VIOS level is important: IOSLEVEL 2.2.4.10 (IV72825 Large receive packets are dropped by Linux Client LPAR)
In addition to large_send, large receive has to be set up on the VIOS SEA / 10 Gb adapter, or else receive speed takes a big hit.
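On the VIOS this is typically done with chdev against the SEA device; the device name below is a placeholder, and the attribute names should be confirmed against your VIOS documentation:

chdev -dev <SEA> -attr largesend=1          # enable large send on the SEA
chdev -dev <SEA> -attr large_receive=yes    # enable large receive aggregation on the SEA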
entstat -d <SEA> will report Aggregated/Offloaded packet counts under the REAL adapter section, giving some indication of whether the SEA is working or not.
Example: an entstat -d <SEA> snippet for a CT3 adapter:
Transmit TCP Segmentation Offload: Enabled <<<<<<<<<<<<<<<<
TCP Segmentation Offload Packets Transmitted: 319632
TCP Segmentation Offload Maximum Packet Size: 65226
Receive TCP Segment Aggregation: Enabled <<<<<<<<<<<<<<<
TCP Packets Aggregated into Large Packets: 6043750
TCP Payload Bytes Aggregated into Large Packets: 8730523959
TCP Segment Aggregation Large Packets Created: 428351
TCP Segment Aggregation Average Packets Aggregated: 14
TCP Segment Aggregation Maximum Packets Aggregated: 16
AIX LPAR on a Power 795 system connected to a SLES 11 SP4 LPAR on a Power8 system
Details tbd
SLES 11 SP4 to SLES 11 SP4 on the same PowerVM system
Details tbd
7. Hints on measuring your performance
Tools.
Teams have generally used two different tools for measuring performance: "netperf" and "iperf".
Details being gathered on those tools.
iperf 2.0.5
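As a minimal sketch of how a run typically looks with iperf 2.x (the address placeholder, duration, and stream count are illustrative, not the team's vetted methodology):

iperf -s                             # on the receiving LPAR
iperf -c <receiver-ip> -t 60 -P 4    # on the sending LPAR: 60-second run, 4 parallel streams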
Observations.
