IBM Support

The Power8 Platform Largesend Segmentation Offload (PLSO) feature in an AIX virtual network environment

Question & Answer


Question

What is the Platform Largesend Segmentation Offload (PLSO) feature and how does it compare to the native AIX Largesend functionality?

Cause

The native AIX Largesend feature requires a special end-to-end negotiation between Virtual Ethernet Adapters (VEAs) for each TCP socket. The negotiation is implemented with modified SYN/SYN-ACK packets during the connection establishment phase.
IBM Power Systems can also host Logical Partitions (LPARs) running the IBM i or Linux operating systems.  The native AIX largesend negotiation procedure was not implemented in these operating systems.  As a result, network throughput between VEAs in LPARs with different operating systems was limited, because the native AIX largesend feature did not work between them.
To overcome this interoperability problem, the Power Hypervisor assisted Platform Largesend Segmentation Offload (PLSO) feature was introduced on the Power8 platform, starting with firmware level FW840.10.

Answer

What are the new principles behind PLSO?

The PLSO capability is negotiated between the Virtual Ethernet Adapter device driver and the Power Hypervisor during the device driver initialization phase, when the device is opened.  If a VEA has successfully negotiated PLSO, it can send Platform Largesend packets independent of the receiver's capability.  It is then the responsibility of the Power Hypervisor either to deliver the Platform Largesend packet straight to a PLSO enabled receiver or to segment it into small standard Ethernet frames.
The Power Hypervisor uses free resources from the CPU pool to segment Platform Largesend packets for receiving VEAs that are not PLSO capable.
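The delivery decision described above can be pictured with a small model. This is only an illustrative sketch, not the Power Hypervisor implementation: it looks at the TCP payload alone, ignores all headers, and the function names and the MSS value of 1448 bytes are assumptions chosen for the example.

```python
from typing import List


def segment_payload(payload: bytes, mss: int) -> List[bytes]:
    """Split one large payload into MSS-sized chunks (the last may be shorter)."""
    if mss <= 0:
        raise ValueError("mss must be a positive number of bytes")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


def deliver(payload: bytes, receiver_is_plso: bool, mss: int = 1448) -> List[bytes]:
    """Model the choice between pass-through and segmentation.

    receiver_is_plso and mss are hypothetical parameters for this sketch;
    the real negotiation state lives in the hypervisor and device driver.
    """
    if receiver_is_plso:
        # A PLSO enabled receiver gets the large packet unchanged.
        return [payload]
    # Otherwise the large packet is cut into standard-sized pieces.
    return segment_payload(payload, mss)


large_packet = b"x" * 4000
print([len(chunk) for chunk in deliver(large_packet, receiver_is_plso=True)])
print([len(chunk) for chunk in deliver(large_packet, receiver_is_plso=False)])
```

The first call returns the packet as one unit, the second returns three chunks whose concatenation equals the original payload.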

What are the requirements for PLSO?

1. The IBM Power System needs to be Power8 or newer. PLSO was first introduced in system firmware level FW840.10.
2. The PLSO feature was introduced in the following operating system levels:
  - AIX 7100-04-03 and AIX 7200-01-00
  - IBM i 7.1 TR10 or IBM i 7.2 TR3
  - POWER Linux:
       Red Hat 6.8, 7.2
       SLES 11 SP4, 12 SP1
       Ubuntu 14.04.4, 15.10, 16.04
3. The PLSO feature was introduced in VIO Server level 2.2.5.0.

How can I find out whether a Virtual Ethernet Adapter has successfully negotiated Platform Largesend with the Power Hypervisor?

There are two ways to see whether PLSO was negotiated successfully:
1. Platform Largesend network adapter statistics
-> get the VEA statistics with the entstat command:
# entstat -d ent0 | grep "Large Send"
  Platform Large Send Offload: Enabled
  Platform Large Send Packets Transmitted: 0
  Total Large Send Packets Transmitted: 145664
  Platform Large Send Packets Dropped: 0

These statistics were added for PLSO. If the output of this command is empty, the VEA device driver does not support the PLSO feature.
"Platform Large Send Offload" shows "Enabled" if PLSO was successfully negotiated with the Power Hypervisor.  A value of "Disabled" shows that either the Power Hypervisor is not PLSO capable (for example, on a Power7 system) or PLSO was not activated for some other reason.
The "Total Large Send Packets Transmitted" statistic shows the number of native AIX largesend packets sent plus the number of PLSO packets sent.
2. The hexadecimal IP interface flags in the ifconfig output show the negotiated PLSO capability
-> get the ifconfig output of the VEA's IP interface
# ifconfig en0
en0: flags=1e084863,814c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.0.0.1 netmask 0xffffff00 broadcast 10.0.0.255
         tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
The second hexadecimal value in the flags field, "814c0", holds the extended flags, which include the PLSO capability:
#define IFO_LARGESEND_PLATFORM  0x00080000    /* platform largesend support */
A bit-wise AND of the extended flags with this mask shows whether the PLSO flag is set (non-zero result) or not set (zero result).
0x000814c0 AND 0x00080000 = 0x00080000 -> the PLSO flag is set
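Both checks can be scripted once the command output has been captured as text. The following is a minimal sketch that assumes the output layout shown above; the function names are made up for this example, and only the mask value is taken from the #define quoted in this document.

```python
import re
from typing import Optional

# Mask value quoted from the IFO_LARGESEND_PLATFORM #define above.
IFO_LARGESEND_PLATFORM = 0x00080000


def plso_state_from_entstat(text: str) -> Optional[str]:
    """Return 'Enabled' or 'Disabled', or None if the statistic is missing.

    None corresponds to the empty grep output case, i.e. a device driver
    without PLSO support.
    """
    match = re.search(r"^\s*Platform Large Send Offload:\s*(\S+)", text, re.MULTILINE)
    return match.group(1) if match else None


def plso_flag_from_ifconfig(text: str) -> bool:
    """Test the PLSO bit in the second (extended) hex value of flags=..."""
    match = re.search(r"flags=[0-9a-fA-F]+,([0-9a-fA-F]+)<", text)
    if match is None:
        raise ValueError("no extended flags value found in ifconfig output")
    extended_flags = int(match.group(1), 16)
    return bool(extended_flags & IFO_LARGESEND_PLATFORM)


entstat_sample = """\
  Platform Large Send Offload: Enabled
  Platform Large Send Packets Transmitted: 0
  Total Large Send Packets Transmitted: 145664
  Platform Large Send Packets Dropped: 0
"""
ifconfig_sample = "en0: flags=1e084863,814c0<UP,BROADCAST,NOTRAILERS,RUNNING>"

print(plso_state_from_entstat(entstat_sample))   # Enabled
print(plso_flag_from_ifconfig(ifconfig_sample))  # True
```

On a live system the two strings would come from "entstat -d ent0" and "ifconfig en0"; how the output is captured is left out of the sketch on purpose.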

How can the sending of Platform Largesend packets be enabled for TCP sessions?

The Platform Largesend feature is controlled by the mtu_bypass IP interface attribute.
Important note:
The mtu_bypass IP interface attribute controls only the sending of Largesend packets. If Largesend/PLSO is turned off on the IP interface, the PLSO enabled VEA can still receive PLSO packets from other VEAs.
The following values can be set for mtu_bypass:
  • "off" - TCP sessions use neither the native AIX largesend nor the Platform Largesend feature
  • "on" - TCP sessions first try to negotiate the native AIX largesend feature; if that fails, the PLSO feature is used, provided the IP interface has the PLSO flag set
  • "plso" - TCP sessions send only Platform Largesend packets
Note:
AIX levels older than 7.2 have mtu_bypass=off by default for IP interfaces on VEAs.  Starting with AIX 7.2, mtu_bypass=on is the default.
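The three mtu_bypass values can be summarized as a small decision function. This is a sketch of the rules listed above, not AIX code; the function and parameter names are hypothetical. One point is an assumption: the text does not say what happens with mtu_bypass=plso when the interface lacks the PLSO flag, so the sketch treats that case as "no largesend".

```python
def send_mode(mtu_bypass: str, native_negotiated: bool, plso_flag: bool) -> str:
    """Return 'native', 'plso' or 'none' for a new TCP session.

    native_negotiated: the SYN/SYN-ACK largesend negotiation succeeded.
    plso_flag: the IP interface has the PLSO extended flag set.
    """
    if mtu_bypass == "off":
        return "none"
    if mtu_bypass == "on":
        if native_negotiated:
            return "native"
        # Fall back to PLSO only when the interface has the PLSO flag set.
        return "plso" if plso_flag else "none"
    if mtu_bypass == "plso":
        # Assumption: without the PLSO flag no largesend packets are sent.
        return "plso" if plso_flag else "none"
    raise ValueError("mtu_bypass must be 'on', 'off' or 'plso'")


# A peer that does not implement the native negotiation, for example a
# Linux LPAR, still benefits when the interface has the PLSO flag set.
print(send_mode("on", native_negotiated=False, plso_flag=True))  # plso
```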
1. Find the VEA adapter
# lsdev -Cc adapter | grep l-lan
ent0   Available  Virtual I/O Ethernet Adapter (l-lan)
2. Make sure that the chksum_offload device attribute is active
# lsattr -El ent0 -a chksum_offload
chksum_offload    yes       Enable Checksum Offload for IPv4 packets True
The chksum_offload attribute is "yes" by default. If it has been set to "no" for some reason, it needs to be changed back to "yes".
-> to change the device attribute on the VEA ent0, the IP interface en0 needs to be detached first.
# ifconfig en0 detach
# chdev -l ent0 -a chksum_offload=yes
# mkdev -l en0
3. Enable largesend
The mtu_bypass IP interface attribute controls the largesend feature. The available values can be checked with:
# lsattr -Rl en0 -a mtu_bypass
on
off
plso
mtu_bypass can be changed dynamically, but the new value only takes effect for newly opened TCP sessions
# chdev -l en0 -a mtu_bypass=on
4. Check the IP interface flags
# ifconfig en0
en0: flags=1e084863,814c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.0.0.1 netmask 0xffffff00 broadcast 10.0.0.255
         tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
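When the procedure has to be repeated on many LPARs, the command sequence from steps 2 to 4 can be generated rather than typed. The sketch below only builds the command strings shown above and executes nothing; the helper name and its parameters are assumptions for this example.

```python
from typing import List

# Values reported by "lsattr -Rl en0 -a mtu_bypass" above.
VALID_MTU_BYPASS = ("on", "off", "plso")


def largesend_commands(adapter: str, interface: str,
                       chksum_offload: str, mtu_bypass: str = "on") -> List[str]:
    """Return the AIX commands for steps 2-4 as strings (dry run only).

    chksum_offload is the current attribute value as shown by
    "lsattr -El <adapter> -a chksum_offload".
    """
    if mtu_bypass not in VALID_MTU_BYPASS:
        raise ValueError("mtu_bypass must be one of %s" % (VALID_MTU_BYPASS,))
    commands: List[str] = []
    if chksum_offload != "yes":
        # The interface must be detached before the adapter attribute changes.
        commands.append(f"ifconfig {interface} detach")
        commands.append(f"chdev -l {adapter} -a chksum_offload=yes")
        commands.append(f"mkdev -l {interface}")
    commands.append(f"chdev -l {interface} -a mtu_bypass={mtu_bypass}")
    commands.append(f"ifconfig {interface}")  # step 4: check the flags
    return commands


for command in largesend_commands("ent0", "en0", chksum_offload="no"):
    print(command)
```

Keeping the helper as a dry run is deliberate: detaching an interface interrupts its traffic, so the generated commands should be reviewed before they are run.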

How can Largesend be enabled on a Shared Ethernet Adapter in a VIO Server?

When an SEA device is created from a real Ethernet network adapter card and one or more VEAs, the "largesend" feature is activated by default if the underlying Ethernet network adapter card supports checksum offloading and large_send and has these device attributes enabled.
1. Log in to the VIO server as the padmin user
2. List the Shared Ethernet Adapter devices
$ lsdev | grep Shared
ent3             Available   Shared Ethernet Adapter
3. Show the largesend attribute of the SEA device
$ lsdev -dev ent3 -attr | grep largesend
largesend       1        Enable Hardware Transmit TCP Resegmentation                                        True
4. Enable largesend
$ chdev -dev ent3 -attr largesend=1
The change is dynamic.
 

How does the SEA handle Platform Largesend packets?

When the native AIX Largesend feature is used, Largesend is negotiated per TCP session between the Shared Ethernet Adapter (SEA) and the VIO client VEA.  In this case the SEA device only receives Largesend packets if Largesend is actually enabled on the SEA device.
When sending packets to external network components, largesend enabled SEAs in VIO servers send the Largesend packets to the network adapter card, which then segments the large packets into small packets in hardware. The Largesend packet carries the TCP Maximum Segment Size (MSS) coded into its TCP checksum header field.
When the Largesend enabled SEA receives such a packet from the trunk VEA, it sends the packet to the real network adapter card, which then segments it into small packets according to the MSS value provided by the sending VEA.
In contrast, Platform Largesend packets are always delivered to the trunk VEA inside the SEA if the trunk VEA successfully negotiated PLSO with the Power Hypervisor during device activation.  If the "largesend" SEA device attribute has been disabled manually for some reason, the SEA still keeps receiving Platform Largesend packets.
In this case the SEA code does NOT handle the PLSO packets as Largesend packets but as simple large packets.  The SEA device then fragments these packets in software.
If such a packet has the "Don't Fragment" (DF) bit set in the IP header, the packet is dropped and an ICMP error packet "Destination unreachable, fragmentation necessary" is sent back to the sender.
The DF bit is often set in the IP header, for example when Path MTU discovery is enabled (AIX network option tcp_pmtu_discover=1, which is the default).
Fragmenting PLSO packets in software can cause throughput and load problems on the VIO servers, and dropping packets with the DF bit set can cause serious communication problems.
This is visible in the SEA adapter statistics:
$ netstat -cdlistats ent3
...
Virtual Side Statistics:
    Packets received: 18468928
    Packets bridged: 18468764
    Packets consumed: 78255
    Packets fragmented: 48
    Packets transmitted: 23998741
    Packets dropped: 0
    Packets filtered(VlanId): 0
Other Statistics:
    Output packets generated: 1519703
    Output packets dropped: 0
    Device output failures: 0
    Memory allocation failures: 0
    ICMP error packets sent: 48
    Non IP packets larger than MTU: 0
    Thread queue overflow packets: 0
...
This situation was addressed in APAR IV95160 "SEA IS NOT PLATFORM LARGESEND AWARE" and has been documented as a permanent restriction.
Important note:
Due to this SEA restriction it is necessary to keep the SEA "largesend" device attribute enabled on all SEA devices that can potentially receive PLSO packets!
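The two counters that reveal this situation, "Packets fragmented" and "ICMP error packets sent", can be pulled out of the captured SEA statistics. The sketch below assumes the "label: value" layout shown in the excerpt above; the function names are hypothetical. Note that non-zero values only indicate software fragmentation or DF drops in general, so they are a hint to check the "largesend" attribute rather than proof of this specific problem.

```python
import re


def sea_counter(text: str, label: str) -> int:
    """Return the integer value of a 'label: value' line in SEA statistics."""
    pattern = rf"^\s*{re.escape(label)}:\s*(\d+)\s*$"
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise KeyError(label)
    return int(match.group(1))


def software_fragmentation_seen(text: str) -> bool:
    """True if the SEA fragmented packets in software or sent ICMP errors."""
    return (sea_counter(text, "Packets fragmented") > 0
            or sea_counter(text, "ICMP error packets sent") > 0)


sea_stats = """\
Virtual Side Statistics:
    Packets received: 18468928
    Packets fragmented: 48
    Packets dropped: 0
Other Statistics:
    Output packets dropped: 0
    ICMP error packets sent: 48
"""

print(sea_counter(sea_stats, "Packets fragmented"))  # 48
print(software_fragmentation_seen(sea_stats))        # True
```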

Known Problems

1. Platform Largesend is disabled in a VEA after a Live Partition Mobility (LPM) move of an LPAR from a Power7 to a Power8/9 System

When an LPAR whose AIX software level supports PLSO is moved from a Power7 System that is not PLSO capable to a Power8/9 System that is PLSO capable, its Virtual Ethernet Adapters have PLSO disabled, and the entstat output shows "Platform Large Send Offload: Disabled" after the move.
The reason is that the PLSO capability was negotiated during device driver initialization on the Power7 system, which is not PLSO capable.  The PLSO capability cannot be changed dynamically by the LPM move.  To activate PLSO on the VEAs after the LPM move, the IP interfaces need to be detached and activated again:
# ifconfig en0 detach
# mkdev -l en0
# mkdev -l inet0     # restore static routes

2. Platform Largesend packets dropped and transmit stall after a Live Partition Mobility (LPM) move of an LPAR from a Power8/9 to a Power7 System and later back to a Power8/9 System

When an LPAR with Virtual Ethernet Adapters that have PLSO active is moved from a Power8/9 System to a Power7 System that is not PLSO capable, TCP sessions can still have PLSO enabled, but PLSO packets are not sent out because PLSO was disabled by the device driver on Power7.
When the same LPAR is moved back to a Power8/9 System in this state, PLSO is still disabled in the device driver. The Platform Largesend packets then cause transmit errors in the VEA device driver, and mbufs can be leaked. Eventually the leak exhausts the whole transmit buffer pool and the VEA cannot send packets any more. The entstat transmit statistics of the VEA show an increasing number of "Packets Dropped" and "No Buffers":
Transmit Statistics:
...
Packets Dropped: 21666
...
Transmit Information
Transmit Buffers
Buffer Size 65536
Buffers 32
History
No Buffers 10209197
The transmit stall situation can be resolved by reinitializing the VEA device:
# ifconfig en0 detach
# mkdev -l en0
# mkdev -l inet0      # restore static routes
The following APARs addressed these problems:
   IJ14587: PLSO ENABLED SOCKETS DROP PLATFORM LARGE SEND PACKET AFTER LPM
   IJ14586: MBUF MEMORY LEAK IN VIOENTDD
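Because the symptom of known problem 2 is a pair of counters that keep growing, comparing two entstat samples taken some time apart is a simple way to spot it. The sketch below assumes the input is limited to the transmit excerpt in the layout quoted above (a full entstat report contains more counters with similar names); the function names are hypothetical.

```python
import re


def transmit_counter(text: str, label: str) -> int:
    """Read a counter such as 'Packets Dropped: 21666' or 'No Buffers 10209197'."""
    pattern = rf"^\s*{re.escape(label)}:?\s+(\d+)"
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise KeyError(label)
    return int(match.group(1))


def transmit_stall_suspected(earlier: str, later: str) -> bool:
    """True if both counters increased between the two samples."""
    labels = ("Packets Dropped", "No Buffers")
    return all(transmit_counter(later, label) > transmit_counter(earlier, label)
               for label in labels)


# The second sample is made up for illustration; only the first one
# uses the values quoted in this document.
first_sample = "Packets Dropped: 21666\nNo Buffers 10209197\n"
second_sample = "Packets Dropped: 21700\nNo Buffers 10209500\n"

print(transmit_stall_suspected(first_sample, second_sample))  # True
```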
3. New SEA attribute plso_bridge=yes packet corruption problem
VIOS 2.2.6.3x introduced the SEA "plso_bridge" attribute, which is enabled by default.  On a Power8/9 System, an SEA that has "large_receive" enabled and "plso_bridge=yes" set sends aggregated large_receive packets to client VEAs using the Platform Largesend method.  If the receiving client VEA is not PLSO enabled, the Power Hypervisor segments the packet into MSS-sized packets.  Due to a problem with segmenting padded TCP frames, this sometimes results in modified data packets.
During the segmentation process, the Power Hypervisor calculates new TCP checksums for the segments, so the receiver cannot detect the modification.  As a result, encrypted TCP sessions experience hang/timeout/abort situations because the corrupted data stream cannot be decrypted. For unencrypted TCP sessions the situation is much worse, because the application receives corrupted data.
The VEA adapter statistics inside the SEA device statistics will show an increasing number of "Platform Large Send Packets Transmitted" when sending large_receive packets with the plso_bridge method, which would not be the case for the original "large_receive" behavior:
$ netstat -cdlistats ent3 
...
  Platform Large Send Offload: Enabled
  Platform Large Send Packets Transmitted: xxxxxx
  Total Large Send Packets Transmitted: yyyyyy
  Platform Large Send Packets Dropped: 0
...
This problem has been addressed with APAR IJ12143 "TCP SESSION HANG/TIMEOUT OR POSSIBLE UNDETECTED DATA CORRUPTION".
Workaround:
- For SEAs that only serve AIX clients, "plso_bridge" can be disabled. This restores the SEA large_receive behavior from before VIOS 2.2.6.3x:
$ chdev -dev ent3 -attr plso_bridge=no
The change is dynamic.
- For SEAs that serve IBM i and Linux clients, "large_receive" can be disabled:
$ chdev -dev ent3 -attr large_receive=no
The change is dynamic.
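Why the receiver cannot notice the modification described in known problem 3 can be shown with the ones' complement checksum that TCP uses. The sketch is a simplification: it checksums the payload only, whereas a real TCP checksum also covers the pseudo-header and the TCP header. The payload bytes are invented for the example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def verifies(data: bytes, checksum: int) -> bool:
    """Receiver-side check: recompute and compare."""
    return internet_checksum(data) == checksum


original = b"ABCD"
corrupted = b"ABCE"  # one byte modified during segmentation

sender_checksum = internet_checksum(original)
recalculated_checksum = internet_checksum(corrupted)

# With the sender's checksum the corruption would be caught ...
print(verifies(corrupted, sender_checksum))        # False
# ... but a checksum recalculated over the modified data always matches.
print(verifies(corrupted, recalculated_checksum))  # True
```

Only a check that is independent of the recalculated checksum, such as the integrity protection of an encrypted session, reveals the change, which matches the hang/timeout symptoms described for encrypted sessions.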

SUPPORT:

If additional assistance is required after completing all of the instructions provided in this document, please follow the step-by-step instructions below to contact IBM to open a case for software under warranty or with an active and valid support contract.  The technical support specialist assigned to your case will confirm that you have completed these steps.

a.  Document and/or take screen shots of all symptoms, errors, and/or messages that might have occurred

b.  Capture any logs or data relevant to the situation.

c.  Contact IBM to open a case:

   -For electronic support, please visit the IBM Support Community:
     https://www.ibm.com/mysupport
   -If you require telephone support, please visit the web page:
      https://www.ibm.com/planetwide/

d.  Provide a good description of your issue and reference this technote

e.  Upload all of the details and data to your case

   -You can attach files to your case in the IBM Support Community
   -Or Upload data to IBM testcase server analysis:

    http://www.ibm.com/support/docview.wss?uid=ibm10733581


Document Information

Modified date:
20 October 2021

UID

ibm10885620