IBM Support

IBM AIX: Troubleshooting "ECH_PORT_OUT_OF_SYN" errors in IEEE 802.3ad Link Aggregation/LACP EtherChannel

Troubleshooting


Problem

In an established IEEE 802.3ad Link Aggregation/LACP EtherChannel configuration, the system sometimes logs the error "ECH_PORT_OUT_OF_SYN" in errpt. While the error condition lasts, network connectivity issues may arise.

Symptom

Temporary loss of network connectivity, which can persist until the underlying problem is fixed.
The network adapter may report a link down status.
The EtherChannel or Shared Ethernet Adapter (SEA) may fail over.

Cause

A port out-of-sync condition usually occurs when LACPDU packets are not being exchanged properly between the adapters and the switches.

Environment

OS: AIX / VIOS
Config: IEEE 802.3ad Link Aggregation

 

Diagnosing The Problem

errpt logs errors with LABEL: ECH_PORT_OUT_OF_SYN
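
To list only these entries, errpt can filter by error label with -J, and -a prints the detailed form. The helper below is a minimal sketch, not an IBM-supplied tool: count_by_resource is a hypothetical name, and it assumes the standard errpt summary columns (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION). It counts entries per resource so that a repeatedly affected adapter stands out.

```shell
# Count error-log entries per resource from errpt summary output on stdin.
# Assumption: column 5 is RESOURCE_NAME, as in the standard summary layout.
count_by_resource() {
    awk '$1 != "IDENTIFIER" && NF >= 5 { n[$5]++ }
         END { for (r in n) print r, n[r] }' | sort
}

# On AIX/VIOS:
#   errpt -J ECH_PORT_OUT_OF_SYN | count_by_resource
#   errpt -a -J ECH_PORT_OUT_OF_SYN     # detailed entries
```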

Check the IEEE 802.3ad Port Statistics section in the netstat -v or entstat -d entX output of the affected EtherChannel adapter.

Example output from an OUT_OF_SYNC situation:

IEEE 802.3ad Port Statistics: 

        (AIX/VIOS Side)
        Actor System Priority: 0x8000                           
        Actor System: 98-BE-94-78-CD-C4
        Actor Operational Key: 0xBEEF
        Actor Port Priority: 0x0080
        Actor Port: 0x0002
        Actor State:
                LACP activity: Active
                LACP timeout: Long
                Aggregation: Aggregatable
                Synchronization: IN_SYNC
                Collecting: Disabled
                Distributing: Disabled
                Defaulted: False
                Expired: True      (The aggregation has expired because no LACPDU packets are being received)

        (Switch Side)
        Partner System Priority: 0x8000                       
        Partner System: 2C-0B-E9-D9-B5-00
        Partner Operational Key: 0x0014
        Partner Port Priority: 0x8000
        Partner Port: 0x0310
        Partner State:
                LACP activity: Active
                LACP timeout: Long
                Aggregation: Aggregatable
                Synchronization: OUT_OF_SYNC        (Partner Out of Sync with the Actor)
                Collecting: Enabled
                Distributing: Enabled
                Defaulted: False
                Expired: False
        Received LACPDUs: 1121      (Far fewer LACPDUs have been received than transmitted)
        Transmitted LACPDUs: 2156
        Received marker PDUs: 0
        Transmitted marker PDUs: 0
        Received marker response PDUs: 0
        Transmitted marker response PDUs: 0
        Received unknown PDUs: 0
        Received illegal PDUs: 0
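
When an EtherChannel has several underlying adapters, the relevant state fields can be pulled out of this output instead of being read by eye. The following is a rough sketch under stated assumptions: lacp_state is a hypothetical helper name, ports are simply numbered in the order their "Actor State:" blocks appear, and the field labels are assumed to match the sample output above.

```shell
# Print the Synchronization and Expired fields for the Actor and Partner
# of every port block found in entstat -d output read from stdin.
lacp_state() {
    awk -F': *' '
        /Actor State:/   { port++; side = "actor" }
        /Partner State:/ { side = "partner" }
        side != "" && ($1 ~ /Synchronization$/ || $1 ~ /Expired$/) {
            name = $1; sub(/^[ \t]+/, "", name)   # strip indentation
            split($2, w, " ")                      # keep the first word only
            print "port", port, side, name, w[1]
        }
    '
}

# On AIX/VIOS (ent8 is a placeholder EtherChannel adapter name):
#   entstat -d ent8 | lacp_state
```

For the sample output above, this would print four lines: the actor as IN_SYNC with Expired True, and the partner as OUT_OF_SYNC with Expired False.
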

Things to check:

NOTE: Make sure that there is a working physical link before debugging any further.

a.) Verify that the Received LACPDUs and Transmitted LACPDUs counters are both incrementing.
      - This shows which side is not properly sending LACPDU packets.
b.) The Partner Operational Key should be the same for all the underlying adapters in an EtherChannel.
       - If one of them is different, that adapter's cable is not connected to the correct aggregated switch port.
c.) On the Actor side (AIX/VIOS side), check for Expired: True.
      - This means the adapter has been waiting to receive LACPDU packets but has not received any, so the aggregation state has expired.

If the configuration seems correct, taking a network trace (tcpdump/iptrace) at the time of the issue can help show what is happening.
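
Checks a.) and b.) can be sketched as two small helpers that work on saved entstat -d output. This is an illustration, not an IBM-supplied tool: lacpdu_delta and partner_keys are hypothetical names, ports are numbered in order of appearance, and the field labels are assumed to match the sample output shown earlier. With the Long LACP timeout, IEEE 802.3ad peers send LACPDUs about every 30 seconds, so the two snapshots should be taken at least a minute or two apart.

```shell
# Check a.): per-port change in LACPDU counters between two snapshots.
# Usage: lacpdu_delta before.txt after.txt
lacpdu_delta() {
    awk -F': *' '
        FNR == 1                    { snap++; port = 0 }
        $1 ~ /Received LACPDUs$/    { port++; rx[snap, port] = $2 + 0 }
        $1 ~ /Transmitted LACPDUs$/ { tx[snap, port] = $2 + 0 }
        END {
            for (p = 1; p <= port; p++)
                printf "port %d rx_delta %d tx_delta %d\n", p,
                       rx[2, p] - rx[1, p], tx[2, p] - tx[1, p]
        }
    ' "$1" "$2"
}

# Check b.): distinct Partner Operational Key values seen on stdin.
# More than one output line means the adapters do not share one key.
partner_keys() {
    awk -F': *' '$1 ~ /Partner Operational Key$/ { split($2, w, " "); print w[1] }' | sort -u
}

# On AIX/VIOS (ent8 is a placeholder EtherChannel adapter name):
#   entstat -d ent8 > /tmp/before.txt; sleep 120; entstat -d ent8 > /tmp/after.txt
#   lacpdu_delta /tmp/before.txt /tmp/after.txt
#   partner_keys < /tmp/after.txt
```

An rx_delta of 0 alongside a growing tx_delta suggests that the switch side is not delivering LACPDUs to that port; the reverse points at the AIX/VIOS side.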

Resolving The Problem

A.) LACP Related
  1. Check that the adapters are connected to the correct LACP aggregated switch ports.
  2. If two different network switches are used, VPC (Virtual Port Channel) should be configured between the switches.

B.) Non LACP Related 
  1. Check for a loose or defective cable or connection. If a switch or another system is directly attached to the Ethernet adapter, verify that it is powered up, configured, and functioning correctly.
  2. Update the adapter microcode to the latest available version.
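
For the non-LACP checks, the physical link state of each underlying adapter can usually be read from its entstat -d output. The sketch below is only a convenience filter: link_lines is a hypothetical name, and the exact field label differs between adapter drivers, so the pattern (any line containing "link status" or "link state", in any case) is an assumption that may need adjusting.

```shell
# Show link status/state lines from entstat -d output read from stdin.
link_lines() {
    grep -i -E 'link (status|state)'
}

# On AIX/VIOS (ent0 is a placeholder physical adapter name):
#   entstat -d ent0 | link_lines
#   lscfg -vl ent0      # vital product data for the adapter
```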


APARs that may apply:
IV97588: ETHERCHANNEL LINK PORT IN LACP MODE MAY NOT RECOVER APPLIES TO AIX 7200-01
link: https://www-01.ibm.com/support/docview.wss?uid=isg1IV97588

IV95904: NETWORK SWITCH MAY SEND OVER BROKEN LINK IN ETHERCHANNEL APPLIES TO AIX 7200-00
link: https://www-01.ibm.com/support/docview.wss?uid=isg1IV95904

IJ01503: "DEFAULTED" BIT NOT MANAGED CORRECTLY ON 802.3AD ETHER CHANNEL
link: https://www-01.ibm.com/support/docview.wss?uid=isg1IJ01503

SUPPORT

If additional assistance is required after completing all of the instructions provided in this document, please follow the step-by-step instructions below to contact IBM to open a case for software under warranty or with an active and valid support contract.  The technical support specialist assigned to your case will confirm that you have completed these steps.

1.  Document and/or take screenshots of all symptoms, errors, and/or messages that might have occurred. 

2.  Capture any logs or data relevant to the situation.

3.  Contact IBM to open a case:

   -For electronic support, please visit the IBM Support Community:
     https://www.ibm.com/mysupport
   -If you require telephone support, please visit the web page:
      https://www.ibm.com/planetwide/

4.  Provide a good description of your issue, and reference this Technote, and any issues you had with the instructions.

5.  Collect the system snap and upload all of the details and data for your case.

To collect a complete snap of your system information:

5.1) Remove previously gathered data

   # snap -r 

5.2) Copy related files from #1 and #2 to the snap data directory

   # mkdir -p /tmp/ibmsupt/testcase
   # cp <logs, screenshots, etc> /tmp/ibmsupt/testcase

5.3) Run the snap command with one of the following options to collect all info.

     * If you have already engaged with a support engineer, use the flags specified by your support team.
   # snap -aZc (Omits system dump data)   
     OR 
   # snap -ac (Use if system dump data is needed)

5.4) Rename the test case to include your case number to ensure it is properly attached to your case

  # mv /tmp/ibmsupt/snap.pax.Z  /tmp/ibmsupt/yourcase#[.optional_description].snap.pax.Z
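
The naming pattern in 5.4 can be expressed as a small helper so the target name is built the same way every time. This is a sketch only: snap_target is a hypothetical name, and TS000000000 in the usage comment is a placeholder, not a real case number.

```shell
# Build the renamed snap path: <case>[.<description>].snap.pax.Z
# Usage: snap_target <case_number> [description]
snap_target() {
    if [ -n "${2:-}" ]; then
        printf '/tmp/ibmsupt/%s.%s.snap.pax.Z\n' "$1" "$2"
    else
        printf '/tmp/ibmsupt/%s.snap.pax.Z\n' "$1"
    fi
}

# Example (placeholder case number):
#   mv /tmp/ibmsupt/snap.pax.Z "$(snap_target TS000000000 lacp)"
```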

5.5) Capture an iptrace while the problem occurs.

# startsrc -s iptrace -a "/tmp/ibmsupt/testcase/iptrace.trc"   <----- To start

<<PROBLEM OCCURRENCE>>
 
# stopsrc -s iptrace                                       <----- To stop

5.6) Upload the file by one of the following options (a, b, or c)

     a) Attach to your case
     https://www.ibm.com/mysupport/s/my-cases

     b) Upload to the Enhanced Customer Data Repository (ECuRep)
     https://www.secure.ecurep.ibm.com/app/upload_sf

     c) Upload to the Blue Diamond FTP server (Blue Diamond Customers Only)
     https://msciportal.im-ies.ibm.com

* Note: For information about blue diamond upload see:

     http://www.ibm.com/support/docview.wss?uid=nas8N1020947


Document Information

Modified date:
31 May 2019

UID

ibm10884224