IBM Support

IJ44826: VNIC FAILOVER FAILURE DUE TO EXTENDED COMPLETE NETWORK OUTAGE APPLIES TO AIX 7300-02

 

APAR status

  • Closed as program error.

Error description

  • When a vNIC client has one or more vNIC server devices
    configured for failover, and the network connections of
    the initial vNIC server device and of all the failover
    vNIC server devices are down at the same time for longer
    than approximately 25 seconds, the vNIC client does not
    attempt the vNIC connection again when a vNIC server
    device becomes available.
    
    When this happens, the vNIC client error log reports
    multiple "VNIC_ERR_LOGIN" error labels interspersed with
    multiple "VNIC_ERR_CRQ" error labels containing the text
    "Event CRQ TEVENT received:  PARTNER_FAILOVER", possibly
    followed by one "VNIC_ERR" error label containing the
    text "Event: CRQ reboot failure.  Reboot count: 11,
    max: 10".
    
    On the vNIC server, "vnicstat -b -d vnicserverX" output
    may include the following text:
    
    "ioctl(VS_IOC_GET_VNDD_STATS) failed, rc = -1
    
    /usr/sbin/entstat.mlxcent: 0909-004 Unable to get
    statistics on device entX, errno = 22
    
    Backing device statistics command failed: Error 0"
    
    In the vNIC server error log(s), backing devices show
    "MLXCENT_LLINK_DOWN" when the network goes down, but
    there is never a "MLXCENT_LLINK_UP".
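
The client-side signature described above can be checked with a small
filter over detailed error-report output. This is a minimal sketch, not
an IBM-supplied tool: the function name count_vnic_labels is
hypothetical, and it assumes that each entry in "errpt -a" output starts
with a "LABEL:" line. It only counts the three labels named in this
APAR; it does not inspect the detail text.

```shell
#!/bin/sh
# Count the vNIC client error labels named in this APAR.
# Reads detailed error-report text on stdin; each entry is assumed
# to begin with a line of the form "LABEL:  <error label>".
count_vnic_labels() {
    awk '
        $1 == "LABEL:" && ($2 == "VNIC_ERR_LOGIN" || $2 == "VNIC_ERR_CRQ" || $2 == "VNIC_ERR") {
            count[$2]++
        }
        END {
            printf "VNIC_ERR_LOGIN=%d VNIC_ERR_CRQ=%d VNIC_ERR=%d\n",
                count["VNIC_ERR_LOGIN"], count["VNIC_ERR_CRQ"], count["VNIC_ERR"]
        }
    '
}

# Usage on the vNIC client LPAR (not executed here):
#   errpt -a -J VNIC_ERR_LOGIN,VNIC_ERR_CRQ,VNIC_ERR | count_vnic_labels
```

Many VNIC_ERR_LOGIN and VNIC_ERR_CRQ entries together with a single
VNIC_ERR entry would be consistent with the pattern described in this
APAR, but the counts alone do not confirm it.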
    

Local fix

  • Once at least one of the backing VF devices returns to
    the LINK UP state, either of the following two options
    will prompt the vNIC client network connection to become
    available:
    
     - Reboot the vNIC client system.
     - Detach the interface associated with the vNIC client
       device (ifconfig enX down detach); run rmdev to put
       the vNIC client device in the Defined state without
       deleting it (rmdev -l entX); run mkdev on the vNIC
       client device (mkdev -l entX); and re-attach the
       interface (ifconfig enX up).
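
The second option can be sketched as a small shell function. This is
only an illustration of the command order given above: the function
name recycle_vnic_client and the DRY_RUN variable are hypothetical, and
the sketch assumes that the interface enX and the device entX share the
same number. With DRY_RUN left at its default of 1, the commands are
printed rather than executed.

```shell
#!/bin/sh
# Cycle a vNIC client device through the Defined state, following the
# command order of the second local-fix option.
# DRY_RUN (hypothetical, default 1) prints the commands instead of
# running them; set DRY_RUN=0 to execute them as root on the client.
recycle_vnic_client() {
    unit="$1"                       # adapter number, e.g. 0 for ent0/en0
    run=""
    [ "${DRY_RUN:-1}" = "1" ] && run="echo"

    # Stop at the first failing step so later steps do not run blindly.
    $run ifconfig "en${unit}" down detach &&
    $run rmdev -l "ent${unit}" &&
    $run mkdev -l "ent${unit}" &&
    $run ifconfig "en${unit}" up
}

# Usage (prints the four commands for ent0/en0):
#   recycle_vnic_client 0
```

As stated above, the sequence is expected to help only after at least
one backing VF device has returned to the LINK UP state.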
    

Problem summary

  • When a vNIC client has one or more vNIC server devices
    configured for failover, and the network connections of
    the initial vNIC server device and of all the failover
    vNIC server devices are down at the same time for longer
    than approximately 25 seconds, the vNIC client does not
    attempt the vNIC connection again when a vNIC server
    device becomes available.
    
    When this happens, the vNIC client error log reports
    multiple "VNIC_ERR_LOGIN" error labels interspersed with
    multiple "VNIC_ERR_CRQ" error labels containing the text
    "Event CRQ TEVENT received:  PARTNER_FAILOVER", possibly
    followed by one "VNIC_ERR" error label containing the
    text "Event: CRQ reboot failure.  Reboot count: 11,
    max: 10".
    
    On the vNIC server, "vnicstat -b -d vnicserverX" output
    may include the following text:
    
    "ioctl(VS_IOC_GET_VNDD_STATS) failed, rc = -1
    
    /usr/sbin/entstat.mlxcent: 0909-004 Unable to get
    statistics on device entX, errno = 22
    
    Backing device statistics command failed: Error 0"
    
    In the vNIC server error log(s), backing devices show
    "MLXCENT_LLINK_DOWN" when the network goes down, but
    there is never a "MLXCENT_LLINK_UP".
    

Problem conclusion

  • Modify the vNIC client so that it keeps retrying to
    connect to a vNIC server device after a complete network
    outage occurs on all of the backing VF devices.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ44826

  • Reported component name

    AIX V7.3

  • Reported component ID

    5765CD300

  • Reported release

    730

  • Status

    CLOSED PER

  • HIPER

    NoHIPER

  • Submitted date

    2023-01-04

  • Closed date

    2023-01-04

  • Last modified date

    2023-11-13

  • APAR is sysrouted FROM one or more of the following:

    IJ44540

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    AIX V7.3

  • Fixed component ID

    5765CD300

Applicable component levels


Document Information

Modified date:
14 November 2023