APAR status
Closed as program error.
Error description
When a vNIC client has one or more vNIC server devices configured for failover, and the network connections of the initial vNIC server device and of all failover vNIC server devices are down at the same time for longer than approximately 25 seconds, the vNIC client does not attempt to reconnect when a vNIC server device becomes available again.
When this happens, the following errors are reported in the vNIC client error log:
- Multiple "VNIC_ERR_LOGIN" error labels, interspersed with multiple "VNIC_ERR_CRQ" error labels containing the text "Event CRQ TEVENT received: PARTNER_FAILOVER".
- Possibly, one "VNIC_ERR" error label after these, containing the text "Event: CRQ reboot failure. Reboot count: 11, max: 10".
On the vNIC server, the output of "vnicstat -b -d vnicserverX" may include the following text:
    ioctl(VS_IOC_GET_VNDD_STATS) failed, rc = -1
    /usr/sbin/entstat.mlxcent: 0909-004 Unable to get statistics on device entX, errno = 22
    Backing device statistics command failed: Error 0
In the vNIC server error log(s), the backing devices show "MLXCENT_LLINK_DOWN" when the network goes down, but "MLXCENT_LLINK_UP" is never logged.
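As a rough triage aid, the symptom pattern above can be checked against saved error-log text. This is a minimal, hypothetical sketch: it assumes the client and server error-log text has already been captured (for example, from errpt output) and only does substring matching on the labels and messages quoted in this APAR. The function names are illustrative and not part of any AIX tool.

```python
# Hypothetical triage helper for the IJ44826 symptom pattern.
# Assumption: client and server error-log text has already been saved
# (for example, from errpt output); this only does substring matching on
# the labels and messages quoted in this APAR.

FAILOVER_TEXT = "Event CRQ TEVENT received: PARTNER_FAILOVER"
REBOOT_FAILURE_TEXT = "CRQ reboot failure"


def client_symptoms(client_log: str) -> dict:
    """Count the client-side symptoms described in this APAR."""
    return {
        "login_errors": client_log.count("VNIC_ERR_LOGIN"),
        "crq_failover_events": client_log.count(FAILOVER_TEXT),
        # The single VNIC_ERR reboot-failure entry is only "possibly" present.
        "reboot_failure": REBOOT_FAILURE_TEXT in client_log,
    }


def server_link_never_recovered(server_log: str) -> bool:
    """True if a backing device logged LLINK_DOWN but no LLINK_UP at all."""
    return "MLXCENT_LLINK_DOWN" in server_log and "MLXCENT_LLINK_UP" not in server_log


def matches_ij44826(client_log: str, server_log: str) -> bool:
    """Rough match: login errors plus failover CRQ events, and no link-up."""
    s = client_symptoms(client_log)
    return (
        s["login_errors"] > 0
        and s["crq_failover_events"] > 0
        and server_link_never_recovered(server_log)
    )


if __name__ == "__main__":
    client = (
        "VNIC_ERR_LOGIN\n"
        "VNIC_ERR_CRQ Event CRQ TEVENT received: PARTNER_FAILOVER\n"
    )
    server = "MLXCENT_LLINK_DOWN\n"
    print(matches_ij44826(client, server))  # True
```

The reboot-failure entry is reported but not required for a match, because the APAR says it only possibly appears.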
Local fix
Once at least one of the backing VF devices returns to the LINK UP state, either of the following options makes the vNIC client network connection available again:
- Reboot the vNIC client system.
- Reconfigure the vNIC client device without rebooting:
  1. Detach the interface associated with the vNIC client device: ifconfig enX down detach
  2. Put the vNIC client device in the Defined state without deleting it: rmdev -l entX
  3. Configure the vNIC client device again: mkdev -l entX
  4. Re-attach the interface: ifconfig enX up
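The second option can be scripted as a fixed command sequence. The sketch below is only an illustration under stated assumptions: "en1" and "ent1" are hypothetical names standing in for the APAR's enX/entX, the helper names are made up for this example, and it should be run only after at least one backing VF device is LINK UP. By default it prints the commands (dry run); with run=True it executes them in order and stops at the first failure.

```python
# Sketch of the second local-fix option as a command sequence.
# Assumptions: "en1"/"ent1" below are hypothetical names standing in for the
# APAR's enX/entX; run only after at least one backing VF device is LINK UP.
import subprocess
from typing import List


def local_fix_commands(interface: str, device: str) -> List[List[str]]:
    """Return the commands from the APAR's local fix, in order."""
    return [
        ["ifconfig", interface, "down", "detach"],  # detach the vNIC client interface
        ["rmdev", "-l", device],                    # Defined state; no -d, so not deleted
        ["mkdev", "-l", device],                    # configure the device again
        ["ifconfig", interface, "up"],              # re-attach the interface
    ]


def apply_local_fix(interface: str, device: str, run: bool = False) -> None:
    """Print each command; execute it only when run=True, stopping on failure."""
    for cmd in local_fix_commands(interface, device):
        print(" ".join(cmd))
        if run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on nonzero exit


if __name__ == "__main__":
    apply_local_fix("en1", "ent1")  # dry run: prints the four commands
```

Keeping the dry run as the default lets the sequence be reviewed against the actual device names before anything is changed on the system.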
Problem summary
Same as the Error description above.
Problem conclusion
Modify the vNIC client so that it keeps retrying the connection to a vNIC server device after a complete network outage on all of the backing VF devices.
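The difference between the failing behavior and the fixed behavior can be pictured with a small conceptual model. This is not the vNIC client driver code: the function and its parameters are hypothetical, the attempt limit of 10 is taken from the "max: 10" in the client error log, and modeling the outage as the attempt number at which a server returns is an assumption made only for illustration.

```python
# Conceptual model only, not the vNIC client driver code: a reconnect loop
# that gives up after a fixed number of attempts (pre-fix) versus one that
# keeps retrying (post-fix). The limit of 10 mirrors "max: 10" in the
# client error log; the attempt-based outage model is an assumption.
from typing import Optional


def reconnect(server_available_at: int, max_attempts: Optional[int]) -> Optional[int]:
    """Return the attempt number that succeeds, or None if the client gives up.

    server_available_at: first attempt at which some vNIC server device is
    reachable again (hypothetical stand-in for the outage length).
    max_attempts: None means retry indefinitely.
    """
    attempt = 0
    while True:
        attempt += 1
        if max_attempts is not None and attempt > max_attempts:
            return None  # gives up; compare "Reboot count: 11, max: 10"
        if attempt >= server_available_at:
            return attempt


if __name__ == "__main__":
    print(reconnect(server_available_at=12, max_attempts=10))    # None
    print(reconnect(server_available_at=12, max_attempts=None))  # 12
```

With a short outage both variants reconnect; only when every backing device stays down past the limit does the capped loop stop trying, which matches the symptom this APAR describes.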
Temporary fix
Comments
APAR Information
APAR number
IJ44826
Reported component name
AIX V7.3
Reported component ID
5765CD300
Reported release
730
Status
CLOSED PER
HIPER
NoHIPER
Submitted date
2023-01-04
Closed date
2023-01-04
Last modified date
2023-11-13
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
AIX V7.3
Fixed component ID
5765CD300
Applicable component levels
Business unit: IBM Infrastructure w/TPS (BU058)
Product: AIX 7.3 HIPERS- APARs and Fixes (SG11T)
Platform: Power Systems (PF053)
Version: 730
Line of business: Cognitive Systems (LOB08)
Document Information
Modified date:
14 November 2023