APAR status
Closed as program error.
Error description
A Virtual I/O Server (VIOS) may reboot on its own after being upgraded to version 2.2.4.10. If the dump is analyzed in kdb, it shows:

(0)> f pvthread+00B100
STACK:
[047FCDF8]entcore_tx_send+0018D8 (00C78D0100000000, 0000000000000031, 0000000000000000, 0000000000000000, 000000000000002E [??])
[047FB000]entcore_tx_enqueue+000280 (F1000A0029CD0000, F1000A0029CD2D88, F1000E0007011800, 0000000000000001, F1000E0007011800)
[04801AF4]entcore_output+000594 (??, ??)
[047B048C]mlxent_output+00000C (??, ??)
[F1000000C062DB1C]ethchan_output+0002FC (??, ??)
[F1000000C065CFD0]sea_output+000550 (??, ??)
[F1000000C04CF878]send_packet+000238 (??, ??)
[F1000000C04CFEA8]vlan_output+000108 (??, ??)
[00014D70].hkey_legacy_gate+00004C ()
[056231E8]en_output+000928 (??, ??, ??, ??)
[052B1C60]ip_output_post_fw+0018E0 (0000000000000000, F1000E0007011800, F1000A0029DEB710)
[052B3C80]ip_output+000140 (??, ??, ??, ??, ??, ??)
[05313F30]tcp_resprst+0005F0 (??, ??, ??, ??, ??, ??)
[052FBA1C]tcp_input0+00213C (??, ??, ??, ??)
[05304140]tcp_input+0001A0 (F1000E0007011800, 0000001400000014)
[0524CC94]ipintr_noqueue_post_fw+000994 (F1000A023AA33E20, F1000E0007011800, F1000A0029DEBF30)
[0524E040]ipintr_noqueue+0001E0 (??, ??, ??)
[0524F6DC]in_newstack+000020 ()
[kdb_get_virtual_memory] no real storage @ FFFFFFFF40174B0
[kdb_read_mem] no real storage @ FFFFFFFFFFF8DC0

(0)> dc @iar
047FCDF8 lhz r3,0(r3)

(0)> status
CPU INTR TID    TSLOT PID    PSLOT PROC_NAME
0        B10063 177   24006A 36    seaproc

Adapters that can encounter the problem include:
EC27 PCIe2 LP 2-Port 10GbE RoCE SFP+ Adapter
EC28 PCIe2 2-Port 10 GbE RoCE SFP+ adapter
EC37 (Copper) PCIe3 2-port 10 GbE NIC and RoCE SFP+ Copper
EC38 (Copper) PCIe3 LP 2-port 10 GbE NIC and RoCE SFP+ Copper
EL3X (Copper) PCIe3 LP 2-Port 10 GbE NIC and RoCE SFP+ Copper
EC2M (Fiber) PCIe3 2-port 10 GbE NIC and RoCE SR
EC2N (Fiber) PCIe3 2-port 10 GbE NIC and RoCE SR
EL40 (Fiber) PCIe3 LP 2-port 10 GbE NIC and RoCE SR
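When comparing a dump against this APAR, it can help to extract the function names from the kdb stack output and check that the top frame is entcore_tx_send. The following is a minimal sketch, not an IBM-provided tool: the regular expression and the helper name parse_frames are assumptions made for illustration, and the sample text is three frames copied from the stack in the error description.

```python
import re

# Matches one kdb stack frame of the form "[ADDRESS]function+OFFSET".
# Bracketed notes such as "[kdb_read_mem]" or "[??]" do not match because
# they are not a hex address followed by "name+offset".
FRAME_RE = re.compile(r"\[([0-9A-F]+)\]([.\w]+)\+([0-9A-F]+)")

def parse_frames(text):
    """Return (address, function, offset) tuples in stack order."""
    return [(int(addr, 16), func, int(off, 16))
            for addr, func, off in FRAME_RE.findall(text)]

# Three frames taken from the stack trace in this APAR.
sample = (
    "[047FCDF8]entcore_tx_send+0018D8 (00C78D0100000000, 0000000000000031, "
    "0000000000000000, 0000000000000000, 000000000000002E [??]) "
    "[047FB000]entcore_tx_enqueue+000280 (F1000A0029CD0000, F1000A0029CD2D88, "
    "F1000E0007011800, 0000000000000001, F1000E0007011800) "
    "[00014D70].hkey_legacy_gate+00004C ()"
)

frames = parse_frames(sample)
top_function = frames[0][1]  # the function in which the crash occurred
```

Because the pattern does not depend on line breaks, the same helper works whether the trace is pasted as one long line or one frame per line.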
Local fix
The only workaround is to go back to an earlier VIOS level.
Problem summary
After upgrading the VIOS to level 2.2.4.10, customers may encounter a system crash (VIOS crash) in entcore_tx_send with the following stack:

(0)> f pvthread+00B100
STACK:
047FCDF8 entcore_tx_send+0018D8 (00C78D0100000000, 0000000000000000, 0000000000000000, 000000000000002E ?? )
047FB000 entcore_tx_enqueue+000280 (F1000A0029CD0000, F1000E0007011800, 0000000000000001, F1000E0007011800)
04801AF4 entcore_output+000594 (??, ??)
047B048C mlxent_output+00000C (??, ??)
F1000000C062DB1C ethchan_output+0002FC (??, ??)
F1000000C065CFD0 sea_output+000550 (??, ??)
F1000000C04CF878 send_packet+000238 (??, ??)
F1000000C04CFEA8 vlan_output+000108 (??, ??)
00014D70 .hkey_legacy_gate+00004C ()
056231E8 en_output+000928 (??, ??, ??, ??)
052B1C60 ip_output_post_fw+0018E0 (0000000000000000, F1000A0029DEB710)
052B3C80 ip_output+000140 (??, ??, ??, ??, ??, ??)
05313F30 tcp_resprst+0005F0 (??, ??, ??, ??, ??, ??)
052FBA1C tcp_input0+00213C (??, ??, ??, ??)
05304140 tcp_input+0001A0 (F1000E0007011800, 0000001400000014)
0524CC94 ipintr_noqueue_post_fw+000994 (F1000A023AA33E20, F1000A0029DEBF30)
0524E040 ipintr_noqueue+0001E0 (??, ??, ??)
0524F6DC in_newstack+000020 ()
kdb_get_virtual_memory no real storage @ FFFFFFFF40174B0
kdb_read_mem no real storage @ FFFFFFFFFFF8DC0
Problem conclusion
The code has been modified so that the M_LARGESEND flag is no longer set on the receive path. With this change, even if the same memory is reused to send data back to the sender, it does not cause a system crash.
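The effect of the change can be pictured with a toy model. This is only an illustrative sketch, not the driver code: the bit value given to M_LARGESEND and the function names rx_flags and tx_uses_large_send are assumptions, and the real AIX definitions and data structures differ. It shows only that a buffer received without the flag is not handled as a large send when it is reused for transmission.

```python
# Placeholder bit for illustration; NOT the real AIX value of M_LARGESEND.
M_LARGESEND = 0x1

def rx_flags(fixed):
    """Flags carried by a buffer leaving the (modelled) receive path.

    Before the fix the receive path set M_LARGESEND; after the fix it does not.
    """
    return 0 if fixed else M_LARGESEND

def tx_uses_large_send(flags):
    """Whether the (modelled) transmit path treats the buffer as a large send."""
    return bool(flags & M_LARGESEND)

# The same buffer is reused to send a reply back to the sender.
before = tx_uses_large_send(rx_flags(fixed=False))
after = tx_uses_large_send(rx_flags(fixed=True))
```

In this model, before is true and after is false, which corresponds to the reused buffer no longer entering the large-send handling in the transmit path.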
Temporary fix
Comments
APAR Information
APAR number
IV87017
Reported component name
AIX V7.1
Reported component ID
5765H4000
Reported release
710
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2016-07-18
Closed date
2016-07-18
Last modified date
2017-10-30
APAR is sysrouted FROM one or more of the following:
IV82694
APAR is sysrouted TO one or more of the following:
U878648
Fix information
Fixed component name
AIX V7.1
Fixed component ID
5765H4000
Applicable component levels
R710 PSY U878648
UP17/10/30 I 1000
Document Information
Modified date:
18 April 2022