IBM Support

IV87017: VIOS 2.2.4.10 UPGRADE- CRASH IN ENTCORE_TX_SEND IV74832 APPLIED


APAR status

  • Closed as program error.

Error description

  • After upgrading to VIOS 2.2.4.10, a Virtual I/O Server may
    crash and reboot on its own.
    
    If the system dump is analyzed in kdb, the stack trace shows:
    
    (0)> f
    pvthread+00B100 STACK:
    [047FCDF8]entcore_tx_send+0018D8 (00C78D0100000000,
    0000000000000031, 0000000000000000, 0000000000000000,
    000000000000002E [??])
    [047FB000]entcore_tx_enqueue+000280 (F1000A0029CD0000,
    F1000A0029CD2D88, F1000E0007011800, 0000000000000001,
    F1000E0007011800)
    [04801AF4]entcore_output+000594 (??, ??)
    [047B048C]mlxent_output+00000C (??, ??)
    [F1000000C062DB1C]ethchan_output+0002FC (??, ??)
    [F1000000C065CFD0]sea_output+000550 (??, ??)
    [F1000000C04CF878]send_packet+000238 (??, ??)
    [F1000000C04CFEA8]vlan_output+000108 (??, ??)
    [00014D70].hkey_legacy_gate+00004C ()
    [056231E8]en_output+000928 (??, ??, ??, ??)
    [052B1C60]ip_output_post_fw+0018E0 (0000000000000000,
    F1000E0007011800, F1000A0029DEB710)
    [052B3C80]ip_output+000140 (??, ??, ??, ??, ??, ??)
    [05313F30]tcp_resprst+0005F0 (??, ??, ??, ??, ??, ??)
    [052FBA1C]tcp_input0+00213C (??, ??, ??, ??)
    [05304140]tcp_input+0001A0 (F1000E0007011800,
    0000001400000014)
    [0524CC94]ipintr_noqueue_post_fw+000994
    (F1000A023AA33E20, F1000E0007011800, F1000A0029DEBF30)
    [0524E040]ipintr_noqueue+0001E0 (??, ??, ??)
    [0524F6DC]in_newstack+000020 ()
    [kdb_get_virtual_memory] no real storage @
    FFFFFFFF40174B0
    [kdb_read_mem] no real storage @ FFFFFFFFFFF8DC0
    
    (0)> dc @iar
    047FCDF8              lhz    r3,0(r3)
    
    (0)> status
    CPU INTR      TID  TSLOT     PID  PSLOT  PROC_NAME
      0         B10063    177  24006A     36  seaproc
    
    Adapters affected by this problem include:
    EC27 PCIe2 LP 2-Port 10GbE RoCE SFP+ Adapter
    EC28 PCIe2 2-Port 10 GbE RoCE SFP+ adapter
    EC37 (Copper) PCIe3 2-port 10 GbE NIC and RoCE SFP+ Copper
    EC38 (Copper) PCIe3 LP 2-port 10 GbE NIC and RoCE SFP+ Copper
    EL3X (Copper) PCIe3 LP 2-Port 10 GbE NIC and RoCE SFP+ Copper
    EC2M (Fiber) PCIe3 2-port 10 GbE NIC and RoCE SR
    EC2N (Fiber) PCIe3 2-port 10 GbE NIC and RoCE SR
    EL40 (Fiber) PCIe3 LP 2-port 10 GbE NIC and RoCE SR
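    The level and adapter conditions above can be turned into a quick
    triage check. This is a minimal Python sketch, not an IBM tool: the
    function name possibly_exposed and its inputs are hypothetical. It
    assumes you already have the VIOS level as a string (for example, as
    printed by the VIOS ioslevel command) and a list of installed adapter
    feature codes gathered by whatever inventory method you use. Because
    the APAR says the affected cards "include" those listed, a False
    result does not rule the problem out.

```python
# Hypothetical triage helper, not an IBM-supplied tool.
# Feature codes are copied from the adapter list in this APAR. The APAR
# says affected cards "include" these, so the set may not be exhaustive.
AFFECTED_FEATURE_CODES = {
    "EC27", "EC28", "EC37", "EC38", "EL3X", "EC2M", "EC2N", "EL40",
}

# The VIOS level named in this APAR.
AFFECTED_VIOS_LEVEL = "2.2.4.10"


def possibly_exposed(vios_level: str, feature_codes: list[str]) -> bool:
    """Return True if the VIOS level is the one named in this APAR and at
    least one installed adapter feature code is in the affected list."""
    if vios_level.strip() != AFFECTED_VIOS_LEVEL:
        return False
    # Normalize case and whitespace before comparing feature codes.
    return any(code.strip().upper() in AFFECTED_FEATURE_CODES
               for code in feature_codes)


if __name__ == "__main__":
    print(possibly_exposed("2.2.4.10", ["EC2N"]))  # True
    print(possibly_exposed("2.2.4.10", ["5899"]))  # False: code not in list
    print(possibly_exposed("2.2.3.60", ["EC2N"]))  # False: different level
```

    The check deliberately requires both conditions; loosen it if you
    want to flag every 2.2.4.10 system regardless of adapter.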
    

Local fix

  • The only workaround is to go back to an earlier VIOS
    level.
    

Problem summary

  • After upgrading to VIOS 2.2.4.10, customers may experience a
    system (VIOS) crash in entcore_tx_send. The stack trace is the
    same as the one shown under Error description above.
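    When triaging a dump, it can help to compare the innermost frames of
    the kdb f output against the stack in this APAR. The Python sketch
    below is a heuristic under stated assumptions: the frame-line regex
    is inferred only from the trace layouts that appear in this document
    (with and without brackets around the address), and the four-frame
    SIGNATURE is a hypothetical choice taken from the top of this APAR's
    stack. A match suggests, but does not confirm, that a dump shows this
    problem.

```python
import re

# Assumed frame layout, inferred from this APAR's traces, e.g.
#   [047FCDF8]entcore_tx_send+0018D8 (...)
#   047FCDF8 entcore_tx_send+0018D8 (...)
#   [00014D70].hkey_legacy_gate+00004C ()
# Continuation lines holding only argument values do not match.
FRAME_RE = re.compile(
    r"^\s*\[?([0-9A-Fa-f]+)(?:\]\s*|\s+)\.?([A-Za-z_]\w*)\+([0-9A-Fa-f]+)"
)

# Hypothetical signature: innermost frames seen in this APAR's dump.
SIGNATURE = ["entcore_tx_send", "entcore_tx_enqueue",
             "entcore_output", "mlxent_output"]


def parse_frames(trace: str) -> list[tuple[str, int]]:
    """Return (function, offset) for each frame line, innermost first."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((m.group(2), int(m.group(3), 16)))
    return frames


def matches_signature(trace: str) -> bool:
    """True if the innermost frames equal SIGNATURE, in order."""
    names = [name for name, _ in parse_frames(trace)]
    return names[:len(SIGNATURE)] == SIGNATURE


# Excerpt of the trace from this APAR, used as sample input.
SAMPLE = """\
(0)> f
pvthread+00B100 STACK:
[047FCDF8]entcore_tx_send+0018D8 (00C78D0100000000,
0000000000000031, 0000000000000000, 0000000000000000,
000000000000002E [??])
[047FB000]entcore_tx_enqueue+000280 (F1000A0029CD0000,
F1000A0029CD2D88, F1000E0007011800, 0000000000000001,
F1000E0007011800)
[04801AF4]entcore_output+000594 (??, ??)
[047B048C]mlxent_output+00000C (??, ??)
[F1000000C062DB1C]ethchan_output+0002FC (??, ??)
[00014D70].hkey_legacy_gate+00004C ()
"""

if __name__ == "__main__":
    print(matches_signature(SAMPLE))  # True
```

    The "pvthread+00B100 STACK:" header and the argument-only
    continuation lines are skipped because they lack the address-then-
    function+offset shape the regex expects.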
    

Problem conclusion

  • The code has been modified so that the M_LARGESEND flag is no
    longer set on the receive path. As a result, even if the same
    memory is reused to send data back to the sender, it no longer
    causes a system crash.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IV87017

  • Reported component name

    AIX V7.1

  • Reported component ID

    5765H4000

  • Reported release

    710

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2016-07-18

  • Closed date

    2016-07-18

  • Last modified date

    2017-10-30

  • APAR is sysrouted FROM one or more of the following:

    IV82694

  • APAR is sysrouted TO one or more of the following:

    U878648

Fix information

  • Fixed component name

    AIX V7.1

  • Fixed component ID

    5765H4000

Applicable component levels

  • R710 PSY U878648

       UP17/10/30 I 1000


Document Information

Modified date:
18 April 2022