IBM Support

IV49276: SYSTEM CRASH AFTER CLUSTER REMOVAL APPLIES TO AIX 6100-08

A fix is available

APAR status

  • Closed as program error.

Error description

  • If a cluster is removed while some of its member nodes are
    unaware of the removal (for example, because they are offline
    or stopped), and the cluster is large enough (more than
    16 nodes), the unaware node(s) may crash with the following
    stack trace:
    
    pvthread+000200 STACK:
    WARNING: bad IAR: 00000000, display stack from LR: F1000000C044A9B4
    F1000000C044A9B4 get_node_state_from_repos+000114 (0000000000000000, 000000000001000A ??)
    F1000000C044D4A8 gossip_xmt+0006C8 ()
    F1000000C044DC8C init_gossip_timer+00014C ()
    00014D70 .hkey_legacy_gate+00004C ()
    000F9090 clock+000330 (??)
    002E8CB8 i_softmod+0005D8 ()
    001B7BD4 flih_util+000260 ()
    ____ Exception (F00000002FF47600) ____
    iar   : 00000000000D00FC  msr   : 8000000000009032  cr    : 24008224
    lr    : 0000000000097D84  ctr   : 0000000000000000  xer   : 00000000
    mq    : 00000000          asr   : 00000000BE42D001  amr   : FFFCFF3FFFFFFFFF
    r0  : 0000000000000000  r1  : 0FFFFFFFF3FFFDA0  r2  : 0000000002CD62D0
    r3  : 0000000000000000  r4  : 0000000000000000  r5  : 800000001C949120
    r6  : 800000001C9564F8  r7  : 0000000000000000  r8  : 0000000000000000
    r9  : 0000000000000040  r10 : 0000000000000000  r11 : 00000000001A0F13
    r12 : 0000000000000000  r13 : F1000A00E00B0C00  r14 : 00000000000034E0
    r15 : 0000000000000000  r16 : 0000000000000000  r17 : 0000000000000000
    r18 : 0000000001827F46  r19 : 000000003B9ACA00  r20 : 0000000000000001
    r21 : 0000000000000000  r22 : 0000000002D40C24  r23 : 0000000000000000
    r24 : F1000F0A10000278  r25 : 00000000024D6500  r26 : 00000000024D64FE
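
    Register dumps like the one in this trace can be read into
    name/value pairs, which helps when comparing dumps from several
    crashed nodes. The following is only a minimal sketch, not part
    of any AIX tool: parse_registers is a hypothetical helper, and it
    assumes the "name : value" layout shown in the trace and that any
    whitespace inside a hexadecimal value is a line-wrapping artifact.

```python
import re

# Register labels as they appear in the dump (lowercase, followed by a colon).
LABEL = re.compile(r"\b(iar|msr|cr|lr|ctr|xer|mq|asr|amr|r\d{1,2})\s*:")


def parse_registers(dump: str) -> dict[str, int]:
    """Map each register label to its value, ignoring whitespace inside values."""
    labels = list(LABEL.finditer(dump))
    registers = {}
    for i, match in enumerate(labels):
        # A value runs from the end of its label to the start of the next label.
        end = labels[i + 1].start() if i + 1 < len(labels) else len(dump)
        # Assumption: whitespace inside a value is a wrapping artifact, so drop it.
        digits = "".join(dump[match.end():end].split())
        if re.fullmatch(r"[0-9A-Fa-f]+", digits):
            registers[match.group(1)] = int(digits, 16)
    return registers


# A fragment copied from the trace, including a wrapped value.
sample = """iar   :
00000000000D00FC  msr   : 8000000000009032  cr    : 2400 8224
lr    : 0000000000097D84  r2  : 0000000002 CD62D0"""

regs = parse_registers(sample)
print(hex(regs["cr"]))
print(hex(regs["r2"]))
```

    Labels are located first and each value is taken as the text up
    to the next label, so a value split across a line break is
    rejoined before conversion.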
    

Local fix

Problem summary

  • If a cluster is removed while some of its member nodes are
    unaware of the removal (for example, because they are offline
    or stopped), and the cluster is large enough (more than
    16 nodes), the unaware node(s) may crash with the following
    stack trace:
    
    pvthread+000200 STACK:
    WARNING: bad IAR: 00000000, display stack from LR: F1000000C044A9B4
    F1000000C044A9B4 get_node_state_from_repos+000114 (0000000000000000, 000000000001000A ??)
    F1000000C044D4A8 gossip_xmt+0006C8 ()
    F1000000C044DC8C init_gossip_timer+00014C ()
    00014D70 .hkey_legacy_gate+00004C ()
    000F9090 clock+000330 (??)
    002E8CB8 i_softmod+0005D8 ()
    001B7BD4 flih_util+000260 ()
    ____ Exception (F00000002FF47600) ____
    iar   : 00000000000D00FC  msr   : 8000000000009032  cr    : 24008224
    lr    : 0000000000097D84  ctr   : 0000000000000000  xer   : 00000000
    mq    : 00000000          asr   : 00000000BE42D001  amr   : FFFCFF3FFFFFFFFF
    r0  : 0000000000000000  r1  : 0FFFFFFFF3FFFDA0  r2  : 0000000002CD62D0
    r3  : 0000000000000000  r4  : 0000000000000000  r5  : 800000001C949120
    r6  : 800000001C9564F8  r7  : 0000000000000000  r8  : 0000000000000000
    r9  : 0000000000000040  r10 : 0000000000000000  r11 : 00000000001A0F13
    r12 : 0000000000000000  r13 : F1000A00E00B0C00  r14 : 00000000000034E0
    r15 : 0000000000000000  r16 : 0000000000000000  r17 : 0000000000000000
    r18 : 0000000001827F46  r19 : 000000003B9ACA00  r20 : 0000000000000001
    r21 : 0000000000000000  r22 : 0000000002D40C24  r23 : 0000000000000000
    r24 : F1000F0A10000278  r25 : 00000000024D6500  r26 : 00000000024D64FE
    

Problem conclusion

  • The removal logic in the cluster kernel extension was adjusted
    to handle these cases properly.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IV49276

  • Reported component name

    AIX 610 STD EDI

  • Reported component ID

    5765G6200

  • Reported release

    610

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Submitted date

    2013-09-17

  • Closed date

    2013-09-17

  • Last modified date

    2014-02-17

  • APAR is sysrouted FROM one or more of the following:

    IV48787

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    AIX 610 STD EDI

  • Fixed component ID

    5765G6200

Applicable component levels

  • R610 PSY U859947

       UP14/02/17 I 1000

PTF to Fileset Mapping

Document Information

Modified date:
17 February 2014