IBM Support

IV65639: CAA PARTITIONED CLUSTER CAUSED BY WAIT_ON_NODE_BRINGUP ERROR APPLIES TO AIX 6100-09

A fix is available


APAR status

  • Closed as program error.

Error description

  • ****************************************************************
    * USERS AFFECTED:
    * Systems running the 6100-09 Technology Level with
    * bos.cluster.rte below the 6.1.9.45 level.
    ****************************************************************
    * PROBLEM DESCRIPTION:
    * A possible split-brain issue was introduced for
    * PowerHA 7.1 customers. The issue delays the disk
    * heartbeat signals by ~10 seconds during cluster
    * startup. If network heartbeats are also missed
    * during this window, the HA node that was just
    * rebooted fails to detect the other cluster nodes
    * that are already up. The rebooted node then
    * declares an independent one-node cluster, causing
    * the split. This APAR ensures that the repository
    * disk heartbeat is enabled earlier during node
    * startup and also adds further safeguards to the
    * node cluster join sequence.
    ****************************************************************
    * RECOMMENDATION:
    * Install APAR IV65639.
    * Prior to fix availability, an interim fix is available from
    * either
    * ftp://aix.software.ibm.com/aix/ifixes/iv65639/
    * https://aix.software.ibm.com/aix/ifixes/iv65639/
    ****************************************************************
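As an illustration only, the startup race in the problem description can be reduced to a toy Python model: the rebooted node finds its peers if either a repository disk heartbeat or a network heartbeat arrives within the join window. The `StartupTimeline` class, its field names, the 5-second window and the 0-second post-fix start time are hypothetical; only the ~10 second disk heartbeat delay comes from this APAR, and the code does not reflect actual CAA internals.

```python
from dataclasses import dataclass


@dataclass
class StartupTimeline:
    """Hypothetical timeline of one rebooted node; not actual CAA internals."""

    join_window_s: float    # assumed time the node waits to hear from peers
    disk_hb_start_s: float  # seconds after startup that repository disk heartbeat begins
    network_hb_seen: bool   # did any network heartbeat arrive inside the window?


def declares_one_node_cluster(t: StartupTimeline) -> bool:
    """True if no heartbeat of either kind is seen in the window (the split case)."""
    disk_hb_seen = t.disk_hb_start_s < t.join_window_s
    return not (disk_hb_seen or t.network_hb_seen)


# Before the fix: disk heartbeat delayed ~10 s, network heartbeat also missed.
before = StartupTimeline(join_window_s=5.0, disk_hb_start_s=10.0, network_hb_seen=False)
# After the fix: disk heartbeat enabled early in startup (0 s is an assumed value).
after = StartupTimeline(join_window_s=5.0, disk_hb_start_s=0.0, network_hb_seen=False)

print(declares_one_node_cluster(before))  # True: split brain
print(declares_one_node_cluster(after))   # False: peers detected
```

In this model, starting the disk heartbeat before the window closes is sufficient to avoid the one-node cluster even when every network heartbeat is lost.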
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:
    * Systems running the 6100-09 Technology Level with
    * bos.cluster.rte below the 6.1.9.45 level.
    ****************************************************************
    * PROBLEM DESCRIPTION:
    * A possible split-brain issue was introduced for
    * PowerHA 7.1 customers. The issue delays the disk
    * heartbeat signals by ~10 seconds during cluster
    * startup. If network heartbeats are also missed
    * during this window, the HA node that was just
    * rebooted fails to detect the other cluster nodes
    * that are already up. The rebooted node then
    * declares an independent one-node cluster, causing
    * the split. This APAR ensures that the repository
    * disk heartbeat is enabled earlier during node
    * startup and also adds further safeguards to the
    * node cluster join sequence.
    ****************************************************************
    * RECOMMENDATION:
    * Install APAR IV65639.
    * Prior to fix availability, an interim fix is available from
    * either
    * ftp://aix.software.ibm.com/aix/ifixes/iv65639/
    * https://aix.software.ibm.com/aix/ifixes/iv65639/
    ****************************************************************
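The USERS AFFECTED criteria can be checked with a small Python sketch. It assumes the two inputs have already been read by hand, the Technology Level string from `oslevel -s` and the fileset level from `lslpp -L bos.cluster.rte`; it does not run or parse those commands, and the example values are hypothetical.

```python
from typing import Tuple

# First bos.cluster.rte level that is not affected (from USERS AFFECTED).
FIXED_LEVEL = (6, 1, 9, 45)


def parse_level(level: str) -> Tuple[int, ...]:
    """Turn a fileset level such as '6.1.9.30' into a tuple for numeric comparison."""
    return tuple(int(part) for part in level.strip().split("."))


def is_affected(oslevel_s: str, bos_cluster_rte_level: str) -> bool:
    """True for a 6100-09 system whose bos.cluster.rte is below 6.1.9.45."""
    on_6100_09 = oslevel_s.strip().startswith("6100-09")
    return on_6100_09 and parse_level(bos_cluster_rte_level) < FIXED_LEVEL


# Hypothetical example values.
print(is_affected("6100-09-03-1415", "6.1.9.30"))  # True
print(is_affected("6100-09-04-1441", "6.1.9.45"))  # False
print(is_affected("7100-03-04-1441", "7.1.3.30"))  # False: different release
```

Levels are compared as integer tuples rather than strings so that, for example, 6.1.9.100 correctly sorts above 6.1.9.45.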
    

Problem conclusion

  • Ensure that the repository disk heartbeat is enabled earlier
    during node startup, and add further safeguards to the node
    cluster join sequence.
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

  • 6100-08 - use AIX APAR IV65638
    6100-09 - use AIX APAR IV65639
    7100-02 - use AIX APAR IV65643
    7100-03 - use AIX APAR IV65472
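The Comments list maps each Technology Level to its own APAR. A minimal Python lookup, assuming the TL is taken as the first seven characters of an `oslevel -s` string, could look like this; `apar_for` is a hypothetical helper, and levels not in the list return `None`.

```python
from typing import Optional

# Technology Level -> APAR number, transcribed from the Comments list.
APAR_BY_TL = {
    "6100-08": "IV65638",
    "6100-09": "IV65639",
    "7100-02": "IV65643",
    "7100-03": "IV65472",
}


def apar_for(oslevel_s: str) -> Optional[str]:
    """Return the APAR for the TL in an `oslevel -s` string, or None if unlisted."""
    tl = oslevel_s.strip()[:7]  # '6100-09-04-1441' -> '6100-09'
    return APAR_BY_TL.get(tl)


print(apar_for("6100-09-04-1441"))  # IV65639
print(apar_for("7100-03-05-1524"))  # IV65472
print(apar_for("7100-04-00-0000"))  # None
```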
    

APAR Information

  • APAR number

    IV65639

  • Reported component name

    AIX 610 STD EDI

  • Reported component ID

    5765G6200

  • Reported release

    610

  • Status

    CLOSED PER

  • PE

    No

  • HIPER

    Yes

  • Submitted date

    2014-10-07

  • Closed date

    2014-10-28

  • Last modified date

    2015-11-20

  • APAR is sysrouted FROM one or more of the following:

    IV65472

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    AIX 610 STD EDI

  • Fixed component ID

    5765G6200

Applicable component levels

  • R610 PSY U861492

       UP15/11/20 I 1000


Document Information

Modified date:
17 December 2021