IBM Support

IV74438: RMC UNRESPONSIVE TO GROUP SERVICES MAY CAUSE NODE FAILURE

A fix is available


APAR status

  • Closed as program error.

Error description

  • ***************************************************************
    * USERS AFFECTED:
    * Customers using PowerHA SystemMirror V7, or Tivoli System
    * Automation for Multiplatforms, with RSCT 3.1.4.0 or higher.
    * RSCT 3.1.4.0 was shipped with AIX 6.1 TL8 and 7.1 TL2 in
    * 2012 and was also available for download from Fix Central.
    * Customers running AIX 7.1 TL4 or 7.2 are not affected.
    ***************************************************************
    * PROBLEM DESCRIPTION:
    * A potentially long-blocking system call made by an auxiliary
    * thread of the RMC subsystem (rmcd) was issued while holding a
    * mutex that the main thread of the same process also needed,
    * so the main thread could be blocked for as long as the call.
    *
    * A long block on that call could prevent rmcd from answering
    * status checks from the Group Services subsystem in time,
    * leading to a GS_DAEMON_UNRESP_WA error:
    * "RSCT daemon (rmcd) is not responding. So GSD will exit."
    * This usually results in the node being brought down by a
    * reboot, a panic, or other means, depending on the cluster
    * type and the protection methods active at the time.
    *
    * For systems where an AIX kernel panic will result (such as
    * PowerHA SystemMirror V7), the panic string will show:
    * "RSCT reboot caused by critical resource protection - Group
    * Services"
    ***************************************************************
    * RECOMMENDATION:
    * IV74438 is officially fixed in rsct.core.utils 3.2.0.10,
    * which is available in AIX 6.1 TL9 SP6 and will also be
    * available in AIX 7.1 TL3 SP7.
    * An interim fix for other SP levels is available from:
    * https://ibm.biz/PowerHAFixes
    * (Interim fixes for older levels can be requested from IBM
    * service as needed. This fix is included in the base of RSCT
    * 3.2.1.0 for AIX 7.1 TL4 and 7.2 TL0.)
    ***************************************************************
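To check whether a node falls in the exposed range given in USERS AFFECTED and RECOMMENDATION, the installed rsct.core.utils level (for example, the Level column of `lslpp -L rsct.core.utils`) can be compared numerically against 3.1.4.0 and 3.2.0.10. The Python sketch below is a hypothetical helper, not an IBM tool; `parse_level` and `exposed_to_iv74438` are illustrative names. It compares levels as integer tuples because plain string comparison would wrongly rank "3.2.0.9" above "3.2.0.10". It assumes interim fixes are checked separately (for example with `emgr -l`), since an ifix does not change the fileset level.

```python
# Hypothetical helper, not an IBM tool: classifies an rsct.core.utils
# level against the levels named in APAR IV74438. Interim fixes (ifixes)
# do not change the fileset level, so they are NOT detected here.
FIRST_AFFECTED = (3, 1, 4, 0)  # "RSCT 3.1.4.0 or higher"
FIX_LEVEL = (3, 2, 0, 10)      # "fixed in rsct.core.utils 3.2.0.10"


def parse_level(level: str) -> tuple:
    """Turn a V.R.M.F string such as '3.2.0.10' into a tuple of ints."""
    return tuple(int(part) for part in level.strip().split("."))


def exposed_to_iv74438(level: str) -> bool:
    """True if the level is at or above 3.1.4.0 but below 3.2.0.10."""
    vrmf = parse_level(level)
    # Tuple comparison is numeric per field, so 3.2.0.9 < 3.2.0.10.
    return FIRST_AFFECTED <= vrmf < FIX_LEVEL


if __name__ == "__main__":
    # Sample inputs for illustration only.
    for lvl in ("3.1.3.5", "3.1.4.0", "3.2.0.9", "3.2.0.10", "3.2.1.0"):
        print(lvl, "exposed" if exposed_to_iv74438(lvl) else "not exposed")
```

A level reported as exposed may still be protected if the interim fix from the URL above has been applied.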
    

Local fix


Problem conclusion

  • The problematic call has been moved outside the mutex lock,
    allowing it to run as long as necessary without blocking
    the main thread.
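The locking change in the problem conclusion can be illustrated with a minimal Python sketch. It is a hypothetical model of the pattern, not RSCT source code: `state_lock`, `slow_call`, `aux_before_fix`, `aux_after_fix`, and `status_check_latency` are invented for illustration, and `time.sleep()` stands in for the long-blocking system call. The main thread measures how long it waits for the shared mutex, which is the delay that would keep it from answering a Group Services status check.

```python
import threading
import time

# Hypothetical model of the locking pattern described in this APAR; it is
# not RSCT source code. time.sleep() stands in for the long-blocking call.
state_lock = threading.Lock()


def slow_call(seconds: float) -> str:
    """Stand-in for the potentially long-blocking system call."""
    time.sleep(seconds)
    return "done"


def aux_before_fix(seconds, started, results):
    # Pre-fix pattern: the blocking call runs while the shared mutex is held.
    with state_lock:
        started.set()
        results.append(slow_call(seconds))


def aux_after_fix(seconds, started, results):
    # Post-fix pattern: the blocking call runs outside the mutex; the lock
    # is held only long enough to publish the result.
    started.set()
    value = slow_call(seconds)
    with state_lock:
        results.append(value)


def status_check_latency(aux, block_for=0.5):
    """Time how long the main thread waits for the mutex it needs to
    answer a (hypothetical) status check while `aux` is in the slow call."""
    started = threading.Event()
    results = []
    worker = threading.Thread(target=aux, args=(block_for, started, results))
    worker.start()
    started.wait()  # the auxiliary thread has reached the slow call
    t0 = time.monotonic()
    with state_lock:  # the main thread needs the same mutex to respond
        pass
    latency = time.monotonic() - t0
    worker.join()
    return latency


if __name__ == "__main__":
    print(f"before fix: main thread waited {status_check_latency(aux_before_fix):.2f}s")
    print(f"after fix:  main thread waited {status_check_latency(aux_after_fix):.2f}s")
```

In the pre-fix variant the main thread waits roughly as long as the blocking call; in the post-fix variant it acquires the mutex almost immediately, because the auxiliary thread takes the lock only after the call returns.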
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

  • AIX 6100-09 - use RSCT APAR IV74438
    AIX 7100-03 - use RSCT APAR IV74438
    

APAR Information

  • APAR number

    IV74438

  • Reported component name

    RSCT/RMC FOR CS

  • Reported component ID

    5765F07AP

  • Reported release

    320

  • Status

    CLOSED PER

  • PE

    No

  • HIPER

    Yes

  • Submitted date

    2015-06-22

  • Closed date

    2015-07-26

  • Last modified date

    2017-08-02

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    IV75478

Fix information

  • Fixed component name

    RSCT/RMC FOR CS

  • Fixed component ID

    5765F07AP

Applicable component levels

  • R320 PSY U876547

       UP17/08/02 I 1000

PTF to Fileset Mapping


Document Information

Modified date:
02 August 2017