A fix is available
APAR status
Closed as program error.
Error description
***************************************************************
* USERS AFFECTED:
* Systems running rsct.core.rmc 3.2.0.0 through 3.2.0.4.
* This includes AIX 6.1 TL9 and AIX 7.1 TL3, and VIOS 2.2.3.
* Other AIX levels can be affected if RSCT has been updated
* independently of AIX.
***************************************************************
* PROBLEM DESCRIPTION:
* Starting in rsct.core.rmc 3.1.5.0 (and continuing into the
* 3.2.0.0 release), a memory leak in CAA-specific code paths
* of the IBM.ConfigRM subsystem may lead to library calls
* failing, which can cause ConfigRM to believe the CAA domain
* is being shut down, causing it to go through offline
* processing of the RSCT domain, including stopping cthags.
* That action is a critical infrastructure loss for PowerHA 7
* or VIOS SSP, and will lead to node failure (halt in the
* case of PowerHA, or system crash with VIOS SSP).
*
* The leak occurs as long as CAA is active, regardless of
* what PowerHA or SSP is doing, and only on the node
* operating as the ConfigRM Group Leader (GL). The GL node
* can be identified in "lssrc -ls IBM.ConfigRM".
* A reboot is guaranteed to reset the situation. Time to
* failure after a new boot is estimated to be between 6 and 8
* months, although no existing records of failures in the
* field still retained the time of the last reboot, so a
* precise deadline is not known.
***************************************************************
* RECOMMENDATION:
* The fix for RSCT 3.2.0 is available via RSCT APAR IV69760.
* The fix for RSCT 3.2.0 will also ship with:
* AIX 6.1 TL9 SP5, AIX 7.1 TL3 SP5, and VIOS 2.2.3.5.
* An interim fix for RSCT 3.2.0 is available from either:
* ftp://aix.software.ibm.com/aix/ifixes/iv69760/
* https://aix.software.ibm.com/aix/ifixes/iv69760/
***************************************************************
* NOTICE:
* The interim fix package available in the links above is a
* bundle including all of these fixes:
* IV69760 - ConfigRM memory leak (RSCT 3.2 APAR)
* IV69674 - TieBreaker issue causing node reboots
* IV71572 - On PowerHA 7, "shutdown -F" may end in panic
*
* It supersedes these previous fix packages:
* Label       Package                    Addresses
* ----------  -------------------------  ------------------
* IV66606.3   IV66606.3.150225.epkg.Z    Only IV69760
* IV66606.3a  IV66606.3a.150306.epkg.Z   IV69760 & IV71572
*
* Note: The official APAR for the ConfigRM memory leak in
* RSCT 3.2 is IV69760; however, the early fix packages
* for RSCT 3.2 still used "IV66606" as a reference
* (the APAR for RSCT 3.1), because the 3.2 APAR had
* not yet been cloned at that time.
*
* If any of those fix packages are already installed and the
* IBM.ConfigRM subsystem is active (lssrc), then no further
* action is needed, unless the customer wishes to obtain any
* of the additional fixes above.
* As far as the memory leak itself goes, those older fixes
* are fine as long as IBM.ConfigRM is able to run.
*
* If you are holding one of those packages but have not
* yet installed it, you should discard it in favor of the
* one available in the links above.
*
* Any customer who finds ConfigRM is not able to run with
* their current fix package should contact IBM support for
* assistance on replacing it, since the absence of
* IBM.ConfigRM may cause emgr removal checks to fail.
***************************************************************
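The affected range stated under USERS AFFECTED (rsct.core.rmc 3.2.0.0 through 3.2.0.4) can be checked with a small shell helper. This is a minimal sketch, not an IBM-supplied tool: the function name rsct_level_status and its output strings are hypothetical, and the commented usage assumes that "lslpp -Lqc" prints the fileset level in the third colon-separated field.

```shell
#!/bin/sh
# Hypothetical helper: classify an rsct.core.rmc level against the range
# named in this APAR (3.2.0.0 through 3.2.0.4).
# Prints "affected" for levels inside that range and "not-in-range" for
# anything else. "not-in-range" only means the level is outside this APAR's
# stated range; it says nothing about other APARs (for example the RSCT 3.1
# APAR IV66606).
rsct_level_status() {
    case "$1" in
        3.2.0.[0-4]) echo "affected" ;;      # last field is a single digit 0-4
        *)           echo "not-in-range" ;;
    esac
}

# Usage on an AIX host (assumption: level is field 3 of lslpp -Lqc output):
#   level=$(lslpp -Lqc rsct.core.rmc | cut -d: -f3)
#   rsct_level_status "$level"
```

A pattern match is used instead of numeric comparison because the affected range differs only in the last field, so a single-character class is enough and avoids version-parsing logic.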
Local fix
Problem summary
Starting in rsct.core.rmc 3.1.5.0, a slow memory leak in IBM.ConfigRM under CAA can lead to a cluster service shutdown, which causes a node failure in both PowerHA v7 (halt) and VIOS SSP (system panic). The leak occurs as long as CAA is active, regardless of what PowerHA or SSP is doing, and only on the node operating as the ConfigRM Group Leader (GL). The GL node can be identified in the output of "lssrc -ls IBM.ConfigRM". A reboot is guaranteed to reset the situation. Time to failure after a new boot is estimated to be between 6 and 8 months, although no existing records of failures in the field still retained the time of the last reboot, so a precise deadline is not known.
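Because the leak only affects the Group Leader node, it can help to extract the GL name from the "lssrc -ls IBM.ConfigRM" output. The sketch below is illustrative only: the function name configrm_group_leader is hypothetical, and it assumes the output contains a line of the form "GroupLeader: <node>, <id>, <number>", which is not confirmed by this APAR and may differ between RSCT levels.

```shell
#!/bin/sh
# Hypothetical helper: read `lssrc -ls IBM.ConfigRM` output on stdin and
# print the node name from the first "GroupLeader:" line.
# Assumption: the line looks like "GroupLeader: <node>, <id>, <number>".
# Prints nothing if no such line is present.
configrm_group_leader() {
    sed -n 's/^[[:space:]]*GroupLeader:[[:space:]]*\([^,[:space:]]*\).*/\1/p' |
        head -n 1
}

# Usage on a cluster node:
#   lssrc -ls IBM.ConfigRM | configrm_group_leader
```

The printed name is the RSCT node name, which may not be identical to the output of "hostname", so any comparison between the two should be treated with care.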
Problem conclusion
The leak has been addressed.
Temporary fix
Comments
APAR Information
APAR number
IV69760
Reported component name
RSCT/RMC FOR CS
Reported component ID
5765F07AP
Reported release
320
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Submitted date
2015-02-21
Closed date
2015-02-21
Last modified date
2017-08-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
RSCT/RMC FOR CS
Fixed component ID
5765F07AP
Applicable component levels
R320 PSY U876547
UP17/08/02 I 1000
PTF to Fileset Mapping
U869140 rsct.core.rmc 3.2.0.5
U869516 rsct.core.rmc 3.2.0.6
U869645 rsct.core.rmc 3.2.0.7
U870102 rsct.core.rmc 3.2.0.8
U870185 rsct.core.rmc 3.2.0.9
U873165 rsct.core.rmc 3.2.0.11
U876547 rsct.core.rmc 3.2.0.12
U870804 rsct.core.rmc 3.2.1.0
Document Information
Modified date:
02 August 2017