APAR status
Closed as program error.
Error description
When "shutdown -F" is issued on a node, the documented and expected behavior of PowerHA is to perform a forced down (that is, to unmanage the resource groups) on that node and a graceful down on the other nodes. The intent is to cease processing quickly on the node being shut down, while still cleaning up NFS cross mounts or other replication mechanisms on the surviving nodes. What actually happens, however, is that the node being shut down performs the forced down as expected and then attempts to release the resource groups it holds. If the stop_server processing takes longer than a few seconds, AIX kills the event processing partway through, because AIX gives other parts of the system only a short time to respond to a "shutdown -F". This can leave applications in inconsistent states, and those states are reflected on the shared storage.
Local fix
Problem summary
When AIX is shut down with PowerHA cluster services active, the expected behavior is that the node being shut down runs a "node_down forced" event and the surviving nodes run a graceful "node_down" event. In other words, the node being shut down makes no attempt to release cluster resources, and the surviving nodes make no attempt to take over resources. Depending on the timing of the shutdown and the performance of user-supplied scripts, resources may be taken over when they should not be. This can leave the resource group in the ERROR state, or produce an event script error that requires recovery.
Problem conclusion
The solution is to run the node_down event directly on the node being shut down, without involving the remote nodes. This ensures that the PowerHA stack is brought down as soon as possible and that no takeover is attempted.
Temporary fix
Best practice is to stop PowerHA cluster services before shutting down AIX.
Comments
APAR Information
APAR number
IJ07893
Reported component name
POWERHA SYSMIR
Reported component ID
5765H3900
Reported release
721
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Submitted date
2018-07-18
Closed date
2018-11-27
Last modified date
2018-11-27
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
POWERHA SYSMIR
Fixed component ID
5765H3900
Applicable component levels
PowerHA SystemMirror Standard Edition for AIX (SSLM9V), version 721, Platform Independent
PowerHA SystemMirror Enterprise Edition for AIX (SSXU4N), version 721, Platform Independent
PowerHA (SGL4G4), version 721, Platform Independent
Document Information
Modified date:
19 October 2021