Troubleshooting manual recovery for storage subsystems
PowerHA® SystemMirror® Enterprise Edition Version 7.1.2, or later, supports various storage subsystems and provides high availability for applications and services by monitoring for failures and recovering from them automatically. The storage subsystems use various replication technologies to manage the replication of data between a primary and an auxiliary data center.
If the storage subsystem is online and available, PowerHA SystemMirror Enterprise Edition 7.1.2, or later, can
automatically manage the replicated data during fallover and fallback. However, in the following
scenarios PowerHA SystemMirror Enterprise Edition does not automatically manage the replicated data,
and manual intervention is required:
- PowerHA SystemMirror Enterprise Edition cannot determine
the status of the storage subsystem, storage links, or device groups. In this scenario, PowerHA SystemMirror Enterprise Edition stops cluster event
processing and writes the corrective actions to the /var/hacmp/log/hacmp.out
log file. To troubleshoot storage subsystem problems, review the information in the
RECOMMENDED USER ACTIONS
section of the /var/hacmp/log/hacmp.out log file. When the storage subsystem is brought back online, you must manually resume cluster event processing by selecting
from the SMIT interface.
- A fallover occurs for a partitioned cluster across different sites, and the primary and auxiliary
partitions each begin to write data to their local storage subsystem. When the primary partition recovers and
the storage links are brought back online, you must determine whether the data from the two sites
can be merged or whether one site's data can replace the other site's data. In this scenario, you do not
want PowerHA SystemMirror Enterprise Edition to use the
automatic recovery function. To configure PowerHA SystemMirror Enterprise Edition to use manual recovery, complete the following steps:
- From the command line, enter smit sysmirror.
- In the SMIT interface, select .
- Select the storage subsystem that you want to configure for manual recovery.
- From the Recovery Action field, select MANUAL.
- If an outage affects all the mirror links between the source site and the target
site, the mirror consistency group on the primary IBM FlashSystem® A9000 or IBM®
XIV® Storage System might not fail over
to the secondary storage. In this scenario, the mirror consistency group relationship is still
active, but the mirror_switch_roles command fails. If you want the mirror
consistency group to fail over to the secondary storage, you must manually complete the following
steps:
- Deactivate the mirror consistency group relationship on the primary storage by running the
following command:
mirror_deactivate -y cg=cgname
- On the secondary storage, change the role of the consistency group to
Primary
by running the following command:
mirror_change_role -y cg=cgname role=Master
- On the primary storage, change the role of the consistency group to
Slave
by running the following command:
mirror_change_role -y cg=cgname role=Slave
Note: When you change the role of the consistency group on the secondary storage, the volume group on the secondary storage can be in the Varied ON
state, because the mirror volumes on the secondary storage are no longer in read-only mode. When you run the mirror_change_role command on the primary storage, a delay occurs because I/O activity on the host is interrupted. To avoid the delay, stop disk I/O activity on the host before you run the mirror_change_role command.
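The three manual failover steps for the IBM FlashSystem A9000 or IBM XIV Storage System can be kept as an ordered checklist. The following is a minimal Python sketch, not an IBM tool: the helper name failover_plan and the placeholder group name cg01 are hypothetical, and the whitespace check on the name is a conservative assumption of this sketch rather than a documented naming rule. It only assembles the command strings copied from the procedure and records which storage system each one is entered on; it does not connect to or change any storage system.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    system: str   # storage system on which the command is entered
    command: str  # CLI command text, copied from the documented procedure


def failover_plan(cg: str) -> List[Step]:
    """Return the manual failover steps for consistency group `cg`, in order.

    Hypothetical helper: it only builds command strings and never runs them.
    """
    # Assumption of this sketch (not a documented rule): reject empty names
    # and names containing whitespace so each command stays a single token set.
    if not cg or any(ch.isspace() for ch in cg):
        raise ValueError("consistency group name must be non-empty with no spaces")
    return [
        # 1. Deactivate the mirror relationship on the primary storage.
        Step("primary", f"mirror_deactivate -y cg={cg}"),
        # 2. Make the group Primary on the secondary storage (legacy CLI term: Master).
        Step("secondary", f"mirror_change_role -y cg={cg} role=Master"),
        # 3. Change the group's role on the primary storage (legacy CLI term: Slave).
        #    Stop disk I/O on the host first to avoid the delay described in the note.
        Step("primary", f"mirror_change_role -y cg={cg} role=Slave"),
    ]


if __name__ == "__main__":
    # "cg01" is a placeholder; substitute your consistency group name.
    for number, step in enumerate(failover_plan("cg01"), start=1):
        print(f"{number}. On the {step.system} storage: {step.command}")
```

Keeping the steps as data makes the required order explicit: deactivate the relationship first, promote the group on the secondary storage, and change the role on the primary storage only after host disk I/O has stopped.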
Important: The input and output syntax of command-line interface
(CLI) commands uses the legacy terminology of "Master", "SMaster", and "Slave" volumes, which are
referred to as "Primary", "Secondary", and "Tertiary" everywhere in the documentation except the CLI reference.
This inconsistency is a necessary compromise: it avoids changes to older CLI commands that
are in customer use and keeps the CLI terminology consistent across the board. The new
terminology emphasizes what the more recent Multi-site HA/DR, high availability (HyperSwap),
and disaster recovery (Synchronous and Asynchronous mirroring) functions have in common,
and it is used outside the CLI reference, where broader concepts can be explained.
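For the first scenario, where PowerHA SystemMirror Enterprise Edition stops cluster event processing and writes corrective actions to /var/hacmp/log/hacmp.out, a small script can pull out the most recent RECOMMENDED USER ACTIONS section. This is a minimal Python sketch under stated assumptions: the function name last_recommended_actions and the 20-line window are hypothetical, and because the exact layout of that section is not described here, the sketch simply returns the last line containing the heading plus a fixed number of lines that follow it.

```python
import sys
from typing import Iterable, List, Optional

HEADING = "RECOMMENDED USER ACTIONS"
DEFAULT_LOG = "/var/hacmp/log/hacmp.out"


def last_recommended_actions(lines: Iterable[str], context: int = 20) -> Optional[List[str]]:
    """Return the last line containing HEADING plus up to `context` lines after it.

    Assumption: a fixed window of following lines covers the section; adjust
    `context` after checking the real layout of your log.
    """
    block: Optional[List[str]] = None
    remaining = 0
    for line in lines:
        text = line.rstrip("\n")
        if HEADING in text:
            # A later occurrence (a newer event) replaces any earlier one.
            block = [text]
            remaining = context
        elif block is not None and remaining > 0:
            block.append(text)
            remaining -= 1
    return block


def main(path: str = DEFAULT_LOG) -> None:
    try:
        with open(path, encoding="utf-8", errors="replace") as log:
            block = last_recommended_actions(log)
    except OSError as err:
        # Missing file or insufficient permissions: report and stop.
        print(f"Cannot read {path}: {err}")
        return
    if block is None:
        print(f"No '{HEADING}' section found in {path}")
    else:
        print("\n".join(block))


if __name__ == "__main__":
    # Optional first argument: an alternative log file path.
    main(*sys.argv[1:2])
```

Reading the file line by line keeps memory use flat even when hacmp.out has grown large, and returning only the last occurrence focuses on the event that stopped processing most recently.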