Repository disk failure

Plan for repository disk failure and for the steps that are needed to recover from it.

A PowerHA® SystemMirror® cluster tolerates repository disk failure. If any node in the cluster encounters errors with the repository disk or cannot access it, the cluster enters a limited or restricted mode. In this mode of operation, most topology-related operations are unavailable. For example, you cannot add a node to the cluster, and a node cannot join the cluster. However, critical cluster functions remain available. For example, you can move a resource group from an active node to a standby node.

When the repository disk fails, the administrator is notified about the disk failure. PowerHA SystemMirror continues to notify the administrator about the repository disk failure until it is resolved. To get notifications or customize event processing when a repository disk fails, see the Configuring pre-event and post-event processing topic.

PowerHA SystemMirror and Cluster Aware AIX® (CAA) support live repository disk replacement, which you can use to replace a failed or working repository disk. CAA repopulates the new disk with cluster information and starts to use the disk as the repository disk.

PowerHA SystemMirror Version 7.2.0, or later, supports the Automatic Repository Disk Replacement (ARR) function. ARR uses CAA to automatically replace a failed repository disk with a backup repository disk. The ARR function is available only if you configure a backup repository disk with PowerHA SystemMirror.

To use the ARR function, your environment must meet the following requirements:
  • A backup repository disk is identified for the cluster or site.
  • PowerHA SystemMirror Version 7.2.0, or later, is installed.
  • One of the following versions of the AIX operating system is installed:
    • AIX Version 7.1.4, or later
    • AIX Version 7.2.0, or later
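
The requirements above can be expressed as a simple pre-check. The following Python sketch is illustrative only and is not part of PowerHA SystemMirror or CAA. The `ClusterInfo` type, the `arr_blockers` function, and the disk name `hdisk3` are hypothetical names, and the sketch assumes that you have already collected the version numbers and the backup repository disk name by other means. Versions are modeled as (version, release, level) tuples, which is an assumption about how to compare them.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]

# Minimum levels taken from the requirements list above.
MIN_POWERHA: Version = (7, 2, 0)
# "7.1.4 or later" and "7.2.0 or later" are both covered by >= (7, 1, 4).
MIN_AIX: Version = (7, 1, 4)


@dataclass
class ClusterInfo:
    """Hypothetical container for facts gathered from a cluster or site."""
    powerha_version: Version
    aix_version: Version
    backup_repository_disk: Optional[str]  # for example "hdisk3", or None


def arr_blockers(info: ClusterInfo) -> List[str]:
    """Return the unmet ARR requirements; an empty list means none were found."""
    problems: List[str] = []
    if not info.backup_repository_disk:
        problems.append("no backup repository disk is configured")
    if info.powerha_version < MIN_POWERHA:
        problems.append("PowerHA SystemMirror is earlier than Version 7.2.0")
    if info.aix_version < MIN_AIX:
        problems.append("AIX is earlier than Version 7.1.4")
    return problems


if __name__ == "__main__":
    site = ClusterInfo((7, 2, 0), (7, 1, 4), "hdisk3")
    print(arr_blockers(site) or "ARR requirements appear to be met")
```

Returning a list of unmet requirements, rather than a single true or false value, makes it easier to report everything that must be corrected in one pass.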

CAA detects repository disk failure by checking for I/O errors and by verifying that the disk is in an active state. These verification checks occur periodically and are not performed every time the repository disk is read from or written to, so a failure might not be detected immediately. Do not write directly to the repository disk, even for testing purposes. Writing directly to the repository disk asynchronously might abruptly disrupt the operating system and CAA operations and produce unpredictable results.