Example: Node and site failure

This topic describes how GLVM for PowerHA® SystemMirror® Enterprise Edition handles failures and what happens in the cluster to ensure that the geographically mirrored data remains available to the application.

These scenarios use the configuration in the following figure, which has two nodes at each site. In this configuration, Node 2 is the home node, that is, the primary owner of the resource group. It resides on the local site and has the RPV client configured. Node 1 is the second-priority node for the resource group:


GLVM for PowerHA SystemMirror Enterprise Edition cluster configuration with Geographically mirrored volume groups

Node failure on the local site

If Node 2 fails, the resource group with the application falls over to Node 1.

At a high level, GLVM for PowerHA SystemMirror Enterprise Edition on Node 1 detects that Node 2, the primary owner of the resource group, has failed and moves the resource group to Node 1. It also activates the geographically mirrored volume group on Node 1 and ensures that the RPV client on Node 1 is available for communication with the RPV server on Node 3.
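The takeover decision in this scenario follows the resource group's node priority list. The following Python sketch is a simplified, hypothetical model (the `Node` class and `select_owner` function are illustrative names, not a PowerHA SystemMirror API): it picks the highest-priority surviving node on the local site and falls over to the remote site only when no local node is left, which also matches the local site failure scenario described later in this topic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    site: str          # "local" or "remote" in this illustration
    up: bool = True


def select_owner(priority: List[Node], home_site: str) -> Optional[Node]:
    """Return the node that should acquire the resource group (toy model).

    First pass: highest-priority surviving node on the home (local) site.
    Second pass: highest-priority surviving node on the remote site.
    """
    for wanted_local in (True, False):
        for node in priority:
            if node.up and (node.site == home_site) == wanted_local:
                return node
    return None  # no node is available to host the resource group


# Priority list from the example: Node 2 (home), Node 1, then the remote nodes.
nodes = [Node("Node 2", "local"), Node("Node 1", "local"),
         Node("Node 3", "remote"), Node("Node 4", "remote")]
nodes[0].up = False                       # Node 2 fails
print(select_owner(nodes, "local").name)  # Node 1
```

Real cluster behavior also depends on the configured startup, fallover, and site policies; this sketch models only the ordering described in the scenarios.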

The application is kept highly available and the data continues to be geographically mirrored. The application continues to send I/O requests to the RPV, and the GLVM function ensures that the RPV client and RPV server continue to communicate.

If Node 2 rejoins the cluster, PowerHA SystemMirror performs a resource group fallback, depending on the resource group fallback policy. The resource group moves back to Node 2, and the RPV client becomes available on Node 2. Communication between the RPV client and server is reestablished, and the resource group resumes mirroring data through the RPV.
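Whether the resource group moves back depends on the fallback policy. The following sketch contrasts the two behaviors, assuming a policy that falls back to a higher-priority node and one that never falls back; the enum labels and the `owner_after_rejoin` function are hypothetical and only illustrate the decision, not how PowerHA SystemMirror implements it.

```python
from enum import Enum
from typing import List


class FallbackPolicy(Enum):
    # Hypothetical labels for the two behaviors illustrated here.
    FALLBACK_TO_HIGHER_PRIORITY = "fall back to higher priority node"
    NEVER_FALLBACK = "never fall back"


def owner_after_rejoin(current: str, rejoining: str, priority: List[str],
                       policy: FallbackPolicy) -> str:
    """Return the resource group owner after `rejoining` comes back online."""
    if policy is FallbackPolicy.NEVER_FALLBACK:
        return current
    # Move the group only if the rejoining node outranks the current owner.
    if priority.index(rejoining) < priority.index(current):
        return rejoining
    return current


priority = ["Node 2", "Node 1", "Node 3", "Node 4"]
print(owner_after_rejoin("Node 1", "Node 2", priority,
                         FallbackPolicy.FALLBACK_TO_HIGHER_PRIORITY))  # Node 2
print(owner_after_rejoin("Node 1", "Node 2", priority,
                         FallbackPolicy.NEVER_FALLBACK))               # Node 1
```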

Node failure on the remote site

If Node 3, which has the RPV server configured, fails, GLVM for PowerHA SystemMirror Enterprise Edition moves the backup instance of the resource group to Node 4 on the remote site. PowerHA SystemMirror also ensures that the RPV server is now available on Node 4.
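The placement of the RPV server can be pictured as choosing the first surviving node on the remote site. In the hypothetical sketch below, `rpv_server_host` is an illustrative function, not a GLVM command; it returns `None` when no remote node is up, which corresponds to the remote site failure case where the RPV client has no server to communicate with.

```python
from typing import Dict, List, Optional


def rpv_server_host(remote_priority: List[str],
                    up: Dict[str, bool]) -> Optional[str]:
    """Pick the remote-site node that hosts the RPV server (backup instance).

    Returns None when no remote node is up, so the RPV client has no
    server to talk to.
    """
    for node in remote_priority:
        if up.get(node, False):
            return node
    return None


remote = ["Node 3", "Node 4"]
state = {"Node 3": True, "Node 4": True}
print(rpv_server_host(remote, state))  # Node 3
state["Node 3"] = False                # Node 3 fails
print(rpv_server_host(remote, state))  # Node 4
state["Node 4"] = False                # whole remote site is down
print(rpv_server_host(remote, state))  # None
```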

The application is kept highly available and the data continues to be geographically mirrored. The application continues to send I/O requests to the RPV, and the GLVM function ensures that the RPV client and RPV server continue to communicate.

When the node at the remote site rejoins the cluster, PowerHA SystemMirror performs the resource group fallback, depending on the resource group fallback policy.

Local site failure

If, as a result of a failure, there is no node available on the local site to host a resource group with the geographically mirrored volume group and the application, the resource group falls over to one of the nodes on the remote site.

For example, the resource group falls over from Node 2 on the local site to Node 3 on the remote site. In this case, GLVM for PowerHA SystemMirror Enterprise Edition forcefully activates (if configured to do so) the volume group on Node 3 using the mirror copy of the data that is local to Node 3, and Node 3 acquires the resource group. The application in the resource group then accesses that data directly rather than through an RPV.
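Forced activation is needed because, after a site failure, only part of the mirrored data is reachable. The sketch below uses a deliberately simplified rule, assumed for illustration only: normal activation requires a majority of mirror copies to be reachable. Actual AIX LVM activation rules (quorum, volume group descriptor areas, forced varyon settings) are more involved, and `activation_mode` is a hypothetical name.

```python
def activation_mode(reachable_copies: int, total_copies: int,
                    forced_varyon_allowed: bool) -> str:
    """Decide how a mirrored volume group may be activated (simplified rule).

    Assumption: normal activation needs strictly more than half of the
    copies. With one copy per site, losing a site leaves exactly half,
    so only a forced activation can bring the volume group online.
    """
    if reachable_copies * 2 > total_copies:
        return "normal"
    if reachable_copies >= 1 and forced_varyon_allowed:
        return "forced"
    return "cannot activate"


# Local site lost: Node 3 sees only its own copy, 1 of 2.
print(activation_mode(1, 2, forced_varyon_allowed=True))   # forced
print(activation_mode(1, 2, forced_varyon_allowed=False))  # cannot activate
```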

If Node 2 rejoins the cluster, PowerHA SystemMirror performs the resource group fallback (depending on the resource group fallback policy). The resource group moves back to Node 2 and the RPV client becomes available on Node 2. The communication between the RPV client and server is reestablished. GLVM for PowerHA SystemMirror Enterprise Edition synchronizes the mirror copies and reestablishes data mirroring. The resource group resumes mirroring data through the RPV.

Remote site failure

The following figure shows a remote site failure (nodes at each site are not shown):
Remote site failure
Note: The previous figure shows the cluster networks.

If, as a result of a failure, there is no node available on the remote site to host the RPV server for the resource group with the geographically mirrored volume group, PowerHA SystemMirror stops the RPV client from trying to communicate with the RPV server.

If a node that hosted the RPV server on the remote site rejoins the cluster, GLVM for PowerHA SystemMirror Enterprise Edition resumes the RPV server on that node and reestablishes communication between the RPV client and server. Mirroring is also reestablished, and the mirror copies are synchronized. The resource group remains on Node 2, and the RPV client begins to communicate with the RPV server on Node 3. The resource group resumes mirroring data through the RPV.
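Synchronizing the mirror copies after the remote site returns means bringing the remote copy up to date with writes it missed. The toy model below, a hypothetical `MirroredVolume` class rather than GLVM or LVM code, tracks blocks written while the remote copy was unreachable and copies only those blocks when the remote site rejoins, in the spirit of stale-partition resynchronization.

```python
from typing import List, Set


class MirroredVolume:
    """Toy two-copy mirror: a local copy and a remote copy reached via RPV."""

    def __init__(self, blocks: int) -> None:
        self.local: List[bytes] = [b""] * blocks
        self.remote: List[bytes] = [b""] * blocks
        self.stale: Set[int] = set()   # remote blocks missing recent writes
        self.remote_up = True

    def write(self, block: int, data: bytes) -> None:
        self.local[block] = data
        if self.remote_up:
            self.remote[block] = data  # would travel through the RPV client
        else:
            self.stale.add(block)      # remember the block for later resync

    def remote_site_failed(self) -> None:
        self.remote_up = False         # client stops talking to the RPV server

    def remote_site_rejoined(self) -> None:
        self.remote_up = True
        for block in sorted(self.stale):  # copy only the blocks that changed
            self.remote[block] = self.local[block]
        self.stale.clear()


vol = MirroredVolume(4)
vol.write(0, b"a")
vol.remote_site_failed()
vol.write(1, b"b")
print(vol.stale)                 # {1}
vol.remote_site_rejoined()
print(vol.remote == vol.local)   # True
```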

Preventing cluster partitioning

To prevent cluster partitioning, configure a network for heartbeating between the sites in addition to the IP-based networks (up to four) over which the mirroring data is transferred. If all mirroring network connections between the RPV client and server (and between the sites) fail, the heartbeating network prevents cluster partitioning and the resulting data divergence.
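The role of the heartbeating network can be summarized as a check on whether any path between the sites is still alive. The sketch below is a simplified illustration with a hypothetical `site_view` function: losing every mirroring network interrupts mirroring, but as long as the heartbeating network is up, neither site concludes that the other has failed, so the cluster is not partitioned.

```python
from typing import List


def site_view(mirroring_links_up: List[bool], heartbeat_up: bool) -> str:
    """Classify what one site can conclude about the other (simplified).

    mirroring_links_up: one flag per IP-based mirroring network (up to four).
    heartbeat_up: state of the additional heartbeating network.
    """
    if len(mirroring_links_up) > 4:
        raise ValueError("this sketch assumes at most four mirroring networks")
    if any(mirroring_links_up):
        return "connected"
    if heartbeat_up:
        # Mirroring is interrupted, but each site still sees the other as alive.
        return "mirroring down, no partition"
    # No path at all: each site may conclude that the other has failed.
    return "partitioned"


print(site_view([False, False], heartbeat_up=True))   # mirroring down, no partition
print(site_view([False, False], heartbeat_up=False))  # partitioned
```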

The following figure illustrates the failure of the data mirroring network in the cluster with two sites (nodes at each site are not shown):


Failure of the Data Mirroring Network