Configuring an active-active IBM Storage Scale deployment
You can deploy an IBM Storage Scale Primary instance and attach a deployed IBM Storage Scale Mirror instance and an IBM Storage Scale Tiebreaker instance to create a highly available active-active IBM Storage Scale configuration.
About this task
When one side of the cluster goes down, the surviving side can experience a file system timeout. The length of this timeout depends on the failureDetectionTime and leaseRecoveryWait parameters, and on whether one of the primary server, secondary server, cluster manager, or file system manager is on the side that goes down. The timeout can be up to failureDetectionTime + leaseRecoveryWait. To find out the values for these parameters, run these commands from the command line on any of the IBM Storage Scale server nodes:
su - gpfsprod -c '/usr/lpp/mmfs/bin/mmlsconfig leaseRecoveryWait'
su - gpfsprod -c '/usr/lpp/mmfs/bin/mmlsconfig failureDetectionTime'
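The output resembles the following; mmlsconfig prints the parameter name and its current value (the values shown here are illustrative):
leaseRecoveryWait 35
failureDetectionTime 35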
If you are deploying an IBM Storage Scale server instance that uses IBM Storage Scale Pattern 1.2.15.0 or later, run the Get Cluster Status operation to list the values for these properties. The minimum value for leaseRecoveryWait and failureDetectionTime is 10 seconds, and the default is 35 seconds. With the default values, for example, the timeout can last up to 35 + 35 = 70 seconds. Update these properties with caution, as lower values could result in data corruption when nodes in the cluster are down. Contact the IBM Storage Scale service team and read the IBM Storage Scale documentation to understand the implications before you update these values.
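If, after that review, you decide to change these parameters, a minimal sketch of the update follows. mmchconfig is the standard IBM Storage Scale command for changing cluster configuration parameters; the values shown are the documented defaults and are illustrative only:
su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmchconfig failureDetectionTime=35'
su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmchconfig leaseRecoveryWait=35'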
To avoid this timeout during a planned failover, before you bring one side down, move the primary server, secondary server, cluster manager, and file system manager to the side that remains active. For information about the operations that make these cluster changes, see Failover active-active operations. If you have an IBM Storage Scale server instance that was deployed with IBM Storage Scale Pattern 1.2.7.0 or older, run these commands from the command line on any of the IBM Storage Scale server nodes to update the cluster and file system managers:
su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmchmgr -c nodeIP'
to make nodeIP the cluster manager.
su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmchmgr fileSystemName nodeIP'
to make nodeIP the file system manager for fileSystemName.
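To verify the change, you can list the current cluster and file system managers. mmlsmgr is a standard IBM Storage Scale command; the gpfsprod user is the same as in the earlier examples:
su - gpfsprod -c '/usr/lpp/mmfs/bin/mmlsmgr'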
Your IBM Storage Scale server instances can be either:
- Part of an active-active configuration that uses IBM Spectrum Scale synchronous replication, with an active Mirror replica, or
- Part of an active-passive configuration that uses volume replication, with an IBM Storage Scale Passive side ready for takeover.
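To check the replication settings of an existing file system, one option is to list its replica attributes. This is a sketch; fileSystemName is a placeholder, and mmlsfs is the standard IBM Storage Scale command for listing file system attributes:
su - gpfsprod -c '/usr/lpp/mmfs/bin/mmlsfs fileSystemName -m -M -r -R'
The -m and -M flags report the default and maximum numbers of metadata replicas, and -r and -R report the same for data replicas.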
If you have three Cloud Pak System Software for Power® instances in different data centers, or in different zones of a single data center, deploy each of the three IBM Storage Scale server configurations on a separate system.
If you have two Cloud Pak System Software for Power instances, instead of deploying an IBM Storage Scale Tiebreaker configuration, you can install an external Tiebreaker node on a separate system and attach that node to your IBM Storage Scale cluster. For more information about setting up this external Tiebreaker node, see the Related links.
If you choose to deploy an IBM Storage Scale Tiebreaker configuration, and you are limited to two Cloud Pak System Software for Power instances, you can place the IBM Storage Scale Tiebreaker configuration on the same system as either the Primary or the Mirror configuration. For highest availability, assign each configuration to a separate cloud group that does not share compute nodes with the others. With this setup, the Tiebreaker node does not reside on the same compute node as the Primary or Mirror node, and the failure of any one compute node cannot bring down two IBM Storage Scale instances at once.
As a further consideration, when you place the IBM Storage Scale Tiebreaker configuration on the same system as the Primary or Mirror configuration, and one Cloud Pak System Software for Power instance is larger than the other, you might choose to place the IBM Storage Scale Primary and Tiebreaker configurations on the larger system and the IBM Storage Scale Mirror configuration on the smaller system.
Procedure
Results
This procedure attaches the Mirror and Tiebreaker configurations to the Primary configuration, creating the active-active environment.
What to do next
If you have not already done so, you can deploy an instance of the IBM® shared service for IBM Storage Scale and configure it to use your new cluster. As an alternative to using the IBM Storage Scale shared service, starting with IBM Storage Scale Pattern V1.2.5.0, clients can provide the IBM Storage Scale server connection information directly at deployment time.
Deploy other workload patterns that contain the IBM Storage Scale Client Policy (or alternatively, the IBM Storage Scale Client script packages). These workloads can then access the volumes made available by your IBM Storage Scale cluster.
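One way to confirm that a deployed workload can reach the cluster file system is to list which nodes have it mounted. This is a sketch; fileSystemName is a placeholder, mmlsmount is a standard IBM Storage Scale command, and the gpfsprod user is assumed to be the same as on the server nodes:
su - gpfsprod -c '/usr/lpp/mmfs/bin/mmlsmount fileSystemName -L'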
When failover situations occur, depending on the nature of the problem, you might be able to recover in several ways. Generally, if either the Primary or the Mirror configuration fails, you might be able to recover the IBM Storage Scale cluster. If the Primary or Mirror configuration fails and the Tiebreaker configuration also fails, the IBM Storage Scale cluster ceases to function.
- You can run the Become Single Rack operation on the surviving Primary or Mirror configuration.
- After fixing the problem on the failing configuration, you can run the Become Replicated Cluster operation.
- You can run the Remove Member operation on the surviving instance and point to the failing instance to remove it from the cluster. For example, if the failed instance is a Mirror instance, run the Remove Member operation and select the Mirror option. If you need to remove the Primary configuration, run the Remove Member operation and select the Primary option. Wait for the operation to finish successfully. Run the Get Cluster Status operation to ensure that the nodes and disks for the failed instance are no longer part of the cluster (for a command-line cross-check, see the sketch after this list).
- If the instance removed in the previous step is still running, delete it so you can reuse the attached volumes.
- Deploy a new instance of the same type as the one removed in the previous step. If possible, reattach the same volumes that were used by the deleted instance. If those volumes are not available, that is, the volumes are not healthy or your new instance is deployed in a different cloud group or rack than the deleted instance, you can use new volumes.
- Add the new instance to the surviving primary or mirror instance by using the Add Member operation.
- If more than one file system is available on the surviving instance, add volumes to all of these file systems on the new deployment.
Note: When you expand the cluster with new nodes, data on the primary volumes is preserved when you attach a Mirror or Tiebreaker instance to the Primary instance. Data is also preserved when you remove a member from an existing cluster, provided that at least one of the Mirror or Primary instances is running. There is no outage for existing clients when the cluster is expanded. There might be a temporary outage when a member is removed from the cluster, until the operation finishes.
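The command-line cross-check mentioned in the Remove Member step, run on a surviving server node, could look like the following. This is a sketch; mmlscluster and mmlsnsd are standard IBM Storage Scale commands, and gpfsprod is the user from the earlier examples:
su - gpfsprod -c '/usr/lpp/mmfs/bin/mmlscluster'
su - gpfsprod -c '/usr/lpp/mmfs/bin/mmlsnsd'
mmlscluster lists the nodes in the cluster and mmlsnsd lists the NSDs (disks), so the removed instance's nodes and disks should no longer appear in their output.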
If a cluster member's servers and disks are stopped (for example, due to maintenance or manual intervention), you can restart the disks and servers in that member by running the Recover Lost Connection operation.
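To see whether a member's servers and disks are actually down before you run the operation, one option follows. This is a sketch; fileSystemName is a placeholder, and the commands and the gpfsprod user are as in the earlier examples:
su - gpfsprod -c '/usr/lpp/mmfs/bin/mmgetstate -a'
su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmlsdisk fileSystemName'
mmgetstate -a reports the GPFS daemon state on all nodes, and mmlsdisk reports the availability of each disk in the file system.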