Configuring an active-active IBM Spectrum Scale deployment
You can deploy an IBM Spectrum Scale Primary instance and attach a deployed IBM Spectrum Scale Mirror instance and an IBM Spectrum Scale Tiebreaker instance to create a highly available active-active IBM Spectrum Scale configuration.
About this task
When one side of the configuration goes down, the length of the timeout depends on the failureDetectionTime and leaseRecoveryWait parameters, and on whether the primary or secondary server, the cluster manager, or the file system manager is on the side that goes down. The timeout can be up to failureDetectionTime + leaseRecoveryWait. To find the values of these parameters, run the following commands from the command line on any of the IBM Spectrum Scale server nodes:
su - gpfsprod -c '/usr/lpp/mmfs/bin/mmlsconfig leaseRecoveryWait'
su - gpfsprod -c '/usr/lpp/mmfs/bin/mmlsconfig failureDetectionTime'
You can also run the Get Cluster Status operation to list the values for these properties.
The minimum value for leaseRecoveryWait and failureDetectionTime is 10 seconds, and the default is 35 seconds. Update these properties with caution, because lower values could result in data corruption when nodes in the cluster are down. Contact the IBM Spectrum Scale service team and read the IBM Spectrum Scale documentation to understand the implications before you update these values.
Run su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmchmgr -c nodeIP' to make nodeIP the cluster manager.
Run su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmchmgr fileSystemName nodeIP' to make nodeIP the manager for fileSystemName.
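The upper bound on the timeout, failureDetectionTime + leaseRecoveryWait, can be estimated with a short shell sketch. This is an illustration only, not part of the product: the function names config_value and max_timeout are hypothetical, and the sketch assumes that mmlsconfig prints each attribute as a single "name value" line. When the value is missing or is not a plain non-negative integer, the sketch falls back to the 35-second default stated in this topic.

```shell
#!/bin/sh
# Sketch: estimate the worst-case timeout (failureDetectionTime + leaseRecoveryWait).
# Assumption: mmlsconfig reports an attribute as "<name> <value>" on one line.

DEFAULT_SECONDS=35   # default value stated in this topic

# Print the numeric value from "<name> <value>" output. Fall back to the
# default when the value is missing or is not a non-negative integer.
config_value() {
    value=$(printf '%s\n' "$1" | awk 'NF >= 2 {print $2; exit}')
    case "$value" in
        ''|*[!0-9]*) echo "$DEFAULT_SECONDS" ;;
        *)           echo "$value" ;;
    esac
}

# Print the upper bound on the timeout, in seconds.
# $1: mmlsconfig output for failureDetectionTime
# $2: mmlsconfig output for leaseRecoveryWait
max_timeout() {
    fdt=$(config_value "$1")
    lrw=$(config_value "$2")
    echo $((fdt + lrw))
}

# Possible use on a server node (not executed here):
#   max_timeout \
#     "$(su - gpfsprod -c '/usr/lpp/mmfs/bin/mmlsconfig failureDetectionTime')" \
#     "$(su - gpfsprod -c '/usr/lpp/mmfs/bin/mmlsconfig leaseRecoveryWait')"
```

With both parameters at their defaults, the sketch reports an upper bound of 70 seconds.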
- Part of an active-active configuration and use IBM Spectrum Scale synchronous replication, with an active Mirror replica, or
- Part of an active-passive replication, and use volume replication, with an IBM Spectrum Scale Passive side ready for takeover.
The following procedure shows the general steps you can take to configure an active-active IBM Spectrum Scale deployment for high availability. In this configuration, an IBM Spectrum Scale Primary instance is deployed to one rack (referred to as the Primary rack), while an IBM Spectrum Scale Mirror instance and an IBM Spectrum Scale Tiebreaker instance are each deployed on separate racks (referred to as the Mirror rack and the Tiebreaker rack), preferably at separate locations. Generally, this type of configuration is supported only when the racks are less than 300 km apart, to avoid latency problems with data transfer.
If you have three systems with IBM® Cloud Pak products installed (either Cloud Pak Software or Cloud Pak System Software) in different data centers, or in different zones of a single data center, deploy each of the three IBM Spectrum Scale Server configurations on a separate system.
If you have two systems with IBM Cloud Pak products installed, instead of deploying an IBM Spectrum Scale Tiebreaker configuration, you can install an external Tiebreaker node on a separate system and attach that node to your IBM Spectrum Scale cluster. For more information about setting up an external Tiebreaker node, see the Related links.
If you choose to deploy an IBM Spectrum Scale Tiebreaker configuration, and you are limited to two systems with IBM Cloud Pak products installed, you can locate the IBM Spectrum Scale Tiebreaker configuration on the same system as either the Primary or the Mirror configuration. For highest availability, assign each configuration to a separate cloud group, where the cloud groups do not share compute nodes. With this setup, the Tiebreaker node does not reside on the same compute node as the Primary or Mirror node, and failure of any one compute node cannot bring down two IBM Spectrum Scale instances at once.
As a further consideration, when locating the IBM Spectrum Scale Tiebreaker configuration on the same system with either the Primary or Mirror configuration, if one system with IBM Cloud Pak products installed is larger than the other, you might choose to locate the IBM Spectrum Scale Primary and Tiebreaker configurations on your largest system, and the IBM Spectrum Scale Mirror configuration on the smaller system.
Procedure
Results
This procedure attaches the Mirror and Tiebreaker configurations to the Primary configuration, creating the active-active environment.
What to do next
If you have not already done so, you can deploy an instance of the IBM shared service for IBM Spectrum Scale and configure it to use your new cluster. As an alternative to using the IBM Spectrum Scale shared service, starting with IBM Spectrum Scale pattern V1.2.5.0, the clients can directly provide the IBM Spectrum Scale server connection information at deployment time.
Deploy other workload patterns that contain the IBM Spectrum Scale Client Policy (or alternatively, the IBM Spectrum Scale Client script packages). These workloads can then access the volumes made available by your IBM Spectrum Scale cluster.
When failover situations occur, depending on the nature of the problem, you might be able to recover in several ways. Generally, if either the Primary or the Mirror configuration fails, you might be able to recover the IBM Spectrum Scale cluster. If the Primary or Mirror configuration fails and the Tiebreaker configuration also fails, the IBM Spectrum Scale cluster ceases to function.
- You can run the Become Single Rack operation on the surviving Primary or Mirror configuration.
- After you fix the problem on the failing configuration, you can run the Become Replicated Cluster operation.
- You can run the Remove Member operation on the surviving instance and point to the failing instance to remove it from the cluster. For example, if the failed instance is a mirror instance, run the Remove Member operation, and then select the Mirror option. If you need to remove the primary configuration, run the Remove Member operation, and then select the Primary option. Wait for the operation to finish successfully. Run the Get Cluster Status operation to ensure that the nodes and disks for the failed instance are no longer part of the cluster.
- If the instance removed in the previous step is still running, delete it so you can reuse the attached volumes.
- Deploy a new instance, with the same type as the one removed in the previous step. If possible, reattach the same volumes that are used by the deleted instance. If these volumes are not available, that is, the volumes are not healthy or your new instance is deployed in a cloud group or rack different than the deleted instance, you can use new volumes.
- Add the new instance to the surviving primary or mirror instance by using the Add Member operation.
- If more than one file system is available on the surviving instance, add volumes to all these file systems on the new deployment.
Note: When you expand the cluster with new nodes, data on the primary volumes is preserved when you attach a Mirror or Tiebreaker instance to the Primary instance. Data is also preserved when you remove a member from an existing cluster, if at least one of the Mirror or Primary instances is running. There is no outage for existing clients when the cluster is expanded. There might be a temporary outage when a member is removed from the cluster until the operation is finished.
If the cluster member's servers and disks are stopped (due to maintenance or manual intervention), you can restart the disks and servers in that member by running the Recover Lost Connection operation.
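The failover rule stated earlier in this section (the cluster can be recovered when only the Primary or only the Mirror configuration fails, but ceases to function when one of them fails together with the Tiebreaker) can be sketched as a small shell function. This is an illustration under stated assumptions, not product behavior: the function name cluster_state and the state labels are hypothetical, and the two cases that this topic does not describe (only the Tiebreaker down, or both Primary and Mirror down) are filled in by assuming that the cluster needs a majority of its three members.

```shell
#!/bin/sh
# Sketch: classify cluster state from the status of the three configurations.
# Usage: cluster_state PRIMARY MIRROR TIEBREAKER   (each argument: up | down)

cluster_state() {
    primary=$1
    mirror=$2
    tiebreaker=$3

    if [ "$primary" = up ] && [ "$mirror" = up ] && [ "$tiebreaker" = up ]; then
        # All three configurations are running.
        echo "healthy"
    elif [ "$tiebreaker" = up ] && { [ "$primary" = up ] || [ "$mirror" = up ]; }; then
        # From this topic: only the Primary or only the Mirror has failed.
        echo "recoverable"
    elif [ "$primary" = up ] && [ "$mirror" = up ]; then
        # Assumption (not stated in this topic): losing only the Tiebreaker
        # leaves a majority of members running.
        echo "degraded"
    else
        # From this topic: Primary or Mirror failed together with the
        # Tiebreaker. Assumption: the same applies when both Primary and
        # Mirror are down.
        echo "down"
    fi
}
```

For example, cluster_state up down up reports recoverable, which corresponds to the case where you can run the Become Single Rack operation on the surviving configuration.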