Increase DB2 availability
Using the cloud as a reliable tiebreaker device
In this article, learn to use a new cluster tiebreaker type that was introduced with DB2 for Linux, UNIX and Windows (LUW) Version 10.1. DB2 for LUW has provided integrated high availability and failover clustering since DB2 Version 9.5.
You can use the new cloud tiebreaker type for DB2 LUW failover automation in two-node clusters, and you can use it to automate any DB2 LUW topology that spans two clustered nodes. Examples of two-node cluster topologies include DB2 single partition, DB2 replication (HADR), DB2 partitioned (DPF), and DB2 pureScale. This article focuses on using the cloud tiebreaker to automate DB2 LUW HADR failover reliably. We also discuss recovery actions.
Basic cluster concepts
A cluster is a set of loosely coupled cooperating computer nodes. In practice, the coupling between the compute nodes is provided by standard communication channels.
All nodes in a cluster communicate with each other. An internal heartbeat mechanism continually sends messages to all nodes in the cluster to verify the active membership in the cluster. Communication failures can cause a single cluster to divide into multiple subclusters, which are partially or completely unaware of each other (that is, they are unable to communicate with each other). One of the subclusters may be designated as the primary subcluster. A condition for the formation of the primary subcluster is that it has quorum.
Quorum implies that a plurality of the cluster's nodes are active and able to communicate among themselves. A set of nodes that has quorum is able to continue carrying out operations.
Internally, the clustering software uses two main methods to determine quorum. The first method is applicable to any cluster composed of an odd number of nodes. In such a cluster, a quorum resolution algorithm called Majority Node Set (MNS) is used. MNS ensures that when a network partition event occurs (an event after which full communication among all cluster members is no longer possible), the subcluster with the plurality remains active and any smaller subclusters are terminated. The nodes in the smaller subclusters are automatically shut down by the clustering software.
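The majority rule can be sketched as a small shell function. This illustrates the arithmetic only and is not the clustering software's code; the function name has_majority is hypothetical.

```shell
# Hypothetical helper: succeeds (exit 0) when a subcluster of $1 nodes
# holds a strict majority of a cluster with $2 nodes in total.
has_majority() {
    active=$1
    total=$2
    [ $((active * 2)) -gt "$total" ]
}

# A 3-node cluster split 2/1: only the 2-node side keeps quorum.
has_majority 2 3 && echo "2 of 3: quorum"
has_majority 1 3 || echo "1 of 3: no quorum"
# A 2-node cluster split 1/1: neither side holds a majority.
has_majority 1 2 || echo "1 of 2: no majority"
```

The last case is why the majority rule alone cannot resolve a split in a cluster with an even number of nodes.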
For clusters composed of an even number of nodes, an additional object is required to arbitrate in cases where a network partition event results in two or more subclusters of equal size. This additional object 'breaks a tie' and hence is called a tiebreaker. A common example is a two-node cluster. In such a cluster, if a network partition event prevents the two nodes from communicating with each other, the tiebreaker object is consulted by both nodes. The result of that consultation determines which of the two nodes continues as the majority subset and continues to host cluster resources.
- A tiebreaker must be accessible from all nodes in the cluster.
- A tiebreaker is accessed at network partition time by all surviving nodes to determine the winning subcluster.
- If a node is able to access the tiebreaker resource, then that node is considered to be part of the winning subcluster.
- Nodes that are unable to access the tiebreaker are considered part of the losing subcluster and are shut down.
- The ideal tiebreaker is stateful, in the sense that the tiebreaker object allows only one node access to it at any one time.
- For an ideal tiebreaker, acquisition must be a fast and reliable operation.
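The rules in the list above can be sketched as the decision a single node makes in a two-node cluster. This is a simplified model, not the actual cluster logic; the function name node_keeps_quorum and its arguments are hypothetical.

```shell
# Hypothetical model of the decision a node makes in a two-node cluster.
# $1: "yes" if the peer node is reachable, anything else otherwise.
# $2: a command that tries to acquire the tiebreaker (exit 0 = acquired).
node_keeps_quorum() {
    peer_reachable=$1
    acquire_cmd=$2
    if [ "$peer_reachable" = "yes" ]; then
        return 0        # both nodes see each other: there is no tie to break
    fi
    "$acquire_cmd"      # partitioned: the tiebreaker decides
}

# The commands true and false stand in for a successful and a failed acquisition.
node_keeps_quorum no true  && echo "tiebreaker acquired: node stays up"
node_keeps_quorum no false || echo "tiebreaker not acquired: node is shut down"
```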
There are several types of tiebreakers supported with DB2 LUW. The implementation details vary, but all share the common principle that each tiebreaker object is used to break ties in cases where half of the cluster nodes are not operational.
The two most common tiebreaker types used with DB2 LUW are the disk and the network tiebreakers.
With a disk tiebreaker, a physical segment of a disk is shared between the set of machines in the cluster (we implicitly consider the two-node cluster case, as it is the most common, with no loss of generality). When a network partition event occurs, each node attempts to acquire a lock against the same region of disk. If both nodes are active, only one node is able to acquire the lock. The node that cannot acquire the lock is not in quorum. The node that acquires the lock is granted quorum and continues as the surviving subcluster.
To summarize, each node has access to a shared disk. Each node attempts to lock the disk. If the lock is denied, the node will not have quorum.
The network tiebreaker, a popular configuration option, is simple to configure and use. It's useful in cases where there are many redundant network paths between the two nodes of the cluster. The additional redundant network paths are recommended as protection against network outages. When all communication channels between the two nodes are down, an attempt is made to acquire the network tiebreaker device. As the number of independent networks between the two machines increases, so does the probability that a complete failure of communication between the nodes indicates a true node failure (and not a network partition event).
The network tiebreaker is specified by a pingable IP address, which must be pingable from each and every node in the cluster. A best practice is to use the default gateway router as the network tiebreaker device. In the case of a potential outage or cluster split, each node attempts to ping the defined IP address. If the node is able to ping the IP address, the determination is made that this node is the surviving subcluster. If a node is alive and cannot ping the IP address, then the node does not have quorum.
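The ping check can be approximated with a short shell function. This is a sketch, not the cluster's implementation: the function name and the PING_CMD override are hypothetical, and the flags assume the Linux iputils ping (-c for the packet count, -W for the reply timeout in seconds).

```shell
# Hypothetical probe: succeeds when the tiebreaker IP address answers a ping.
# PING_CMD is an illustrative override so the probe can be swapped out in tests.
tiebreaker_reachable() {
    "${PING_CMD:-ping}" -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Example use, with a placeholder gateway address:
#   tiebreaker_reachable 192.0.2.1 && echo "this node keeps quorum"
```

Nothing is recorded at the target address when the ping succeeds, so both nodes can pass this check at the same time.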
With the network tiebreaker, the key assumption is that if the first node can communicate with the default gateway and the second node can communicate with the default gateway, then the first node must be able to communicate with the second node. If this is not the case (for example, the network allows each node to ping a common gateway or device, but not each other), then you should not use a network tiebreaker.
Fundamentally, the ping of the tiebreaker IP is not stateful. A ping can succeed, but no state remains resident at the location of the IP address itself, which allows the possibility that the tiebreaker can be pinged (acquired) by more than one client (or node). To greatly reduce the probability of such a simultaneous dual acquisition, it is recommended to have many redundant network paths between nodes.
The cloud tiebreaker, introduced in DB2 LUW v10.1, is relatively new. With this type of tiebreaker, off-site cloud storage is used to retain tiebreaker state and provide many of the advantages of a disk tiebreaker (split brain avoidance guarantee). The cloud tiebreaker also provides the ease of use of the cloud and the virtualization friendliness of the network tiebreaker type.
The cloud tiebreaker is specified by a pair of access keys used to access the cloud storage. The cloud tiebreaker service must be accessible from each node in the cluster.
Note that the cloud tiebreaker type is only supported for the case of a two-node cluster.
In terms of internal implementation detail, the cloud tiebreaker storage service consists of containers and objects contained within these containers. The container namespace is shared by all the users of the storage service, so container names must be unique. After a container has been created, its name cannot be used to create another container until the container has been deleted. Containers have access control lists. The property of container uniqueness within the cloud service is what is leveraged to provide the guarantee that only one node of the two-node cluster can acquire the tiebreaker device. This guarantee prevents a split-brain situation from developing. Figure 1 shows an example.
Figure 1. Cloud tiebreaker storage service
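The uniqueness property can be imitated locally to show why only one node can win. This is an analogy, not the cloud tiebreaker's code: a temporary directory stands in for the shared container namespace, mkdir stands in for the create-container request, and the function name try_create_container is hypothetical.

```shell
# Local stand-in for the storage service: a directory plays the role of the
# shared container namespace. mkdir creates the name or fails if it is taken,
# which mirrors the "container names must be unique" property.
try_create_container() {
    namespace=$1
    container=$2
    mkdir "$namespace/$container" 2>/dev/null
}

namespace=$(mktemp -d)
try_create_container "$namespace" tiebreaker && echo "nodeha01: container created, wins the tie"
try_create_container "$namespace" tiebreaker || echo "nodeha02: name already taken, loses the tie"
rm -rf "$namespace"
```

Because creation either succeeds or fails as a single step, the two nodes cannot both conclude that they hold the tiebreaker.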
The following section discusses how to install and use the cloud tiebreaker type.
Configure basic db2haicu for HADR automation (without any tiebreaker chosen)
Let's create a two-node cluster to automate HADR failover with the DB2 integrated HA mechanism, which is sometimes called db2haicu. (To create a two-node db2haicu automated HADR cluster, consult the white paper DB2 system topology and configuration for automated multi-site HA and DR.)
The example environment in this article uses a DB2 LUW 10.5 FP3-based configuration. Otherwise, the configuration is unchanged from that in the white paper referenced above. To provide reliable cross-site failover, a third site hosting a stateful tiebreaker device was required. In the white paper, two options are presented for a stateful tiebreaker device:
- A third node added to the RSCT cluster (called the arbitrator node)
- A shared disk tiebreaker
In both cases, a third site was required to host the shared device.
Instead, here we add a third option: use of the cloud tiebreaker. The advantage of the cloud tiebreaker over the two tiebreaker types described above is that the cloud tiebreaker is stateful and does not require a third data center or third site to host the tiebreaker device.
Assume that the configuration has followed the white paper up to the completion of the section entitled "Initial configuration – common to both topologies."
The following is a view of the cluster that has been created (via the output of the db2pd -ha command issued as the instance owner):
DB2 HA Status
Instance Information:
Instance Name = db2inst1
Number Of Domains = 1
Number Of RGs for instance = 2

Domain Information:
Domain Name = ce0102
Cluster Version = 22.214.171.124
Cluster State = Online
Number of nodes = 2

Node Information:
Node Name             State
--------------------- -------------------
nodeha02              Online
nodeha01              Online

Resource Group Information:
Resource Group Name = db2_db2inst1_db2inst1_SAMPLE-rg
Resource Group LockState = Unlocked
Resource Group OpState = Online
Resource Group Nominal OpState = Online
Number of Group Resources = 1
Number of Allowed Nodes = 2
   Allowed Nodes
   -------------
   nodeha01
   nodeha02
Member Resource Information:
   Resource Name = db2_db2inst1_db2inst1_SAMPLE-rs
   Resource State = Online
   Resource Type = HADR
   HADR Primary Instance = db2inst1
   HADR Secondary Instance = db2inst1
   HADR DB Name = SAMPLE
   HADR Primary Node = nodeha01
   HADR Secondary Node = nodeha02

Resource Group Name = db2_db2inst1_nodeha02_0-rg
Resource Group LockState = Unlocked
Resource Group OpState = Online
Resource Group Nominal OpState = Online
Number of Group Resources = 1
Number of Allowed Nodes = 1
   Allowed Nodes
   -------------
   nodeha02
Member Resource Information:
   Resource Name = db2_db2inst1_nodeha02_0-rs
   Resource State = Online
   Resource Type = DB2 Member
   DB2 Member Number = 0
   Number of Allowed Nodes = 1
      Allowed Nodes
      -------------
      nodeha02

Quorum Information:
Quorum Name                          Quorum State
------------------------------------ --------------------
Fail                                 Offline
Operator                             Online
The remaining work is to define the cloud tiebreaker to the cluster.
If you have difficulty getting this state, see Related topics for helpful articles.
Ensure that Perl 5.4 or later is installed on each cluster node, and that the HMAC Perl modules are installed on each of the cluster's nodes. The installation of Perl and its modules is operating system dependent.
For Linux distributions that use the yum package manager, the following commands executed at each node install the required Perl code and modules:
sudo yum install perl
sudo yum install perl-Digest-HMAC-1.01-22.el6.noarch
For Linux distributions that use the apt-get framework for package management, the following installs the required Perl code and modules:
sudo apt-get install perl
sudo apt-get install libdigest-hmac-perl
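After installing the packages, a quick check on each node can confirm that Perl is able to load the HMAC module. This sketch assumes the module in question is Digest::HMAC, which is what the perl-Digest-HMAC package name suggests; the function name perl_module_available is hypothetical.

```shell
# Succeeds when Perl is installed and can load the named module.
perl_module_available() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Assumption: the required HMAC module is Digest::HMAC.
if perl_module_available Digest::HMAC; then
    echo "Digest::HMAC is installed"
else
    echo "Digest::HMAC is missing"
fi
```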
For other Linux distributions (SLES and variants, for example) and other supported non-Linux operating systems without the required pre-built packages, use CPAN to obtain the needed HMAC Perl module using the following command:
Create two AWS S3 accounts
Create (or obtain access to) two different cloud storage accounts. For example, you can sign up for Amazon Web Services (AWS) Simple Storage Service (S3).
Each node uses a distinct AWS account to access the (shared) cloud storage. Retrieve the two accounts' access and secret keys from the cloud storage service's website.
Place the access key information on each machine, as described next.
Placement of access keys
Each account has associated with it an access key and a secret key. The access and secret keys must be placed in files accessible to root only on each of the two machines.
The following example shows the naming format of the files. The contents of the files follow naturally and directly from the naming.
/var/ct/cfg/<node1>.access
/var/ct/cfg/<node1>.secret
/var/ct/cfg/<node2>.access
/var/ct/cfg/<node2>.secret
In this sample two-node cluster, the files are named as follows:
/var/ct/cfg/nodeha01.access
/var/ct/cfg/nodeha01.secret
/var/ct/cfg/nodeha02.access
/var/ct/cfg/nodeha02.secret
Ensure that all four files are present at each of the two nodes in the cluster and are root readable.
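A small script can confirm that the four key files are in place with restrictive permissions. This is a sketch under stated assumptions: the function name check_key_files is hypothetical, stat -c is the GNU coreutils form found on Linux, and "accessible to root only" is interpreted as file mode 400 or 600. The sketch does not check that the files are owned by root.

```shell
# Hypothetical check: for each node name given after the directory, verify
# that <node>.access and <node>.secret exist and are not group/world accessible.
check_key_files() {
    dir=$1
    shift
    rc=0
    for node in "$@"; do
        for kind in access secret; do
            f="$dir/$node.$kind"
            if [ ! -f "$f" ]; then
                echo "missing: $f"
                rc=1
                continue
            fi
            mode=$(stat -c '%a' "$f")    # GNU stat: octal permission bits
            case "$mode" in
                400|600) ;;
                *) echo "too permissive ($mode): $f"; rc=1 ;;
            esac
        done
    done
    return $rc
}

# Example use as root on each node of the sample cluster:
#   check_key_files /var/ct/cfg nodeha01 nodeha02 && echo "key files look fine"
```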
With Perl and the access keys installed on each node, you can validate the cloud tiebreaker configuration. As root, run the following command on the first node:
Any errors indicate that prerequisites are missing. Correct any errors and retry the above validation step. Do not continue until the validation proceeds without error.
After the previous command runs without error, run the same command on the other node. Errors indicate missing or invalid prerequisites or environment configuration. Do not continue until the validation proceeds without error.
Set the cluster tiebreaker to the cloud
We've validated that the cloud tiebreaker resource is correctly configured at each node of this two-node cluster. To run the following sequence of three commands, log on as root on either node of the two-node cluster.
Note: You need to issue this sequence of commands only once at either node of the two-node cluster.
Issue the following:
To create the tiebreaker resource and name the object CloudTB1, issue the following command as root:
mkrsrc IBM.TieBreaker Type=EXEC Name=CloudTB1 DeviceInfo=PATHNAME=/usr/sbin/rsct/bin/samtb_cld
Issue the following command to set the active tiebreaker for the current cluster to be the newly created tiebreaker object named CloudTB1:
chrsrc -c IBM.PeerNode OpQuorumTieBreaker=CloudTB1
After the three commands complete without error, the two-node cluster has a tiebreaker of the cloud type. Validate the result by examining the cluster quorum state with the following command, issued as the DB2 instance owner on either node:
db2pd -ha | egrep "(Quorum|Cloud)"
In response, you should see output similar to:
Quorum Information:
Quorum Name                          Quorum State
CloudTB1                             Online
This output indicates that the newly created tiebreaker is the active one in the cluster. The next section runs through some tests to validate the configuration.
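The same check can be scripted so that it returns a success or failure status. This is a minimal sketch that assumes the two-column layout of the quorum listing shown above; the function name tiebreaker_online is hypothetical.

```shell
# Reads `db2pd -ha` output on stdin; succeeds when the named quorum
# device appears with the state Online.
tiebreaker_online() {
    awk -v name="$1" '$1 == name && $2 == "Online" { found = 1 } END { exit !found }'
}

# Example use as the instance owner:
#   db2pd -ha | tiebreaker_online CloudTB1 && echo "CloudTB1 is the active tiebreaker"
```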
Example scenarios and use cases
There are numerous failure tests that you can run to validate the correct operation of the cluster. In addition to the basic test scenario presented next, Cluster Controlled HADR Configuration Setup using the IBM DB2 High Availability Instance Configuration Utility (db2haicu) (section 6 in particular) and DB2 system topology and configuration for automated multi-site HA and DR (Appendix F) have a fairly comprehensive list of tests that you can run.
Scenario: Failure of nodeha01 via a machine reboot
If nodeha01 loses communication because of a reboot, nodeha02 accesses the cloud tiebreaker and obtains quorum, as in Figure 2. At the database level, the HADR database becomes primary on nodeha02. When nodeha01 is restarted, it rejoins the two-node cluster and the HADR database starts up as a standby on nodeha01.
Figure 2. nodeha01 failure and automatic tiebreaker acquisition by nodeha02
The following is output of lssam showing the state immediately after the successful HADR failover to node nodeha02:
Failed offline IBM.ResourceGroup:db2_db2inst1_nodeha01_0-rg Control=MemberInProblemState Nominal=Online
        '- Failed offline IBM.Application:db2_db2inst1_nodeha01_0-rs Control=MemberInProblemState
                '- Failed offline IBM.Application:db2_db2inst1_nodeha01_0-rs:nodeha01 Node=Offline
Online IBM.ResourceGroup:db2_db2inst1_nodeha02_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_nodeha02_0-rs
                '- Online IBM.Application:db2_db2inst1_nodeha02_0-rs:nodeha02
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_SAMPLE-rg Request=Lock Nominal=Online
        '- Online IBM.Application:db2_db2inst1_db2inst1_SAMPLE-rs Control=SuspendedPropagated
                |- Failed offline IBM.Application:db2_db2inst1_db2inst1_SAMPLE-rs:nodeha01 Node=Offline
                '- Online IBM.Application:db2_db2inst1_db2inst1_SAMPLE-rs:nodeha02
Online IBM.Equivalency:db2_db2inst1_nodeha01_0-rg_group-equ
        '- Offline IBM.PeerNode:nodeha01:nodeha01 Node=Offline
Online IBM.Equivalency:db2_db2inst1_nodeha02_0-rg_group-equ
        '- Online IBM.PeerNode:nodeha02:nodeha02
Online IBM.Equivalency:db2_db2inst1_db2inst1_SAMPLE-rg_group-equ
        |- Offline IBM.PeerNode:nodeha01:nodeha01 Node=Offline
        '- Online IBM.PeerNode:nodeha02:nodeha02
Problem determination and analysis
The cloud tiebreaker writes entries to the configured native SYSLOG facility. Each entry is tagged with the label samtb_cld, as the sample entries later in this section show.
For example, if the SYSLOG facility is storing data to the file /var/log/messages at this machine, you can see all entries logged by the cloud tiebreaker by issuing the following command:
cat /var/log/messages | grep samtb_cld
Entries of most interest are those that indicate that quorum has been achieved. You should see messages similar to those below in the SYSLOG, specifically in cases where the cloud tiebreaker is able to acquire the quorum device:
Feb 19 15:59:03 nodeha01 samtb_cld: *****INFO: tryReserve: returning 0
Feb 19 15:59:03 nodeha01 samtb_cld: *****INFO: op=reserve rc=0 log=1
Feb 19 15:59:03 nodeha01 samtb_cld: *****INFO: Exiting samtb_cld main code returning 0
Feb 19 15:59:03 nodeha01 ConfigRM: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID: :::Template ID: 0:::Details File: :::Location: RSCT,PeerDomain.C,126.96.36.199,18346 :::CONFIGRM_HASQUORUM_ST The operational quorum state of the active peer domain has changed to HAS_QUORUM. In this state, cluster resources may be recovered and controlled as needed by management applications.
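To pick out only the successful acquisitions from a busy log, the sample entries above suggest filtering on the reserve operation with a return code of 0. This is a sketch based solely on the format of those sample lines, which may differ between releases; the function name successful_reserves is hypothetical.

```shell
# Reads syslog lines on stdin and prints those in which the cloud
# tiebreaker reported a reserve operation with return code 0.
successful_reserves() {
    grep 'samtb_cld:.*op=reserve rc=0'
}

# Example use, assuming SYSLOG writes to /var/log/messages:
#   successful_reserves < /var/log/messages
```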
In this article, you learned how to use the cloud tiebreaker type in an automated, two-node DB2 LUW HADR topology. You can use the cloud tiebreaker to automate any two-node failover for DB2 LUW 10.1 or greater. You can also use it if the nodes are collocated on the same site or are hosted on different sites.
- Search the IBM DB2 10.5 for LUW documentation for "DB2 High Availability Feature" to get details about high availability.
- "DB2 system topology and configuration for automated multi-site HA and DR" (developerWorks, 2010) discusses automation of DB2 HADR across sites.
- "Automated cluster controlled HADR configuration setup using the IBM DB2 high availability instance configuration utility" (developerWorks, 2009) provides comprehensive background.
- "Using DB2 High Availability Disaster Recovery with Tivoli Systems Automation and Reliable Scalable Cluster Technology" (developerWorks, 2010) focuses on db2haicu cluster diagnostics.
- "Licensing distributed DB2 10.5 servers in a high availability (HA) environment" (developerWorks, 2013) explains it all in plain English.
- "DB2 editions: Which distributed edition of DB2 10.5 is right for you?" (developerWorks, 2013) discusses the different editions of DB2 LUW 10.5.
- High Availability Concepts - Do I need a TieBreaker? briefly explores tiebreaking issues.
- High Availability Concepts - What is Quorum? discusses quorum issues.