Increase DB2 availability

Using the cloud as a reliable tiebreaker device

This article demonstrates how to use a cloud provider to create a reliable tie-breaking method to avoid a split-brain scenario. The procedure is for two-node clusters running IBM® DB2® for Linux®, UNIX®, and Windows® (LUW) and the integrated high availability (HA) infrastructure. See how to automate any two-node failover for DB2 LUW 10.1 or higher.


Stephen Holt (seholt@us.ibm.com), Senior IT Specialist – Data Management, IBM

Stephen Holt is an advanced technical pre-sales specialist in the New York City area. He has over 20 years of experience working with DB2, ranging from development, deployment, and performance tuning to assorted customer-facing roles. Stephen now concentrates on Information Management use of cloud infrastructure for his Wall Street customers. When not working on databases, Stephen enjoys playing soccer and spending time with his family.



Steve Raspudic, PureData Provisioning Architect

Steven Raspudic leads the provisioning team for the PureData for Transactions appliance and has played a key technical role in the delivery of numerous DB2 technologies, including the DB2 High Availability infrastructure and the DB2 High Availability and Disaster Recovery (HADR) technology. He is the holder of several patents in the areas of relational databases and database availability.



19 June 2014

Overview

In this article, learn to use a new cluster tiebreaker type that was introduced with DB2 for Linux, UNIX and Windows (LUW) Version 10.1. DB2 for LUW has provided integrated high availability and failover clustering since DB2 Version 9.5.

You can use the new cloud tiebreaker type for DB2 LUW failover automation in two-node clusters, and you can use it to automate any DB2 LUW topology that spans two clustered nodes. Examples of two-node cluster topologies include DB2 single partition, DB2 replication (HADR), DB2 partitioned (DPF), and DB2 pureScale. This article focuses on using the cloud tiebreaker to automate DB2 LUW HADR failover reliably. We also discuss recovery actions.



Basic cluster concepts

A cluster is a set of loosely coupled cooperating computer nodes. In practice, the coupling between the compute nodes is provided by standard communication channels.

All nodes in a cluster communicate with each other. An internal heartbeat mechanism continually sends messages to all nodes in the cluster to verify the active membership in the cluster. Communication failures can cause a single cluster to divide into multiple subclusters, which are partially or completely unaware of each other (that is, they are unable to communicate with each other). One of the subclusters may be designated as the primary subcluster. A condition for the formation of the primary subcluster is that it has quorum.


Quorum

Quorum implies a plurality of active nodes within the cluster that can communicate among themselves. A set of nodes that holds quorum can continue to carry out cluster operations.

Internally, the software uses two main methods to determine quorum. The first method is applicable to any cluster composed of an odd number of nodes. In such a cluster, a quorum resolution algorithm called Majority Node Set (MNS) is used. MNS ensures that when a network partition event occurs (an event that renders full communication among all cluster members no longer possible), the subcluster with the plurality remains active and any smaller subclusters are terminated. These nodes are automatically shut down by the clustering software.

For clusters composed of an even number of nodes, an additional object is required to arbitrate in cases where a network partition event results in two or more subclusters of equal size. This additional object 'breaks a tie' and hence is called a tiebreaker. A common example is a two-node cluster. In such a cluster, if a network partition event prevents the two nodes from communicating with each other, both nodes consult the tiebreaker object. The result of that consultation determines which of the two nodes continues as the majority subcluster and continues to host cluster resources.


Tiebreaker requirements

  • A tiebreaker must be accessible from all nodes in the cluster.
  • A tiebreaker is accessed at network partition time by all surviving nodes to determine the winning subcluster.
  • If a node is able to access the tiebreaker resource, then that node is considered to be part of the winning subcluster.
  • Nodes that are unable to access the tiebreaker are considered part of the losing subcluster and are shut down.
  • The ideal tiebreaker is stateful, in the sense that the tiebreaker object allows only one node access to it at any one time.
  • For an ideal tiebreaker, acquisition must be a fast and reliable operation.

Supported tiebreakers

There are several types of tiebreakers supported with DB2 LUW. The implementation details vary, but all share the common principle that each tiebreaker object is used to break ties in cases where half of the cluster nodes are not operational.

The two most common tiebreaker types used with DB2 LUW are the disk and network tiebreakers.

Disk tiebreaker

With a disk tiebreaker, a physical segment of a disk is shared between the set of machines in the cluster (we implicitly consider the two-node cluster case, as it is the most common, with no loss of generality). When a network partition event occurs, each node attempts to acquire a lock against the same region of disk. If both nodes are active, only one node is able to acquire the lock. The node that cannot acquire the lock does not have quorum; the node that acquires the lock is granted quorum and consequently continues as the surviving subcluster.

To summarize, each node has access to a shared disk. Each node attempts to lock the disk. If the lock is denied, the node will not have quorum.

Network tiebreaker

The network tiebreaker, a popular configuration option, is simple to configure and use. It's most useful when there are many redundant network paths between the two cluster nodes; the additional redundant paths guard against isolated network outages. When all communication channels between the two nodes are down, an attempt is made to acquire the network tiebreaker device. As the number of independent networks between the two machines increases, so does the probability that a complete failure of communication between the nodes indicates a true node failure (and not a network partition event).

The network tiebreaker is specified by a pingable IP address, which must be pingable from each and every node in the cluster. A best practice is to use the default gateway router as the network tiebreaker device. In the case of a potential outage or cluster split, each node attempts to ping the defined IP address. If the node is able to ping the IP address, the determination is made that this node is the surviving subcluster. If a node is alive and cannot ping the IP address, then the node does not have quorum.

With the network tiebreaker, the key assumption is that if the first node can communicate with the default gateway and the second node can communicate with the default gateway, then the first node must be able to communicate with the second node. If this is not the case (for example, the network allows each node to ping a common gateway or device, but not each other), then you should not use a network tiebreaker.

Fundamentally, the ping of the tiebreaker IP is not stateful. A ping can succeed, but no state remains resident at the location of the IP address itself, so it is possible for the tiebreaker to be pinged (acquired) by more than one client (or node). Again, to greatly reduce the probability of such a simultaneous dual acquisition, it is recommended to have many redundant network paths between the nodes.
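For reference, a network tiebreaker is defined with the same RSCT commands this article later uses for the cloud tiebreaker. The following is a minimal sketch only, run as root on one node; the gateway address is a placeholder and the exact DeviceInfo attributes can vary by RSCT/Tivoli SA MP level:

export CT_MANAGEMENT_SCOPE=2
# Create a network tiebreaker that pings the default gateway (placeholder address shown)
mkrsrc IBM.TieBreaker Type=EXEC Name=NetworkTB DeviceInfo="PATHNAME=/usr/sbin/rsct/bin/samtb_net Address=192.0.2.1 Log=1" PostReserveWaitTime=30
# Make it the active tiebreaker for the cluster
chrsrc -c IBM.PeerNode OpQuorumTieBreaker=NetworkTB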

Cloud tiebreaker

The cloud tiebreaker, introduced in DB2 LUW V10.1, is relatively new. With this type of tiebreaker, off-site cloud storage is used to retain tiebreaker state, which provides the main advantage of a disk tiebreaker (a split-brain avoidance guarantee) together with the ease of use of the cloud and the virtualization friendliness of the network tiebreaker type.

The cloud tiebreaker is specified by a pair of access keys used to access the cloud storage. The cloud tiebreaker service must be accessible from each node in the cluster.

Note that the cloud tiebreaker type is only supported for the case of a two-node cluster.

In terms of internal implementation detail, the cloud tiebreaker storage service consists of containers and of objects contained within those containers. The container namespace is shared by all users of the storage service, so container names must be unique. After a container has been created, its name cannot be used to create another container until the original container has been deleted. Containers have access control lists. This container-name uniqueness within the cloud service is what guarantees that only one node of the two-node cluster can acquire the tiebreaker device, which prevents any possibility of a split-brain situation from developing. Figure 1 shows an example.

Figure 1. Cloud tiebreaker storage service
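To illustrate the principle only (this is not the actual samtb_cld implementation, which issues signed storage REST calls directly), creating a container whose name is already taken fails, so at most one account can own a given container name at any time. A hypothetical sketch with the AWS command-line client, using a made-up bucket name:

# Succeeds only if no other account currently owns this container name (hypothetical name)
aws s3api create-bucket --bucket db2-tiebreaker-demo-container \
  && echo "reserve succeeded: this account owns the container" \
  || echo "reserve failed: the container name is already owned"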

The following section discusses how to install and use the cloud tiebreaker type.


Configure basic db2haicu for HADR automation (without any tiebreaker chosen)

Let's create a two-node cluster to automate HADR failover with the DB2 integrated HA mechanism, which is sometimes called db2haicu. (To create a two-node db2haicu automated HADR cluster, consult the white paper DB2 system topology and configuration for automated multi-site HA and DR.)

The example environment in this article uses a DB2 LUW 10.5 FP3-based configuration. Otherwise, the configuration is unchanged from that in the white paper referenced above. To provide reliable cross-site failover, a third site hosting a stateful tiebreaker device was required. In the white paper, two options are presented for a stateful tiebreaker device:

  1. A third node added to the RSCT cluster (called the arbitrator node)
  2. A shared disk tiebreaker

In both cases, a third site was required to host the shared device.

Instead, here we add a third option: use of the cloud tiebreaker. The advantage of the cloud tiebreaker over the two tiebreaker types described above is that the cloud tiebreaker is stateful and does not require a third data center or third site to host the tiebreaker device.

Assume that the configuration has followed the white paper up to the completion of the section entitled "Initial configuration – common to both topologies."

The following is a view of the cluster that has been created (via the output of the db2pd -ha command issued as the instance owner):

           DB2 HA Status 
Instance Information:
Instance Name                  = db2inst1          
Number Of Domains              = 1         
Number Of RGs for instance     = 2         

Domain Information:
Domain Name                    = ce0102                  
Cluster Version                = 3.1.2.2   
Cluster State                  = Online    
Number of nodes                = 2         

Node Information:
Node Name                     State                         
---------------------         -------------------           
nodeha02                        Online                        
nodeha01                        Online                        

Resource Group Information:
Resource Group Name            = db2_db2inst1_db2inst1_SAMPLE-rg
Resource Group LockState       = Unlocked                
Resource Group OpState         = Online                  
Resource Group Nominal OpState = Online                  
Number of Group Resources      = 1         
Number of Allowed Nodes        = 2         
   Allowed Nodes                 
   -------------                 
   nodeha01                        
   nodeha02                        
Member Resource Information:
   Resource Name                  = db2_db2inst1_db2inst1_SAMPLE-rs
   Resource State                 = Online    
   Resource Type                  = HADR      
   HADR Primary Instance          = db2inst1                      
   HADR Secondary Instance        = db2inst1                      
   HADR DB Name                   = SAMPLE                        
   HADR Primary Node              = nodeha01                        
   HADR Secondary Node            = nodeha02                        

Resource Group Name            = db2_db2inst1_nodeha02_0-rg
Resource Group LockState       = Unlocked                
Resource Group OpState         = Online                  
Resource Group Nominal OpState = Online                  
Number of Group Resources      = 1         
Number of Allowed Nodes        = 1         
   Allowed Nodes                 
   -------------                 
   nodeha02                        
Member Resource Information:
   Resource Name                  = db2_db2inst1_nodeha02_0-rs
   Resource State                 = Online    
   Resource Type                  = DB2 Member
   DB2 Member Number              = 0         
   Number of Allowed Nodes        = 1         
      Allowed Nodes                 
      -------------                 
      nodeha02                        

Quorum Information:
Quorum Name                                  Quorum State                       
------------------------------------         --------------------               
Fail                                         Offline                            
Operator                                     Online

The remaining work is to define the cloud tiebreaker to the cluster.

If you have difficulty getting this state, see Resources for helpful articles.

Install prerequisites

Ensure that Perl 5.4 or later and the required HMAC Perl module are installed on each of the cluster's nodes. The installation of Perl and its modules is operating system dependent.

For Linux distributions that use the yum package manager, the following commands executed at each node install the required Perl code and modules:

sudo yum install perl
sudo yum install perl-Digest-HMAC-1.01-22.el6.noarch

For Linux distributions that use the apt-get framework for package management, the following installs the required Perl code and modules (on Debian-based distributions, the Digest::HMAC module is packaged as libdigest-hmac-perl):

sudo apt-get install perl 
sudo apt-get install libdigest-hmac-perl

For other Linux distributions (SLES and variants, for example) and other supported non-Linux operating systems without the required pre-built packages, use CPAN to obtain the needed HMAC Perl module using the following command:

cpan Digest::HMAC_SHA1
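Regardless of how the module was installed, a quick way to confirm on each node that Digest::HMAC_SHA1 can be loaded is the following one-liner, which prints a confirmation on success and a "Can't locate" error otherwise:

perl -MDigest::HMAC_SHA1 -e 'print "Digest::HMAC_SHA1 loaded OK\n"'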

Create two AWS S3 accounts

Create (or obtain access to) two different cloud storage accounts. For example, you can sign up for the Amazon Web Services (AWS) Simple Storage Service (S3).

Each node uses a distinct AWS account to access the (shared) cloud storage. Retrieve the two accounts' access and secret keys from the cloud storage service's website.

Place the access key information on each machine, as described next.

Placement of access keys

Each account has associated with it an access key and a secret key. The access and secret keys must be placed in files accessible to root only on each of the two machines.

The following example shows the naming format of the files. The contents of the files follow naturally and directly from the naming.

/var/ct/cfg/<node1>.access
/var/ct/cfg/<node1>.secret  
/var/ct/cfg/<node2>.access 
/var/ct/cfg/<node2>.secret

In this sample two-node cluster, the files are named as follows:

/var/ct/cfg/nodeha01.access 
/var/ct/cfg/nodeha01.secret  
/var/ct/cfg/nodeha02.access 
/var/ct/cfg/nodeha02.secret

Ensure that all four files are present on each of the two nodes in the cluster and are readable by root only.
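The following is a minimal sketch of creating the files on one node, assuming each .access file holds the corresponding account's access key ID and each .secret file holds its secret access key (the key values shown are the standard AWS documentation placeholders; repeat for the nodeha02 pair and on the second node):

# As root; one key string per file, readable by root only
echo "AKIAIOSFODNN7EXAMPLE"                     > /var/ct/cfg/nodeha01.access
echo "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" > /var/ct/cfg/nodeha01.secret
chown root:root /var/ct/cfg/nodeha01.access /var/ct/cfg/nodeha01.secret
chmod 600      /var/ct/cfg/nodeha01.access /var/ct/cfg/nodeha01.secret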

Environment validation

With Perl and the access keys installed on each node, you can validate the cloud tiebreaker configuration. As root, run the following command on the first node:

/usr/sbin/rsct/bin/samtb_cld

Any errors indicate that prerequisites are missing. Correct any errors and retry the above validation step. Do not continue until the validation proceeds without error.

After the previous command runs without error, run the same command on the other node. Errors indicate missing or invalid prerequisites or environment configuration. Do not continue until the validation proceeds without error.
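If password-less root ssh is available between the nodes, you can run the check on both nodes from one place; this sketch assumes that a zero exit status from samtb_cld indicates a successful validation:

for node in nodeha01 nodeha02; do
    ssh root@$node /usr/sbin/rsct/bin/samtb_cld
    echo "$node exit status: $?"
done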

Set the cluster tiebreaker to the cloud

We've now validated that the cloud tiebreaker prerequisites are correctly configured at each node of this two-node cluster. To run the following sequence of three commands, log on as root on either node of the two-node cluster.

Note: You need to issue this sequence of commands only once at either node of the two-node cluster.

Issue the following:

export CT_MANAGEMENT_SCOPE=2

To create the tiebreaker resource and name the object CloudTB1, issue the following command as root:

mkrsrc IBM.TieBreaker Type=EXEC Name=CloudTB1 DeviceInfo=PATHNAME=/usr/sbin/rsct/bin/samtb_cld

Issue the following command to set the active tiebreaker for the current cluster to be the newly created tiebreaker object named CloudTB1:

chrsrc -c IBM.PeerNode OpQuorumTieBreaker=CloudTB1
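Optionally, you can confirm the setting at the RSCT level with lsrsrc, which is part of the same toolset as mkrsrc and chrsrc; the OpQuorumTieBreaker attribute should now report CloudTB1:

lsrsrc -c IBM.PeerNode OpQuorumTieBreaker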

After the three commands complete without error, the two-node cluster has a tiebreaker of the cloud type. Validate the result by examining the cluster quorum state with the following command, issued as the DB2 instance owner on either node:

db2pd -ha | egrep "(Quorum|Cloud)"

In response, you should see output similar to:

Quorum Information:
Quorum Name                                  Quorum State                       
CloudTB1                                     Online

This output indicates that the newly created tiebreaker is the active one in the cluster. The next section runs through some tests to validate the configuration.


Example scenarios and use cases

There are numerous failure tests that you can run to validate the correct operation of the cluster. In addition to the basic test scenario presented next, Cluster Controlled HADR Configuration Setup using the IBM DB2 High Availability Instance Configuration Utility (db2haicu) (section 6 in particular) and DB2 system topology and configuration for automated multi-site HA and DR (Appendix F) have a fairly comprehensive list of tests that you can run.

Scenario: Failure of nodeha01 via a machine reboot

If nodeha01 loses communication because of a reboot, nodeha02 accesses the cloud tiebreaker and obtains quorum, as in Figure 2. At the database level, the HADR database becomes primary on node nodeha02. When nodeha01 is restarted, it rejoins the two-node cluster and the HADR database starts up as a standby on nodeha01.

Figure 2. nodeha01 failure and automatic tiebreaker acquisition by nodeha02

The following is output of lssam showing the state immediately after the successful HADR failover to node nodeha02:

Failed offline IBM.ResourceGroup:db2_db2inst1_nodeha01_0-rg Control=MemberInProblemState Nominal=Online
        '- Failed offline IBM.Application:db2_db2inst1_nodeha01_0-rs Control=MemberInProblemState
                '- Failed offline IBM.Application:db2_db2inst1_nodeha01_0-rs:nodeha01 Node=Offline
Online IBM.ResourceGroup:db2_db2inst1_nodeha02_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_nodeha02_0-rs
                '- Online IBM.Application:db2_db2inst1_nodeha02_0-rs:nodeha02
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_SAMPLE-rg Request=Lock Nominal=Online
        '- Online IBM.Application:db2_db2inst1_db2inst1_SAMPLE-rs Control=SuspendedPropagated
                |- Failed offline IBM.Application:db2_db2inst1_db2inst1_SAMPLE-rs:nodeha01 Node=Offline
                '- Online IBM.Application:db2_db2inst1_db2inst1_SAMPLE-rs:nodeha02
Online IBM.Equivalency:db2_db2inst1_nodeha01_0-rg_group-equ
        '- Offline IBM.PeerNode:nodeha01:nodeha01 Node=Offline
Online IBM.Equivalency:db2_db2inst1_nodeha02_0-rg_group-equ
        '- Online IBM.PeerNode:nodeha02:nodeha02
Online IBM.Equivalency:db2_db2inst1_db2inst1_SAMPLE-rg_group-equ
        |- Offline IBM.PeerNode:nodeha01:nodeha01 Node=Offline
        '- Online IBM.PeerNode:nodeha02:nodeha02
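To confirm the takeover at the database level, check the HADR role on nodeha02 as the instance owner; in DB2 10.1 and later the HADR_ROLE field should report PRIMARY (the exact field layout can vary by fix pack):

db2pd -db SAMPLE -hadr | grep -i role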

Problem determination and analysis

The cloud tiebreaker writes entries to the configured native SYSLOG facility. The entries are prefixed with the following label:

samtb_cld

For example, if the SYSLOG facility is storing data to the file /var/log/messages on this machine, you can see all entries logged by the cloud tiebreaker by issuing the following command:

cat /var/log/messages  | grep samtb_cld

Entries of most interest are those that indicate that quorum has been achieved. You should see messages similar to those below in the SYSLOG, specifically in cases where the cloud tiebreaker is able to acquire the quorum device:


Feb 19 15:59:03 nodeha01 samtb_cld[7203]: *****INFO: tryReserve: returning 0
Feb 19 15:59:03 nodeha01 samtb_cld[7203]: *****INFO: op=reserve rc=0 log=1
Feb 19 15:59:03 nodeha01 samtb_cld[7203]: *****INFO: Exiting samtb_cld main code returning 0
Feb 19 15:59:03 nodeha01 ConfigRM[5642]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID: :::Template ID: 0:::Details File: :::Location: RSCT,PeerDomain.C,1.99.22.61,18346 :::CONFIGRM_HASQUORUM_ST The operational quorum state of the active peer domain has changed to HAS_QUORUM. In this state, cluster resources may be recovered and controlled as needed by management applications.

Conclusion

In this article, you learned how to use the cloud tiebreaker type in an automated, two-node DB2 LUW HADR topology. You can use the cloud tiebreaker to automate any two-node failover for DB2 LUW 10.1 or greater, whether the nodes are collocated at the same site or hosted at different sites.

Resources

Learn

Get products and technologies

  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, or use a product in a cloud environment.

Discuss

  • Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.
