Implementing high availability with DB2 9.5
Choose the right solution to protect your data
High availability is a key requirement for critical database applications. IBM DB2 9.5 provides many features to meet this requirement. If you are new to DB2 on distributed platforms, or even if you have been using it for a while, you may find the array of features that deal with availability confusing. When do you use which feature, and what can you hope to accomplish when you do?
The purpose of this article is to summarize these features and guide you in understanding how DB2 technologies can be used to build highly available database systems. In addition, discover the costs and advantages of each solution.
Before we begin, let's define what we mean by the term high availability (HA). HA is the requirement that the data be available to dependent applications whenever required. The goal is to eliminate or minimize downtime. Related to HA is Disaster Recovery (DR). DR differs from HA in that it focuses on protecting data against loss caused by a catastrophic failure. This article focuses strictly on HA.
Terminology and client/server database architecture
We'll start out by discussing a few terms and concepts that are important to understand when discussing high availability.
There are three software pieces of a database solution:
- The user application
- The client software
- The database engine
In addition to software, there are other resources that must be working properly to have a working solution:
- Server hardware
- Network connectivity
- Disk storage
- Operating System
You should take all these aspects into consideration when designing a HA solution. Simply making the database engine highly available does not necessarily create a HA solution. HA solution design is not purely a technical exercise; factors such as a solution's cost, skill requirements, and administration demands must also be weighed.
Database applications are client/server based. The application depends on the integrity of the database software to produce uniform results. While this may seem fairly obvious, it is important to understand how this is accomplished when choosing and designing your solution.
SQL transactions fall into two basic types: read and write. Read transactions are select statements that require no insert, update, or delete activity; write transactions, on the other hand, have at least one database change to make. Read transactions require a consistent view of the data -- meaning that if two read transactions are submitted at the same time and select the same data ranges, they should produce identical result sets. Write transactions require that committed database changes be persisted even in the event of a failure.

Business requirements affect which HA solutions are available or make the most sense. In general, HA solution design is driven by two factors: uptime requirements and transactional consistency. If the business requires greater availability but read consistency is not at a premium, then an asynchronous option may be the more cost-effective approach. If transactional consistency is a key requirement, then a synchronous option is required. If both transactional consistency and availability are required, the choices narrow further.
There are two types of high availability you may choose from: continuous availability and failover availability.
Continuous availability demands that the database engine be available for processing with no perceivable downtime. The downtime in these scenarios is measured as non-existent, fractions of a second, or, at the high end, a few seconds. This type of availability has traditionally only been implemented in the most critical of applications. To achieve this goal, a high degree of redundancy is required.
There are three basic designs for these types of systems -- programmatic, shared, and multiple copy.
The programmatic design is essentially a user programming solution implemented not at the database level but at the user application level. Essentially, the application implements SQL transactions on multiple systems. The failure of a single system will cause no interruption of transaction processing on its partners. Application logic allows the application to continue performing work even in the event of a failure of one of the database systems.
This type of system has maximum flexibility but also requires a great deal of work to implement well. DB2 support for distributed units of work and transaction monitors help when coding these types of environments, but much of the work depends on your application development team. All applications have to be coded to support your design. "Out of the box" applications will not understand your environment. Some other issues you need to address are:
- Keeping the database systems synchronized
- Use of non-deterministic functions
- Use of non-atomic stored procedures
Non-deterministic functions are functions that may produce different output depending on where they are run. For example, the DB2 special register CURRENT TIMESTAMP produces different output on different systems, because the system clocks are not synchronized and the statement cannot arrive at all systems at exactly the same moment.
Non-atomic stored procedures are essentially stored procedures that are not committed or rolled-back as a whole. The transactions inside the stored procedure may produce different results depending on the conditions when they are run.
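To make the non-deterministic function problem concrete, here is a sketch using the DB2 command line processor. The database names (SAMPLE1, SAMPLE2) and the audit_log table are hypothetical stand-ins for two independently maintained copies:

```shell
# Hypothetical two-copy setup: the same statement run against each copy.
# Because the system clocks differ and the statements arrive at different
# moments, the two "identical" transactions store different values.
db2 connect to SAMPLE1
db2 "INSERT INTO audit_log (event, logged_at) VALUES ('login', CURRENT TIMESTAMP)"
db2 connect reset

db2 connect to SAMPLE2
db2 "INSERT INTO audit_log (event, logged_at) VALUES ('login', CURRENT TIMESTAMP)"
db2 connect reset
# The two rows now disagree, and the copies have silently diverged.
```

A programmatic design must either avoid such functions or evaluate them once at the application tier and pass the result as a literal to every copy.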
Table 1. Programmatic availability
|Advantages|Disadvantages|
|---|---|
|Flexibility|Application development time and cost|
||Lack of support for packaged applications|
The shared design combines redundant systems with shared critical resources. Two or more systems actively perform work; if one of them fails, the others continue working. The members are not fully independent but must share mechanisms or data in order to perform their work.
The most common shared resource is the physical data storage. Figure 1 demonstrates how the data is stored on a central storage system and accessed by three members.
Figure 1. Shared availability
Shared storage is not the only resource that must be shared for this architecture to work properly. For each system to perform work, the members must share some type of locking and clock mechanisms. The locking mechanism allows each system to work on the shared data without interfering with the work being performed by another member. For example, if one member is updating a row, no other member should read that row at any isolation level other than uncommitted read. To prevent this, the members must share the locking mechanics. To ensure a consistent view of time, a shared clock mechanism must be implemented as well.
The nature of this architecture is highly complex and requires a high degree of integration of hardware and software. Among the issues that must be addressed are locking mechanisms, system timer integration, data buffer sharing (or non-sharing), and logging mechanisms. How these mechanisms are implemented has a huge impact on performance and availability. If members do not share data buffers, then each member may end up buffering the same data. If ownership of data is split between members, buffering improves, but transaction routing becomes an issue: an update received at a member that does not own that portion of the data must be routed to another member. Routing of this type has a performance impact, and you may be forced to route transactions to the correct member with application code. If a member fails, availability depends on what the other members must do in order to continue working. Ideally, nothing would need to be done, and work would continue on the remaining members with no impact. However, some shared designs require all transactions to be suspended or halted while caching and locking mechanisms are remastered, which creates downtime until those tasks complete.
DB2 for z/OS in a data sharing environment demonstrates the ideal goal for a shared architecture. The foundation for this environment is the Parallel Sysplex, which provides sysplex timers and coupling facilities. The coupling facilities are highly integrated components that allow for the creation of shared resources such as locking facilities and buffer pools. In addition, the communication technology between members is highly efficient. A failure of a member does not affect the transaction workload of other members. Beyond greater availability, you also gain greater capacity and workload balancing, and your applications do not need to be changed to use this solution effectively.
DB2 on non-z/OS platforms does not support this type of architecture.
Table 2. Shared availability
|Advantages|Disadvantages|
|---|---|
|Very high degree of availability|High cost|
|Increased scalability|Specialized hardware required|
|No need to change applications|Specialized skill needed to implement|
For more information on shared availability, see Related topics.
A multiple copy design utilizes multiple copies of a database where the workload is distributed across the copies. In the event of the failure of a copy the workload continues to run on the other database copies. The largest issue that must be addressed when using multiple copies of a database is how to keep them synchronized and how to create consistent query results across the copies. Some type of replication solution is the simplest method to create and synchronize these copies. However, due to the inherent asynchronous nature of traditional replication solutions, I have chosen to describe this scenario in the failover section.
Gridscale from xkoto provides another multiple database copy solution with sophisticated synchronization processes as well as load balancing. Figure 2 shows the basic architecture of a Gridscale solution.
Gridscale acts as a transactional tier in a database solution. It sends transactional work to any of the database copies based upon availability, system utilization, or synchronization state. If a copy becomes unavailable, its workload is simply shifted to the other copies according to their system load.
Write transactions (inserts, updates, and deletes) are sent to all the copies simultaneously. Gridscale tracks which copies are synchronized as they respond to write transactions; as each copy responds, it becomes eligible for further read transactions (selects). This mechanism ensures that transactional work always produces consistent results.
Gridscale systems can be clustered to prevent Gridscale itself from becoming a point of failure. Gridscale implements its own caching and transactional logging, which are used under several conditions. If a read transaction is directed at a copy that fails or does not return in a reasonable amount of time, the transaction can be routed to another copy; this process is entirely transparent to the application. If a copy goes down due to a failure or for maintenance, it can be synchronized from the Gridscale transactional log and, once that process completes, start participating in the distributed workload again. Adding copies for greater scalability is as simple as restoring a backup from one of the copies and allowing Gridscale to synchronize the new system with the others.
Implementation and administration of a DB2 and Gridscale solution is very simple. The Gridscale administration tool can be used to create and synchronize copies. Through the Gridscale tools, you can evaluate the system workload as a whole or at each individual copy. You can add new copies as well as shift work away from and release other copies. This flexibility allows you to perform rolling system maintenance. There is no requirement that all copies run at the same fix or release level. You can take copies offline, perform any software or hardware maintenance required, and then allow Gridscale to synchronize the system and make it available.
Gridscale also solves many of the other issues that occur with distributed copies. For example, where possible, non-deterministic functions are implemented at the Gridscale server. Under most scenarios, Gridscale requires no changes to client applications. At the client side, the DB2 driver is replaced by a Gridscale driver with additional Gridscale-specific functionality. Gridscale drivers exist for Java, DB2 CLI, and .NET applications. The application has no knowledge of Gridscale or the transactional tier.
The overall effect of the solution is no downtime due to the loss of one or more systems, greater scalability, and lower cost. Lower costs are driven by the ability to use commodity hardware rather than larger, more powerful machines or more costly storage systems.
Figure 2. Multiple copy availability
Table 3. Multiple copy availability
|Advantages|Disadvantages|
|---|---|
|Very high degree of availability|Additional software to implement|
|Increased scalability|Additional hardware administration|
|No need to change applications|Moderate cost increase|
|Easy implementation and administration||
For more information about multiple copy availability, see Related topics.
Failover availability is different from continuous availability because, for some period of time, however small, the database engine is not available for transaction processing. The essential elements for this type of solution are:
- Primary and secondary systems
- Failure detection
- Data source movement
Two systems have copies of the database data or access to a single copy, and when a failure is detected, a failover takes place. In the failover process, the data source is moved from the primary to the secondary system.
There are two types of failover availability: synchronous and asynchronous. Synchronous availability guarantees that the data sources on primary and secondary systems are identical, and complete data continuity is maintained after a failover. Asynchronous availability does not guarantee that primary and secondary system databases are completely synchronized. The method of moving database changes from the primary to the secondary system varies, but the process produces a window of time during which data has not migrated from one system to the other. The amount of data may be very small and the window very short, but it must be taken into consideration when you're deciding on a solution.
Let's take a look at the options that will give you either synchronous or asynchronous failover availability.
Synchronous failover availability comes in two types: specialized HA software and synchronous replication.
Specialized HA software
The synchronous method involves tight integration of the database software with specialized HA software to produce a HA cluster. HA software support varies by operating system platform. The commonly available HA solutions are:
- Tivoli® System Automation (TSA) - AIX and Linux
- Veritas Cluster Server - AIX, Linux, and Windows®
- High Availability Cluster Multiprocessing (HACMP) - AIX
- Microsoft Cluster Server (MSCS) - Windows
- Sun Cluster - Solaris
- SteelEye LifeKeeper - Linux and Windows
These are the most common options for the platforms I listed. Other HA software solutions are available as well and may be used.
All these solutions essentially work the same way. If there is a failure, the database server can move from one machine to a backup system. To accomplish this task, the HA software moves all the necessary resources to the secondary system. These resources include the disk resources of the physical database, the network resources, and the database server resources.
In the HA cluster solution, a single copy of the physical database is stored on a shared storage system. In the DB2 environment, only one system can "own" the storage array at a time. When a failure is detected, ownership of the storage is moved from the primary system to the secondary system. The network resources are moved as well. Finally, the database server resources are started on the secondary system and the database is made available.
The detection of a failure is performed by a "heartbeat" connection between the servers. This "heartbeat" is a function of the HA software and is aware of both hardware and software failures.
Since there is only a single copy of the database, it is always in sync. The time for the failover and restart of the database engine depends on several factors:
- The time needed to detect a failure
- The length of time necessary to move database resource dependencies (storage array, networking resources, etc.)
- The time required for the DB2 engine to perform crash recovery
DB2 always performs crash recovery when the database is not shut down properly. Crash recovery is the processing of the log files, making sure all committed transactions are written to disk and uncommitted transactions are rolled back. The time required to perform this operation depends upon the amount of "open" work in the database logs at the point of failure. The entire failover could take just a few seconds or longer if a large workload needs to be processed from the log files.
One advantage of this type of availability solution is that no changes need to be made to the application or to the client configuration directories. The HA software provides a virtual IP address resource for database connections. The IP address fails over when a failure is detected, so the application can use the same connect statement it used before. When a failover takes place, all applications are disconnected, and the client returns a communication error to the application. Once the database server is running on the secondary system, the application can simply reissue the connect statement and continue to process work as before. The complementary technology of automatic client reroute is discussed in a later section.
The secondary system does not have to remain idle while waiting for a failover. The systems can also be configured in a mutual takeover configuration where both servers are actively hosting different databases. Each machine is prepared to take over the workload of its partner in the event of a failure. Figure 3 is an example of the HA Software solution.
Figure 3. HA software availability
Table 3. HA software availability
|Advantages|Disadvantages|
|---|---|
|Database is always in sync|Extra software needed to create and configure the solution|
|No changes to application or client needed|Extra skills needed to set up and manage the HA software|
|No user interaction needed to detect and initiate failover|Data is not duplicated, providing less redundancy|
|No performance degradation due to HA solution design|External storage required that must meet some HA standards|
||Distance limitations due to storage requirements|
Synchronous replication is available in two implementation methods: disk storage replication and DB2 High Availability Disaster Recovery (HADR).
Disk storage replication
Disk storage replication utilizes special storage hardware and/or software to perform the disk writes of a primary DB2 server synchronously at a secondary location. In the event of a failure at the primary site, database resources are started at the second location. The storage hardware/software must perform synchronous disk block writes in both locations, or the standby location may not be in sync with the primary location. An example of this type of design utilizing IBM Peer to Peer Remote Copy (PPRC) is illustrated in Figure 4. Since all data storage writes must be performed synchronously at both locations, the solution design must account for the I/O bandwidth needed between the primary and secondary sites.
Figure 4. Disk storage replication
Table 4. Disk storage replication
|Advantages|Disadvantages|
|---|---|
|Database is always in sync|External storage required that must meet solution standards|
|No changes to application or client needed|Distance limitations due to storage replication requirements|
||Standby system is not available for database workload|
||All write activity must be played back, in the correct order, at the secondary site|
DB2 HADR and the DB2 HA feature
DB2 High Availability Disaster Recovery (HADR) is a high performance database replication system based on DB2 logging mechanics. A DB2 HADR scenario consists of two DB2 systems, a primary and a standby. The primary system performs all transactional work and replicates any database changes to the standby. In the event of a failure, the standby's role can be switched to primary with a single command.
HADR provides synchronous replication of all logged DB2 operations. The basic requirement for two DB2 systems to participate in an HADR pair is TCP/IP connectivity between them. Setup and administration of HADR are very simple, and DB2 provides a wizard to help with these tasks.
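For readers who prefer the command line over the wizard, a minimal HADR setup sketch looks like the following. The database name, host names, ports, and instance name are illustrative placeholders, and the standby is assumed to have been initialized from a backup of the primary:

```shell
# On the primary (repeat with mirrored values on the standby):
db2 update db cfg for SAMPLE using \
    HADR_LOCAL_HOST  hostA \
    HADR_LOCAL_SVC   55001 \
    HADR_REMOTE_HOST hostB \
    HADR_REMOTE_SVC  55002 \
    HADR_REMOTE_INST db2inst1 \
    HADR_SYNCMODE    NEARSYNC

# Start the standby first, then the primary:
db2 start hadr on db SAMPLE as standby   # on the standby system
db2 start hadr on db SAMPLE as primary   # on the primary system
```

NEARSYNC shown here is the default synchronization mode described in the next paragraph; SYNC and ASYNC are the other available modes.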
The basic method by which HADR replicates data is to send DB2 log writes simultaneously to the local disk and to the standby system. Once the standby system receives the log write, it sends an acknowledgement to the primary. The primary considers the transaction committed once the local write I/O completes and the acknowledgement of receipt arrives from the standby. The client or application has no knowledge of this operation. The data that must be sent between the systems is limited to logged operations; the actual changes to data storage pages are performed independently on both systems, greatly reducing the bandwidth required.
The overhead of running HADR is very small. The default replication method (others are available) only requires that a log write be received in memory at the standby before the acknowledgement is returned to the primary, and the local write I/O often takes at least as long as the transfer of the data over TCP/IP to the standby. Only data changes are replicated, so read transactions are unaffected by the implementation of HADR. In these circumstances, there is little to no impact from running HADR. Where you perform a large number of logged transactions, the impact of HADR is determined by the bandwidth and latency of the network between the systems. If the network bandwidth exceeds your expected logging activity, you should not see a significant impact on performance.
If connectivity between the pair is lost, you can set a timeout period after which HADR turns off. Once HADR is off, the primary continues to process transactions normally. Once connectivity is restored, you can turn HADR back on; the standby then communicates with the primary and performs "catch-up" operations until it returns to a synchronized state. Figure 5 illustrates the HADR processes synchronizing a primary and standby system.
Figure 5. HADR process
The failover operation for HADR can be automated using HA software. DB2 9.5 for AIX and Linux includes the DB2 High Availability (HA) feature.
The DB2 HA feature is a highly integrated implementation of TSA and the DB2 cluster manager application programming interface (API). When you install DB2 on AIX or Linux, you have the option to install this feature, which automatically installs TSA. The configuration of TSA and DB2 is implemented via the DB2 High Availability Instance Configuration Utility (db2haicu), which greatly simplifies the configuration of a HA cluster. Another very important benefit of the HA feature is that it implements the cluster manager API, which makes DB2 cluster aware: changes made to the DB2 system that affect the HA cluster are handled automatically. One example is stopping the DB2 engine. DB2 makes the HA cluster aware that the engine is being stopped by an administrator, so the cluster takes no action to restart or transfer resources in response to the stop event.
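As a minimal sketch (the XML file name below is hypothetical), db2haicu can be run interactively or driven by a prepared input file describing the cluster domain:

```shell
# Run as the instance owner, on the standby first and then the primary.
db2haicu                    # interactive mode: prompts for cluster details
db2haicu -f ha_config.xml   # or supply a prepared XML input file
```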
DB2 HADR also increases availability by allowing you to perform rolling upgrades. The roles of primary and standby can be switched at will as long as the pair is in a synchronized state. When you want to perform a system maintenance task, upgrade, or DB2 fix installation, you can turn HADR off. The primary continues to perform as normal while maintenance is performed on the standby. Once the maintenance task is complete, HADR can be turned back on; the systems automatically resynchronize, and the roles can then be switched so that maintenance can be performed on the original primary.
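The role switch itself is a single command issued on the standby; SAMPLE is an illustrative database name:

```shell
# Graceful role switch while the pair is synchronized (rolling upgrade):
db2 takeover hadr on db SAMPLE

# After a real failure, when the primary is unreachable, the takeover
# can be forced -- at the risk of losing unreplicated transactions:
db2 takeover hadr on db SAMPLE by force
```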
Table 5. HADR
|Advantages|Disadvantages|
|---|---|
|Database is always synchronized|Additional server and storage requirements|
|No changes to application or client needed|Standby system is not currently available for database workload|
|Easy installation and maintenance|Non-logged operations not replicated|
|Failover automation available "out of the box" for AIX and Linux||
|Very low impact on performance||
|Ability to perform rolling upgrades||
DB2 replication comes in two styles: traditional and queue-based replication. The two options are similar in nature, but the replication methods used to produce each solution differ and each has unique strengths.
Replication is an asynchronous process and cannot guarantee that all data is consistent between copies during a failover. Replication is highly configurable but is based on replicating changes to database tables from one location to another. Its configurable nature allows for many options, including selecting only a subset of tables and/or data ranges for replication, transforming data during replication, and having multiple replication destinations. Distance is not an issue as long as there is enough network bandwidth to meet the customer's needs.
Replication also allows the systems to be hosted on differing OS platforms or even differing database management systems. The source and targets of replication are "live," so work can be performed simultaneously on each system. For example, one system could process transactions while a secondary system creates reports or runs backups. Replication is limited to user-defined tables; changes to the system catalogs cannot be replicated. For example, changes to table permissions have to be performed on all systems, because replication is unable to replicate this change.
DB2 traditional replication, also known as SQL replication, is an integrated DB2 feature. It consists of two pieces: capture and apply. The replication administrator designates tables as replication sources available to be replicated, and then creates replication subscriptions on a target database, the secondary system, using the replication sources from the previous step as its source. A capture process monitors the transaction logs for all changes to the replication source tables, placing any changes made to these tables into staging tables. An apply program reads the staging tables and moves the changes to the subscription target on a timed interval.
There is some overhead associated with running traditional replication. The amount of extra work depends upon the amount of insert, update, and delete activity on the source tables. No extra locking is required on the base tables as replication only analyzes the log files and not the base tables. But the population of the staging tables (change tables) and the logging of these transactions require database resources. Figure 6 represents the basics of a traditional replication scenario.
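Once the sources and subscriptions have been defined, the capture and apply pieces run as separate programs. The sketch below is hedged: the database names and Apply qualifier are illustrative, and exact invocation options vary by replication version:

```shell
# Start the Capture program against the source database; it reads the
# transaction logs and populates the staging tables.
asncap CAPTURE_SERVER=SRCDB CAPTURE_SCHEMA=ASN

# Start the Apply program; it reads the staging tables and applies
# changes to the subscription targets on a timed interval.
asnapply CONTROL_SERVER=SRCDB APPLY_QUAL=MYQUAL
```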
Figure 6. Traditional replication
DB2 queue replication is similar to traditional replication; however, there are no staging tables. Instead, changes are placed directly in message queues. The queues are provided by WebSphere® Replication Server and can be created on a separate server or on the primary or secondary database server. Queues provide a high speed, assured delivery mechanism to targets. DB2 queue replication avoids the staging-table overhead of traditional replication, provides greater throughput, and adds additional redundancy due to the implementation of queues.
Figure 7. Queue replication
Table 6. Replication
|Advantages|Disadvantages|
|---|---|
|Extremely flexible|Inherently asynchronous|
|Multiple active copies|Additional performance overhead (traditional replication)|
|No distance limitation|Additional setup and management|
||Not all database changes can be replicated|
Automatic client reroute
When a failure occurs at the database server, client connections are terminated. Automatic client reroute (ACR) allows DB2 clients to automatically reconnect to the original or an alternate server, which minimizes client downtime. The alternate server information for ACR is a parameter designated at the database server; it is cached at the client after the first connection to the database server. When a communication error is detected, ACR alternates connection attempts between the original and alternate servers. ACR does not replay in-flight transactions, so once the client connection is re-established, any uncommitted work has to be performed again. ACR returns a warning to the application after reconnecting so that the application can take other actions if needed. Figure 8 shows ACR working with a number of clients, including application servers.
Figure 8. ACR
Some products, such as WebSphere middleware, are ACR-aware. This further reduces complexity and increases application availability. For more information on ACR, please see the Related topics section.
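Configuring ACR is a single command at the server; the database name, host name, and port below are illustrative:

```shell
# On the primary server: designate the standby as the alternate server.
# Clients cache this value after their first successful connection.
db2 update alternate server for database SAMPLE using hostname hostB port 50000
```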
There are many options for creating well performing and highly available DB2 solutions. The material covered in this article gives you guidance to choose the best solution for your environment.
- Read the DB2 for z/OS: Data Sharing in a Nutshell Redbook to get more information on shared availability systems.
- Visit the xkoto page to learn more about multiple copy availability and data virtualization.
- Read the Building a high availability database environment using WebSphere middleware article to learn how to use DB2 HA disaster recovery with WebSphere Application Server.
- Visit the IBM DB2 9.5 Information Center to learn more about the automatic client reroute description and setup.
- Build your next development project with IBM trial software, available for download directly from developerWorks.