What is the typical scenario for your connected client applications when an IBM® DB2® Universal Database for Linux®, UNIX®, and Windows® (DB2) database crashes? Prior to Version 8.2, the DB2 database server would typically return a "SQL30081N A Communication error has been detected" error to each client application (including local ones) that is connected to the server. This typically terminates the application's logic routine, and ultimately leads to an application error being displayed to the end user.
Such an outage causes a ripple effect throughout the infrastructure. End users (who could potentially be your customers) can't do their work or complete their transactions, which results in dissatisfaction and potential consumer vulnerability issues. From a database administrator's perspective, any error that surfaces to the end user will likely violate some service level agreement (SLA) that the Information Technology (IT) department has contractually signed with a line of business. In fact, many DBAs remark that it's the surfacing of the error that they really care about. They've spent countless hours ensuring that they have the correct availability plan in place so that if a failure occurs, a standby server will be able to service the workload in seconds. If they could hide the surfacing of this communication error, end users would typically only experience a minor delay and would be "none the wiser." Perhaps it has its roots in the old song "What you don't know won't hurt you" composed by William Cary Duncan. That's likely true in this example. If you've got some sort of failover plan like High Availability Disaster Recovery (HADR) set up such that in the event of a server failure you're back in action within seconds, do users care?
While you could handle this error at the application logic layer, before the Automatic Client Reroute facility, such solutions involved a high cost of ownership and were more prone to errors. For example, you could maintain a list of backup servers and build retry logic into the application. The question in this scenario becomes: How do you change backup servers? If you're distributing database connections across a network to database hubs (for example, geographic routing), what servers do you code into the application to connect to in the event of a failure? What happens if you want to change the standby server address? In this approach, there are literally dozens of application-dependent considerations, which generally make it a non-usable option. For these reasons and more, DBAs wanted a better way -- and DB2 V8.2 delivers with the Automatic Client Reroute facility.
This article discusses the details of the new Automatic Client Reroute facility, how it can help IT adhere to stringent SLAs, and, more importantly, how you can use it to address end-user and consumer satisfaction during an outage.
Automatic Client Reroute allows client applications and their respective connections to be transparently transferred to an alternate database if the connection to the primary database is lost. When this facility is enabled, applications will no longer receive or be forced to expose the SQL30081N error.
For example, if a primary HADR-enabled database crashes without Automatic Client Reroute, a DBA would have to manually (or have the application) re-establish all broken connections, so that they will be connected to the new primary database (the old standby). With Automatic Client Reroute, you can avoid this step and provide more transparency to your applications by having this connection change handled by DB2.
Quite simply, the Automatic Client Reroute facility translates into higher user satisfaction rates, lower complexity, and lower administration, and what DBA doesn't want all of those?
Who can use Automatic Client Reroute?
You can use Automatic Client Reroute for connection error handling as long as the DB2 client and server are at the V8.2 level or later. The client, primary server, and standby server don't have to be at the same post-V8.2 version, so once you've addressed this requirement, the two DB2 servers' maintenance levels do not have to match. This provides IT shops with a lot of granularity once they've moved to the V8.2 release (which I strongly recommend by the way -- this point release is loaded with some serious features, especially around the autonomic management area and performance).
The Automatic Client Reroute facility can be used in almost any high-availability configuration supported by DB2 (not just HADR, which is the context in which it will most often be discussed), including:
- A partitioned or non-partitioned database cluster
- Data Propagator® (DPropR)-style replication (SQL replication)
- A Q-replication environment
- High-availability clustering software, such as IBM High Availability Cluster Multi-Processing (HACMP™), Microsoft® Cluster Service (MSCS), Tivoli® System Automation (TSA), and so on
- An HADR environment. Automatic Client Reroute, in conjunction with HADR, gives users the ability to continue working without a noticeable interruption. These two components of the DB2 V8.2 release are likely the most talked about when it comes to high availability.
- A DB2 Connect environment. For example, if one of the DB2 Connect servers goes down, Automatic Client Reroute can transfer client applications that were routing through that server for DB2 to another DB2 Connect server.
- A DB2 for z/OS SYSPLEX environment. For example, if you're running a DB2 Connect server that leverages all the advantages of a SYSPLEX environment for data sharing groups with failover support, your shop can benefit from this feature in addition to simply getting a failover to an alternate DB2 Connect server.
- You!
How Automatic Client Reroute works
When your environment is enabled for Automatic Client Reroute, the DB2 client will automatically retry the original server location, and then an alternate server location that you specify on the DB2 server itself. The coded-in retry of the primary server (which presumably has experienced an outage) is to ensure that indeed there was an outage and not some sort of momentary event which makes the primary server appear to be down. For example, irregular network latency could have triggered the identification of a failed server.
When the Automatic Client Reroute facility is configured, the built-in retry logic will alternate between the original primary server and the standby server (often referred to as the alternate server in an Automatic Client Reroute environment) location for 10 minutes, or until a database connection is re-established.
Specifically, the retry logic built into this facility in DB2 will:
- Try to re-establish a connection to the original primary server to ensure there is no "accidental" failure.
- Alternate connection attempts between both the (now down) primary server and the standby server every 2 seconds for 30-60 seconds.
- Alternate connection attempts between both the primary server and the standby server every 5 seconds for 1-2 minutes.
- Alternate connection attempts between both the primary server and the standby server every 10 seconds for 2-5 minutes.
- Alternate connection attempts between both the primary server and the standby server every 30 seconds for 5-10 minutes.
- Make the connection during one of the preceding stages, or return the SQL30081N error code to the client application.
The following figure summarizes this procedure:
Figure 1. The retry logic built into the Automatic Client Reroute facility
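The staged retry sequence listed above can be sketched as a small schedule generator. This is only an illustration of the described behavior, not DB2's actual implementation. It assumes the stage durations are elapsed-time windows (up to 60 seconds, 1-2 minutes, 2-5 minutes, 5-10 minutes) and that each tick targets one server, starting with the alternate. The names `SCHEDULE` and `retry_plan` are hypothetical.

```python
from typing import Iterator, List, Tuple

# Assumed reading of the staged schedule: (window_end_seconds, interval_seconds).
SCHEDULE: List[Tuple[int, int]] = [(60, 2), (120, 5), (300, 10), (600, 30)]


def retry_plan(primary: str, alternate: str) -> Iterator[Tuple[int, str]]:
    """Yield (elapsed_seconds, target_server) for each planned attempt."""
    # First, retry the original server to rule out a momentary glitch.
    yield 0, primary
    elapsed = 0
    use_alternate = True
    for window_end, interval in SCHEDULE:
        # Keep ticking at this interval until the window is used up.
        while elapsed + interval <= window_end:
            elapsed += interval
            yield elapsed, alternate if use_alternate else primary
            use_alternate = not use_alternate  # ping-pong between servers


if __name__ == "__main__":
    for when, target in list(retry_plan("primary", "standby"))[:5]:
        print(f"t={when:>3}s  try {target}")
```

A real client would stop at the first successful connection. If the plan runs out after 10 minutes, the original SQL30081N error is what the application sees.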
If the connection retry sequence is successful in re-establishing a connection, the SQLCODE -30108 is returned to the application to indicate that a database connection has been re-established after the confirmed communication failure. The host name (or IP address) and service name (or port number) are also returned -- depending on what information you used to catalog the database connection entries. In the end, the DB2 client will only return the error from the original communication failure if ACR fails to recover from it.
Setting up Automatic Client Reroute
The main goal of Automatic Client Reroute is to enable DB2 applications to recover from a loss of communications so that the application can continue its work with minimal interruption -- and doing all of this with simplicity in mind.
As the name implies, rerouting is central to this feature. However, rerouting the application to a live database server is only possible if there is an alternate location of which the client connection is aware.
Setting an alternate database server location on the primary server enables ACR to reroute to the standby system in the event of a failure. When the connection is re-established, the application receives an error message about the transaction failure, but the application can continue and handle the transaction error programmatically.
If a client application is to transparently connect to an alternate standby server while recovering from a loss of communication with the failed primary server, you need to specify that alternate server's location. With Automatic Client Reroute and DB2 V8.2 (or later), you do this using the new UPDATE ALTERNATE SERVER FOR DATABASE command in the primary server. The following figure shows an example of the steps that are required to catalog an alternate server:
Figure 2. Steps to set up the Automatic Client Reroute facility on the primary database server
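As a rough illustration of the command named above, the following sketch builds the command text. It assumes the usual form of the command (a database alias followed by a host name and port) and uses placeholder values. The helper name `alternate_server_command`, the alias, the host name, and the port are all hypothetical.

```python
def alternate_server_command(db_alias: str, hostname: str, port: int) -> str:
    """Build the command text that records an alternate server for a database."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return (
        f"UPDATE ALTERNATE SERVER FOR DATABASE {db_alias} "
        f"USING HOSTNAME {hostname} PORT {port}"
    )


if __name__ == "__main__":
    # Placeholder values only; substitute your own alias, standby host, and port.
    print(alternate_server_command("SAMPLE", "standby.example.com", 50000))
```

The resulting text is what you would issue on the primary server, for example from the DB2 command line processor.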
When you set up an HADR environment using the HADR Setup Wizard, the steps to enable the Automatic Client Reroute facility between the two servers are automatically done for you.
The alternate server information is stored on the primary server, and loaded into the client's cache upon a successful connection to the primary server. This means that in order for a client application to know the standby server, it must first successfully connect to the primary server.
This architecture provides a central management control point for cataloging the standby server -- which ultimately delivers simplicity; this is in sharp contrast to other technologies like Oracle's Transparent Application Failover (TAF), where the configuration typically takes place on the client.
For applications that don't use a DB2 client to connect to the primary database (for example, applications using the DB2 Universal Java Client with a Type 4 JDBC connection), the alternate server information is stored in a special register. This means that no matter how you connect to a DB2 server, you can leverage this capability (unlike SQL Server 2005's version of this feature).
After the DBA specifies an alternate server location for a particular database and server instance, the alternate server location is returned to the client at connection time. If communication is lost for any reason, the DB2 client code will be able to re-establish the connection using the alternate server information that was returned from the server. It is important to note that because no server state is maintained across connection failures, a unit of work that is in progress will have to be submitted again.
The alternate server location is maintained on the server (where it persists in the system database directory file) and on the client. The alternate server location information (host name or IP address, and service name or port number) that is returned to the client at connection time is kept in the system database directory file and cached in local memory. If the alternate server location is changed on the server, the client will pick up the change at the next successful connection.
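The caching behavior described here can be modeled with a toy example. It is a conceptual sketch only and does not use any DB2 API. The `Server` and `Client` classes and their fields are hypothetical stand-ins for the server-side setting and the client's cached copy.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]


@dataclass
class Server:
    """Toy stand-in for a database server that knows its alternate."""
    address: Address
    alternate: Optional[Address] = None
    up: bool = True


class Client:
    """Toy stand-in for a client that caches alternate server locations."""

    def __init__(self) -> None:
        self.alternate_cache: Dict[str, Address] = {}

    def connect(self, db_alias: str, server: Server) -> bool:
        if not server.up:
            return False  # nothing is learned from a failed connection
        if server.alternate is not None:
            # The alternate location arrives with each successful connection.
            self.alternate_cache[db_alias] = server.alternate
        return True


if __name__ == "__main__":
    primary = Server(("primary", 50000), alternate=("standby1", 50000))
    client = Client()
    client.connect("SAMPLE", primary)
    primary.alternate = ("standby2", 50000)  # DBA changes the setting
    print(client.alternate_cache["SAMPLE"])  # still the old value
    client.connect("SAMPLE", primary)
    print(client.alternate_cache["SAMPLE"])  # refreshed on reconnect
```

The point of the model is that the client never needs its own configuration step: it learns the alternate location, and any later change to it, only through successful connections.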
An example of Automatic Client Reroute
The following figure shows a typical Automatic Client Reroute scenario. Map the description that follows to the number position on the figure:
Figure 3. An example of Automatic Client Reroute
- The DBA creates a database and updates the alternate server location using the UPDATE ALTERNATE SERVER FOR DATABASE command.
- The remote database is cataloged on the client, or in a Lightweight Directory Access Protocol (LDAP) directory that is supported by DB2 (for example, Active Directory, IBM's Tivoli Directory Services, and so on).
- The client tries to connect to the remote database.
- After a successful connection, the alternate server location is returned back to the client and saved in the system database directory and the local directory cache.
- The application performs its work. It interacts with the database using SQL and returns result sets to the client. Now let's assume an outage occurs during an INSERT operation. Automatic Client Reroute checks whether or not the alternate location can be found in memory (and it will be found, assuming it made the successful connection and the server is configured for Automatic Client Reroute). If it is found, ACR will first retry the connection to the failed server and if that fails, it will try to connect to the alternate standby server and the primary server in a ping-pong fashion as previously discussed.
- The client connection is re-established with the alternate server. The transaction is rolled back and then re-issued on the alternate server (assuming that this is how the application was programmed to handle this type of error -- this is typically what the Retry button on all those Web sites is doing).
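The final step, re-issuing the rolled-back transaction, is the application's job. A minimal sketch of that pattern follows, assuming the database driver surfaces the SQLCODE on its exception. The `DatabaseError` class and the `run_with_reroute_retry` helper are hypothetical and stand in for whatever your driver and application actually provide.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

SQLCODE_REROUTED = -30108  # connection re-established; the transaction was rolled back


class DatabaseError(Exception):
    """Hypothetical driver exception that carries an SQLCODE."""

    def __init__(self, sqlcode: int, message: str = "") -> None:
        super().__init__(message or f"SQLCODE {sqlcode}")
        self.sqlcode = sqlcode


def run_with_reroute_retry(unit_of_work: Callable[[], T], max_retries: int = 1) -> T:
    """Run a unit of work, re-issuing it if a reroute rolled it back."""
    attempts = 0
    while True:
        try:
            return unit_of_work()
        except DatabaseError as err:
            # Only a successful reroute is retried; every other error propagates.
            if err.sqlcode != SQLCODE_REROUTED or attempts >= max_retries:
                raise
            attempts += 1
```

The unit of work passed in should contain the whole transaction, because no server state survives the connection failure. Any other error, including SQL30081N after the retry sequence is exhausted, is left for the caller to handle.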
Other Automatic Client Reroute considerations
If you are using a Java client (for example, a Type 4 driver) to connect to DB2, there is no directory file that can be populated with the alternate server's location. In the case of a Type 4 driver, the new alternateDataSource property can be used to store an alternate server's location (it contains the alternate standby server's JNDI location).
The alternate server information for a Type 4 JDBC driver is dynamically copied from the server to the client at connection time, and persists in the driver's static memory.
After a failover connection is re-established, the JDBC driver will throw a java.sql.SQLException to the application with a SQLCODE -30108, which indicates that a failover has occurred and that the transaction has failed. The application can then recover and try the transaction again.
Automatic Client Reroute also supports LDAP. In an LDAP environment, database and node directory information is maintained at an LDAP server. Clients retrieve information from the LDAP directory. For performance reasons, customers often enable the DB2LDAPCACHE registry variable to cache this connection information in the client's local database and node directories. When using ACR with LDAP, there are some additional points that the DBA should consider:
- The UPDATE ALTERNATE SERVER FOR DATABASE command has been extended for an LDAP environment. You should use these extensions when cataloging the database.
- Another way to specify an alternate server's location is to use a DNS entry to specify the alternate server's IP address. In this scenario, the client would not know about an alternate server, but at connect time, DB2 UDB would alternate between the IP addresses returned by the gethostbyname() function.
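The DNS-based approach can be illustrated with Python's standard socket module, which exposes the same kind of lookup. This is only a sketch of the idea, not what DB2 does internally: it tries the returned addresses in order rather than alternating, and the helper names `resolve_all` and `first_reachable` are hypothetical.

```python
import socket
from typing import Callable, List, Optional, Tuple

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def resolve_all(hostname: str, resolver: Resolver = socket.gethostbyname_ex) -> List[str]:
    """Return every IPv4 address the name service lists for a host name."""
    _name, _aliases, addresses = resolver(hostname)
    return addresses


def first_reachable(addresses: List[str], port: int, timeout: float = 2.0) -> Optional[str]:
    """Return the first address that accepts a TCP connection, or None."""
    for address in addresses:
        try:
            with socket.create_connection((address, port), timeout=timeout):
                return address
        except OSError:
            continue  # this address is down or unreachable; try the next one
    return None
```

With a DNS entry that lists both the primary and the standby address, a lookup returns both candidates, and the connection goes to whichever one answers.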
Although IT infrastructures have matured so much that the mean time between failures is continually on the rise, failures do happen. With so much focus on the "application experience," DBAs are faced with challenges to maintain connectivity. A database that has the ability to transparently handle outages has multiple benefits across your enterprise's IT ecosystem. Automatic Client Reroute is just another feature that makes DB2 V8.2 a "must try" technology that is bound to make you smile.
The information in this article is submitted on a best-effort basis as the author understands it, and does not represent an official communication from IBM. Neither IBM nor the author is responsible for the information in this article.

Paul C. Zikopoulos is an award-winning writer and speaker with the IBM Database Competitive Technology team. He has more than ten years of experience with DB2 UDB and has written over sixty magazine articles and several books about it. Paul has co-authored the books: DB2 Version 8: The Official Guide, DB2: The Complete Reference, DB2 Fundamentals Certification for Dummies, DB2 for Dummies, and A DBA's Guide to Databases on Linux. Paul is a DB2 Certified Advanced Technical Expert (DRDA and Cluster/EEE) and a DB2 Certified Solutions Expert (Business Intelligence and Database Administration). In his spare time, he enjoys all sorts of sporting activities, running with his dog Chachi, and trying to figure out the world according to Chloe, his new daughter. You can reach him at: paulz_ibm@msn.com