For any database, the first step to accessing data is making a connection from the client to the server. You may need to establish connections from clients to your IBM® DB2® Universal Database™ server across a wide area network or from the Web. If you have many connections coming in, with the potential for overloading your network or server at peak times, you may be looking at implementing a load balancing solution to spread requests across multiple servers. The flexibility of routing these connections by using facilities outside of DB2 has many potential applications. This article discusses one method of routing TCP/IP DB2 connections using an open source TCP proxy utility called Balance.
To understand how the routing utility will affect DB2 connection attempts, you will need to have an understanding of how DB2 TCP/IP connections are established. Let's examine the process from two sides, the server side and the client side.
Server side TCP/IP communication is set up on an instance-by-instance basis. In DB2 terminology, an instance is a hosting environment for one or more databases. Instance communication setup affects all databases in that instance. Essentially a DB2 instance listens to a TCP port for connection attempts and to communicate with clients. The specific TCP port that will be used is determined by a single instance configuration parameter, SVCENAME. If more than one instance exists on a single server, each must utilize a different TCP ports.
The three potential settings for the SVCENAME parameter are: it could be
blank, it could be set to a port name, or it could be set for a port
number. Setting the parameter to blank effectively disables TCP/IP
communication for the instance. If set to a port name, then a
corresponding entry must exist in the
file (operating system-specific port listing file). The
services file can be found according to the
|UNIX® and Linux||/etc/services|
Simply specifying an available port number for this parameter is also possible. To enable the instance to communicate via the configured TCP/IP port, the DB2 registry variable DB2COMM must include the 'tcpip' option.
If you look in the services file on your platform, you may find several DB2 entries. These entries may have been created by several means: initial DB2 install and setup, db2icrt command, db2isetup utility, or by an administrator. In general you will not need to modify the existing entries. The DB2 TCP/IP naming convention for client/server communication ports is:
'xxxx' is the name of the instance and
'#####' is the port number. However, no
particular form is required, and the entry for port name simply needs to
be identical to the SVCENAME parameter.
The other entries that may have the form beginning
"DB2_xxxx_#" are for other DB2 purposes and are
not related to client/server communication. Again, if you simply choose to
provide an available port number in the SVCENAME parameter, then it will
not be necessary to have a named communication port in this file.
The DB2 administration service communicates via port 523. The Control Center and Configuration Assistant, DB2 GUI administration tools, work with this service to administer DB2 instances and databases.
netstat command can be used to determine if
DB2 is listening for connections. From a shell environment or a command
window, issue the command
netstat -a to list
the TCP ports on which the system is listening. You can filter the results
by the SVCENAME parameter to determine if the instance is listening. For
SVCENAME = db2c_db2inst1
Use the command:
=> netstat -a |grep db2c_db2inst1
Figure 1. Filter results of netstat with grep
or, SVCENAME = 50100
Use the command:
=> netstat -a |findstr 50100
Figure 2. Filter results of netstat with findstr
On Windows platforms, findstr perform the equivalent to the grep function on UNIX to search for a substring.
For more information on the setup of DB2 server communications, refer to DB2 online documentation at http://publib.boulder.ibm.com/infocenter/db2help/index.jsp. The topic "Configuring TCP/IP communication for an instance" can be found by searching for "Configuring TCP/IP".
DB2 client side setup involves configuring the client to communicate with the correct port on the server. To accomplish this task, you place an entry in a special configuration file known as the node directory. The node directory entry must contain the key information hostname (IP address is also acceptable), and communication port (this may be in the form of a port name or number). There is one other entry we need to make before we are ready to connect. The database directory is another special configuration file on the client; it lists what databases are known to the client and what entry in the node directory provides the necessary information to establish a connection. We will work with both of these directories in some later examples. These configuration files are not directly manipulated by the user but are updated by means of the DB2 GUI tools or command line utilities.
Figure 3. Example of database directory entry
Figure 4. Example of node directory entry
The database directory shows us that database TEST uses node entry NODE_A. NODE_A in the node directory shows us the hosting server is gatesrv, and is listening on port 50100.
The Balance utility is a GNU General Public License utility available as a free download from http://balance.sourceforge.net. In essence it is a simple but very powerful port forwarding and load balancing tool with failover mechanisms for the Linux platform.
The utility bases its load balancing around the concept of channels. A channel consists of a host address and a port. Channels can be grouped together to create a channel group. You configure Balance to forward connection attempts on a specified port to a channel group. If more than one channel exists in a group, round robin (default) or hash load balancing is performed. If a channel of the channel group is not available, then the next channel in the group is attempted. The forwarding host post is not required to be the same port being controlled by Balance. There are other options, such as setting timeout values for connections, setting maximum number of connections, and setting affinities between incoming IP address connection attempts and destination channels.
To illustrate the capabilities of Balance and DB2, I will use an example scenario, illustrated in Figure 5. While the example will not be exhaustive it should allow you the basic feel of the combined products' capabilities.
Figure 5. Example Balance configuration
In the example I have two clients, clientA and clientB, that are connecting to databases hosted by three different servers, db2serv1, 192.168.1.101, and db2serv3. The gatesrv server is providing port forwarding for port 50100 to the three servers. The gatesrv server requires no additional software to perform this task other than the Balance utility. It is also not necessary to run the application on a dedicated server but I've used a dedicated server here for clarity.
The clients are connecting to a database named test. How to communicate to the sample database is directed by the node directory entries, NODE_A on client A and NODE_B on clientB. I chose differing node entry names for illustrative purposes only; they could be the same name and likely should be for client setup simplicity. The node entries point to gatesrv server, rather than to a traditional database server. The clients are configured to connect to port 50100 on gatesrv. The server gatesrv has three defined channels, one for each server, creating a channel group. It will direct connection attempts to the three servers.
The servers differ in several ways, including operating systems, instance names, host names, and TCP listening ports; again, this is for illustrative purposes. The only necessary difference is hostname or IP address designation, naturally to differentiate one server from another. The example also illustrates that each server is hosting separate databases. This would not be the case for a DB2 Connect gateway implementation, which I'll discuss later. Connection attempts from the clients can either be round robin routed or hash routed. Round robin would distribute connection attempts evenly among the channels, while hashing would use the client's IP address to determine the destination channel. Hashing allows client connect attempts to always be routed to the same server.
Hosting separate databases is an important issue. If the connecting applications are performing insert, update, or delete activity, then all the databases will not have the same content. If they are read-only databases then this would not be an issue. If you require the databases to be synchronized then you must plan for this process. There are many methods to accomplish synchronization such as DB2 replication. You must be aware of or account for the asynchronous nature of replication if this is your chosen method.
The steps to create the above example are very simple and are provided here. All DB2 commands should be performed under the instance ID, or another member of the SYSADM group, and as the root user for the Balance utility. We will be using the DB2 Command Line; however this could also be accomplished by using the GUIs Client Configuration Assistant (for client configuration) and Control Center (for server configuration).
On the clients:
=> db2 CATALOG TCPIP NODE node_a REMOTE gatesrv SERVICE 50100 => db2 CATALOG DATABASE test AS test AT NODE node_a
=> db2 CATALOG TCPIP NODE node_b REMOTE gatesrv SERVICE 50100 => db2 CATALOG DATABASE test AS test AT NODE node_b
To check the entries use the two following commands
=> db2 LIST NODE DIRECTORY => db2 LIST DATABASE DIRECTORY
On the servers:
=> db2 UPDATE DBM CFG USING SVCENAME 50100 192.168.1.101 => db2 UPDATE DBM CFG USING SVCENAME 50100
=> db2 UPDATE DBM CFG USING SVCENAME 60000
(Don't forget to restart DB2 after changing this parameter)
On the gateway:
=> balance 50100 db2serv1 192.168.1.101 db2serv3:60000
In reality without the variations that I made to demonstrate the configuration options this could be simplified to the same two commands for all the clients, the same command for all the servers and the single command for Balance.
One natural fit for the Balance application and DB2 is to load balance DB2 Connect gateways. DB2 Connect is unique in that it makes DB2 host (zSeries® and iSeries™) data available to distributed systems. While the DB2 Connect gateway does not host the data, it does perform translation work on behalf of the connecting clients. If you wish to load balance this work or provide an alternate path for connecting applications, then the Balance utility may suit your needs. The illustration below shows how this might look.
Figure 6. DB2 Connect in a gateway load balancing configuration
In the event one DB2 Connect gateway was unavailable, Balance would successfully route connections through the alternate. To implement true high availability, the Balance application would need to be made highly available. This would not be a difficult task but is not covered here.
Routing connection attempts away from a server is another possible benefit of using Balance to manage your DB2 connections. Since the Balance program has the option to be configured dynamically, removing or adding servers from Balance’s routing table will not affect connections to other servers. This might be useful for performing system maintenance tasks.
Figure 7. Client routing during maintenance
The dotted lines suggests that I can remove ServerB after adding ServerA, or I can simply remove ServerB and add ServerA to quickly reroute connection attempts. If I remove routing to a server, all connections to that server are lost.
Balance is primarily used as a tool to load balance http requests. As clients make requests for Web pages, a farm of servers can serve up the necessary content. The same principle could be applied to database hosting. This may be especially effective when the database is or is largely read-only. One example may be a travel Web site where there is a very large "look to book" ratio. That is, people may perform a great many searches for accommodations but make reservations very infrequently. One issue to be aware of is that you must keep databases synchronized when bookings are made, or create a separate process for the reservation system.
Figure 8. Balance Web requests
DB2 Version 8.2 introduced a new feature, client reroute, enhancing DB2's native client/server communication abilities. To implement this feature you will need DB2 Version 8.2 clients and servers. The essential function of DB2 V8.2's client reroute capability is to move a database connection from one sever to another. This is performed transparently to the user or application.
More information on this release of DB2 can be obtained at http://www.ibm.com/software/data/db2/udb/v82/.
Another interesting feature of DB2 is the ability to "hop" from one database server to another. The database server is usually the final destination of client connection attempts. On occasion the database administrator may wish the clients to use a different database server. Changing the clients' configuration may be either impossible or impractical. I have demonstrated how this might be handled using Balance. DB2 can accomplish a similar function by the database server pointing to an alternative hosting database server. Accomplishing this "hop" simply requires an entry in the DB2 database directory for the alternative host in place of local entry.
Figure 9 shows the database directory for a database, Sample, hosted on the local machine noted by the "indirect" entry type.
Figure 9. Database directory entry for a local database
If you wanted Sample to be hosted on another server, you need the entry type to be "remote". You will need to create a TCP/IP node entry for the remote server in the node directory, uncatalog this database directory entry, and add remote database directory entry, as shown in Figures 10, 11, and 12.
Figure 10. Catalog the TCP/IP node
Figure 11. Uncatalog the local database
Figure 12. Catalog the database at the remote node
The Balance utility provides additional flexibility when developing DB2 solutions. While the use of this tool does not fit all data management scenarios, it does provide a powerful means of distributing data workload. When you evaluate the tool, don't forget to test for data synchronization and Balance availability if a high availability solution is required. You will find this to be one more open source tool that can be used to add power and flexibility to your database solution.
Monty Wright is a certified IT specialist and has been supporting the DB2 brand for 7 years. He has published several articles on DB2 distributed topics and authored the DB2 performance tuning section of the IBM Redbook Tuning IBM eServer xSeries Servers for Performance. As part of the IBM Advanced Technical Support team, he has participated in numerous customer engagements and benchmarks. His specialty areas include performance tuning, database partitioning, and high availability.