For the purpose of accessing Oracle databases, Oracle Real Application Cluster (RAC) can be used for both high availability and workload management. The Oracle JDBC driver can be configured to provide both of these Qualities of Service in IBM WebSphere Application Server environments. Unfortunately, using the 10g version of the Oracle JDBC driver for workload management among multiple Oracle RAC nodes can open the door to a subtle two-phase commit (2PC) distributed transaction recovery problem.
After a brief introduction to Oracle RAC, this article looks at the reason behind this transaction recovery issue, and details a simple Oracle RAC configuration for WebSphere Application Server environments that will avoid transaction recovery problems.
About Oracle RAC
Oracle RAC is a "share- everything" architecture for database high availability and load balancing in which two or more Oracle RAC nodes are clustered together, sharing the same storage (Figure 1).
The RAC nodes are connected together with a high speed interconnect that enables fast communication between the Oracle nodes to exchange various categories of data block ownership information during startup, lock information, transaction information, data, and so on.
Figure 1. Oracle RAC configuration
In an Oracle RAC environment, every Oracle data block is assigned one (and only one) "Master" Oracle RAC node. The Global Cache Service (GCS) of that Oracle RAC node is responsible for managing access to these sets of data blocks. When one of the Oracle nodes wants to get access to an Oracle data block, it must first check with the data block master. The master node's GCS will then either instruct the requesting Oracle node to get the data block from disk, or instruct the current holder of the Oracle data block to send the requested data block to the requesting node. Oracle attempts to distribute the data block's ownership uniformly across all RAC nodes. All nodes in an Oracle RAC environment will be designated as the master of approximately the same number of data blocks. (All the Oracle RAC nodes will be the master of identical number of data blocks, if the number of RAC nodes is a divisor of the number of Oracle data blocks.)
WebSphere Application Server applications using an Oracle database obtain database connections from the Oracle JDBC driver. The Oracle JDBC driver can be configured for failover or load balancing (or both) in an Oracle RAC environment. If the JDBC driver is configured with failover enabled, then in case of a RAC node failure, the WebSphere Application Server applications that obtained connections from the failed node should be able to switch over transparently to the other running RAC nodes. (For transparent switch over, applications can be coded to catch the StaleConnection exception and retry a specified number of times to obtain new connection handles. On a request to obtain connections, the Oracle JDBC driver will provide physical connections from the running RAC nodes.) The JDBC driver will obtain new connections from the running RAC nodes. In fact, in case of a failure of one or more RAC nodes, properly coded WebSphere Application Server applications should continue to function in an almost uninterrupted way, provided RAC is configured for high availability and, of course, some RAC node are active. (A momentary delay may get noticed in application response, since as part of the failover process, Oracle transfers and uniformly distributes the data block owned by the failed RAC nodes to other running RAC nodes. During this rebuilding process, Oracle will freeze the database for a short while.)
In addition to high availability, Oracle RAC can also be configured to provide workload balancing; workload balance to the database happens at the connection level. Oracle distributes the connection requests to RAC nodes using a random algorithm. For each connection request, the Oracle JDBC driver randomly picks up a RAC node for obtaining connections. In general, such a random distribution algorithm uses all the RAC nodes in an uniform way. For example, if 100 database connection requests are made in a two-node Oracle RAC environment, each RAC node will be used to obtain approximately 50 database connections.
Simple RAC configuration in a WebSphere Application Server cluster
Figure 2 depicts a typical RAC physical topology in a WebSphere Application Server cluster environment, where both the failover and load balancing Qualities of Service are enabled.
Figure 2. WebSphere Application Server environment with Oracle RAC
The WebSphere Application Server cluster consists of two members: cluster-member1 and cluster-member2. The Oracle RAC physical configuration contains two nodes: rac-node1 and rac-node2. The RAC nodes can be located in the same physical machine with the WebSphere cluster members, or they could be placed in entirely different machines, as in Figure 2. (The actual placement does not impact the fundamental qualities of the services provided by RAC.)
For achieving both high availability and load-balancing, you can specify the Oracle data source URL for both WebSphere cluster members as something like:
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST= (ADDRESS=(PROTOCOL=TCP)(HOST= rac-node1)(PORT=1521)) (ADDRESS=(PROTOCOL=TCP)(HOST= rac-node2)(PORT=1521)) (FAILOVER=on)(LOAD_BALANCE=on) (CONNECT_DATA=(SERVER=DEDICATED) (SERVICE_NAME=<dbname>))) (1)
Since the FAILOVER attribute is set, if any RAC node (for example, rac-node1) becomes unavailable, the oracle JDBC driver will return connections from the other surviving Oracle RAC nodes when a new connection is requested (in this case, rac-node2).
Also, since the LOAD-BALANCE attribute is specified "on", the Oracle JDBC driver will serve connections to application instances running in cluster-member1 and cluster-member2 from both rac-node1 and rac-node2. Recall that Oracle will attempt to obtain an approximately equal number of physical connections from rac-node1 and rac-node2.
Notice that If the LOAD_BALANCE attribute is turned off, the order in which the RAC nodes are specified in the data source URL becomes important. The Oracle JDBC driver always tries to obtain physical connections from the RAC node that occupies the first position in the list. In case the first RAC node is not available, the JDBC driver will try to obtain connections from the second RAC node, and so on. This implies that for URL (1) (in Listing 1), the URL of cluster-member1, rac-node1 is the primary RAC node. As long as rac-node1 remains functional, all physical connection requests will be served from rac-node1; rac-node2 will not play any part at all. If rac-node1 fails, Oracle will serve the connection requests originating from cluster-member1 from rac-node2, which is a secondary RAC node of URL (1). All new connection requests from cluster-member1 will, of course, be served by rac-node2 as long as rac-node1 is down.
In this context, the complete workload management in Figure 2 is provided at two levels. At the WebSphere Application Server level, the work load management is provided by the router component (HTTP server plug-in, ORB plug-in, On Demand Router, and so on) by distributing inbound requests to appropriate cluster members, maintaining appropriate affinity, if applicable. Once an inbound request reaches the application instance executing on a cluster member, the new physical connection requests to the database will get load balanced by the Oracle JDBC driver across rac-node1 and rac-node2.
If the applications do not use 2PC distributed transactions, the above simple configuration will suffice. However, there can be some subtle recovery issues when 2PC distributed transactions are used in this simple and easy to configure setup, as explained in the next section.
Two-phase commit distributed transactions and RAC configuration
When recovering 2PC distributed transactions that span over multiple Oracle RAC nodes, Oracle 10g RAC has an inherent issue: during an attempt to commit or rollback a transaction branch (through an active RAC node) that was running earlier in a failed RAC node, the transaction manager might receive an "ORA-24756: transaction does not exist" exception. This problem occurs because, in the case of an RAC node failure, Oracle opens up the other surviving RAC nodes for business even before Oracle RAC completes the necessary recovery action corresponding to the failed RAC node.
Here is a hypothetical scenario that depicts this problem. Consider a two-node Oracle 10G RAC environment (with nodes rac-node1 and rac-node2) in which both failover and load balancing are enabled:
- Application starts a User transaction (called UTx) in a WebSphere Application Server cluster member.
- Within the boundary of UTx, the application requests a database connection. Assume the connection is served from rac-node1.
- The application requests another connection to the same Oracle database resource manager, which is served from rac-node2, which is the other RAC node.
- Work is performed on both connections, which results in the start of two transaction branches: TxB1 and TxB2.
- Assume that a RAC node (rac-node1) fails after the WebSphere Application Server transaction manager has issued the xa.prepare() call on TxB1, but before the invocation of xa.commit().
- The WebSphere transaction manager will receive a XAER_RMFAIL return code (see Resources), which indicates that the resource manager is unavailable.
- The WebSphere transaction manager then attempts to get a new connection to RAC. Since failover is enabled, the Oracle JDBC driver will provide the connection from rac-node2 after recovery is done on the RAC side (that is, Oracle RAC is open for business).
- The WebSphere transaction manager will attempt to finish TxB1 by issuing xa.commit().
- The Oracle JDBC driver might throw the ORA-24756: transaction does not exit exception; rac-node2, the surviving RAC node, might not even be aware of the transaction branch TxB1.
- TxB1 will stay in an indoubt state, which will cause problems if TxB1 happens to hold any database locks.
- Any subsequent attempt to access the same database rows or tables locked out by the TxB1 branch will result in an "ORA-01591 lock held by in-doubt distributed transaction" exception, thereby making portions of the database unusable.
To summarize, if WebSphere Application Server is configured for connection level load balancing using the Oracle JDBC driver, then the failure of a RAC node in the prepare phase for any in-flight 2PC distributed transaction spanning more than one RAC node might require manual intervention from the database administrator. The database administrator might have to complete the orphaned transaction branches for releasing the database locks that they hold. For the database administrator, this is a two step process:
- Get the orphaned transaction ID by issuing a command, such as:
sql > select state, local_tran_ID, Global_tran_Id from dba_2pc_pending where state = "prepared";
- Then, roll back all the transaction IDs that are in the "prepared" phase:
sql > rollback force '<Global_tran_Id>';
Proper RAC configuration for handling 2PC global transactions
Manual intervention might not be your preferred method of transaction recovery; some users will opt for automatic and transparent transaction recovery strategies. The key to a transparent recovery strategy is to eliminate the possibility of having more than one transaction branch span multiple RAC nodes for any global transaction; a transaction branch corresponds to a database connection enlisted in a global transaction. If all connections in a global 2PC transaction originate from the same Oracle RAC node, transaction recovery problems should not arise.
Here is an outline of the simplest strategy for avoiding the XA-related limitation of Oracle RAC mentioned above:
Create an Oracle Distributed Transaction Processing (DTP) service (introduced in Oracle 10g R2 to resolve the Oracle 10g XA limitation); each DTP service is a singleton service that is available on one (and only one) Oracle RAC instance. Run this command to create a DTP service:
srvctl add service -d <dbname> -s <servicename> -r <primary nodes> -a <alternate nodes>
Enable DTP on an Oracle service by executing this command:
execute dbms_service.modify_service (service_name => '<service_name>' , dtp => true);
Make sure each Oracle DTP service is configured with load balance disabled (that is, only one primary).
Configure each WebSphere Application Server cluster member to use the Oracle DTP service created in step 1 (Figure 3).
Figure 3. Configure cluster to use DTP service
The DTP service will start automatically on the preferred instance. However, on a database restart, the DTP service will not start automatically. You can start the DTP service using this command:
srvctl start service -d <dbname> -s <service_name>
If a RAC node stops working, Oracle will not failover the DTP service until the Oracle RAC cleanup and recovery is complete. Even if the Oracle nodes come back up, the Oracle DTP service will not return to the freshly restarted RAC node. Instead, the database administrator will have to manually move the service to the restarted RAC node.
Figure 4 depicts the topology discussed above.
Figure 4. Sample scenario topology
Notice that, from a topology standpoint, Figure 4 is identical to Figure 2. The only difference is the use of the Oracle DTP service . Unlike URL (1) (in Listing 1) where all the cluster members have identical data source URLs, here, each cluster member's data source URL can be unique and slightly different from those of the other cluster members. As an example, for thin drivers, the data source URL for the WebSphere Application Server cluster members may look like this:
jdbc:oracle:thin:@(DESCRIPTION= (FAILOVER=on) (LOAD_BALANCE=off) (ADDRESS=(PROTOCOL=TCP)(HOST=rac-node1)(PORT=1521)) (ADDRESS=(PROTOCOL=TCP)(HOST=rac-node2)(PORT=1521)) (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=<yourOraservice1toBeUsedForCluster1>))) (2) and jdbc:oracle:thin:@(DESCRIPTION= (FAILOVER=on) (LOAD_BALANCE=off) (ADDRESS=(PROTOCOL=TCP)(HOST=rac-node2)(PORT=1521)) (ADDRESS=(PROTOCOL=TCP)(HOST=rac-node1)(PORT=1521)) (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=<yourOraservice2toBeUsedForCluster2>))) (3)
For thick drivers, the URLs will be very similar:
jdbc:oracle:oci in place of
jdbc:oracle:thin in the above URLs.
When Oracle services are used, load balance and failover must be specified on the Oracle service definition itself, not on the URL. Specifying the load-balance or failover in the URL when Oracle services are used will not affect the RAC nodes' load balance or failover. Instead, it will affect the Oracle listener load balance and failover.
In Figure 4, the entire load balancing happens at the WebSphere Application Server level, through the router. The relevant WebSphere router maintains the transactional affinity, and all the requests belonging to a transaction are routed to the same WebSphere cluster member. Since the Oracle JDBC driver is not providing any load balancing, all of these inbound requests will open and use connections from the same RAC node. This prevents any 2PC distributed transaction to span across more than one RAC node, thereby avoiding the transaction recovery problem.
Since the typical router configuration attempts to spray requests uniformly across all WebSphere Application Server cluster members, if each RAC node is configured to be a primary node for some cluster member, the database load will also get distributed across all the RAC nodes. By tuning the static weights of the relevant WebSphere routers (the cluster member weights for both HTTP and Universal Clustering Framework routers), if applicable, the WebSphere system administrator might be able to approximately achieve the desired amount of skew in the database load distribution among RAC nodes.
A few points to be noted in this context, where load balancing is turned off:
- The primary and secondary RAC nodes are specified positionally in the data source URL.
- There can be multiple secondary RAC nodes. However, only one RAC node, the first functional node in the list of RAC nodes in a URL like (2) will be used to obtain physical connections. This node can be called as principal secondary.
- A simple way to achieve uniform database load balancing across all the RAC nodes will be to satisfy the following two conditions:
- The applicable router(s) should be configured in a way such that all the WebSphere cluster members receive an equal proportion of traffic.
- Each RAC node should be configured to serve as the primary, and as the principal secondary of equal number of WebSphere cluster members. In Figure 4, both RAC nodes are configured to be primaries and principal secondaries of one WebSphere cluster member (rac-node1 is the primary to cluster-member1 and principal secondary to cluster-member2; rac-node2 is the primary to cluster-member2 and principal secondary to cluster-member1).
The above condition is easy to satisfy in a topology where the number of RAC nodes is a divisor of the number of WebSphere cluster members.
Using the "let WebSphere Application Server do the load-balancing" strategy described in this article can make the WebSphere clustered environment resilient against problems related to transaction recovery. The URLs illustrated in this article can also be used in a WebSphere Process Server environment for automatic and smooth 2PC distributed transaction recovery, should there be a RAC node failure.
The primary-secondary RAC node configuration is equally applicable for standalone WebSphere Application Servers. However, for standalone servers, though RAC will provide failover at connection level, there can be no load balancing at all.
More in this series
- Handling data access exception - StaleConnectionException
- Oracle Real Application Clusters 10g: An Oracle Technical White Paper (PDF)
- Best Practices for Using XA with RAC, Version 2.0 (PDF)
- Real Application Clusters (Oracle Base)
- Distributed Transaction Processing: The XA Specification (PDF)
- Oracle glossary
- IBM developerWorks WebSphere Application Server zone