The first article of this series described how client optimization can improve database application performance and security, and how you can enable client optimization on a single application server node. This second article briefly introduces the concept of clustering, explains how to define a data source and a pureQuery-enabled JDBC provider in a clustered environment, and then describes how to configure and work with client optimization in clustered application server environments, specifically, clustered WebSphere Application Server (WAS) environments.
As you may recall, the first article described how client optimization consists of the following main phases:
- Capture the SQL from the running application.
- Configure the captured SQL to specify target package names.
- Bind the configured capture file into DB2 packages, if the target database is DB2.
- Execute the application with the captured SQL to get desired behavior, such as static execution.
This second article focuses on extensions to the capture phase that enable you to accommodate a clustered environment. The article describes how to configure the JDBC provider, data sources, and the pureQuery properties to enable capturing SQL when multiple WebSphere instances are running the same application in a cluster. This basically means that you create several output capture files. The other part of the process that the article describes is how you can merge the captured SQL into a single file that you can then configure and bind to the target data source.
Clustering, as the name suggests, refers to a logical collection of application server instances running the same application. The purpose of clustering is to provide workload management and high availability. If one instance of the server goes down, another instance can take over the workload. The individual server instances in a cluster environment are referred to as cluster members.
To use clustering with WebSphere Application Server, you need to have the WebSphere Application Server Network Deployment edition. This edition of WebSphere includes a profile called a deployment manager, which is a special WebSphere instance. The deployment manager enables you to create and manage clusters.
Members of a cluster can be from the same or different nodes and can be on the same or different machines. A node is a logical grouping of application server instances that share the same configuration control and are generally associated with one physical installation of WAS. Each node uses an agent to communicate with the deployment manager. The deployment manager profile allows you to manage the configuration of multiple application servers across multiple nodes.
There are three types of WAS clustering environments. The different types vary in regard to how and where the cluster members are located:
- Horizontal clustering — Various instances of the server or cluster members are on different nodes.
- Vertical clustering — All cluster members are on the same node.
- Mixed clustering — This is a mix of both horizontal and vertical clustering.
Figure 1 shows an example of a mixed clustering environment. On Node 1, the cluster members named clus1_1 and clus1_2 are an example of horizontal clustering on one node. Similarly, Node 2 has two cluster members named clus2_1 and clus2_2 that are horizontally clustered. Because Node_1 and Node_2 are also vertically clustered, the environment shown represents mixed clustering.
Figure 1. Example of a mixed clustering environment
To learn more about clustering, refer to the Device Manager Information Center link in the Resources section of this article.
When you install an application in a mixed cluster environment, the application is installed on all the cluster members at the same time. You can stop, start, and administer a cluster as a single entity, in the same way you would do so with a single server.
This article assumes that all nodes in the cluster environment are managed by the deployment manager. This means that any property settings made at the node level are propagated to all of the environment's server members. The article also assumes that the Web server is part of the application server.
Creating a pureQuery-enabled JDBC provider and data source on the cluster
When you create a JDBC provider in a clustered environment, you need to make sure that the scope of the JDBC provider is cluster and not the individual instance of the server profile. A scope for a resource (such as the JDBC provider or data source) defines its visibility in the application server environment. For example, a scope defined at the cluster level means all the cluster members can see the resource. Similarly, a scope defined at the node level can be seen by all the server instances in that node. You can define the scope when you create the JDBC provider and data sources.
You create the data source in a clustered environment in the same manner as you do in a single-server environment, except that you define the scope for the data source as cluster. Again, this is so that the data source is visible to all the cluster members. Figure 2 shows how to use the Data sources page of the Integrated Solutions Console to select the scope of a new data source. After you choose to show the scope selection drop-down list, the list shows all the available nodes, clusters, and the application server instances.
Figure 2. Define the data source scope as cluster
Refer to the first article in this series for instructions on creating the data source.
Once you have created the data source, you can set the data-source-level pdq properties in the same way you do for a single server. The properties are then applicable to the cluster. Everything described in the first article in terms of the properties for a single-server environment applies as well in a clustered environment.
Configuring client optimization in a clustered environment
As described previously, when you install an application on a cluster, it is installed on all the cluster members, and all these cluster members have the same view of the back-end database (data source). When you enable client optimization in such an environment, it increases the possibility of resource contention. This is because of the increased likelihood that multiple instances of the same application from different cluster members will be running in capture mode and may try to write to the same capture file.
To avoid the resource contention problem, a new file name suffix is used.
The suffix is $X.
During the SQL capture phase,
you can provide a capture file name (using the outputPureQueryXml property) that includes
the $X suffix.
For example: pdq.outputPureQueryXml=App1$X.pdqxml
During the capture process of client optimization, the $X is replaced
with the timestamp and the object ID of the application classloader when the file is first created.
This ensures creation of a unique file name for each instance of the application in a cluster.
You can only use the $X suffix for the outputPureQueryXml
file because it allows creation of multiple files.
In a single-node environment, you only need to specify the
pureQueryXml property for initial capture because there will not be any resource contention.
However, in a clustered environment, to avoid resource contention during the initial
capture, its recommended that you use the $X syntax.
You can use the $X suffix in any level of the
pdq.properties file (global, data source, or application).
The intended use for the suffix is for clustered environments when the same application is running on multiple instances of the
server.
If you use the suffix in a single-server environment,
the result is one file with the timestamp and classloader ID suffix.
Note: Clusters are seen as a single entity and provide a single view of the application. Therefore, it is not possible to have different settings for different members of the clusters.
Merging incremental capture files
When you use the $X suffix
in a cluster environment, a separate file is created for each instance/cluster member.
Each file contains the SQL statements captured by each cluster member.
Some of the SQL statements may be common across multiple files,
and some files may contain unique SQL.
This depends on the application flow in each instance.
Therefore, after the capture is done,
you may want to combine all the captured SQL
into a single file that you can then configure and bind to the respective data sources for static
execution.
To do this, you can use the Merge command line utility.
This utility lets you merge two or more capture files into
one output pdqxml file.
(Alternatively, you can also use Optim Development Studio to merge files.)
Listing 1 shows an example of how to use the Merge command.
Listing 1.
Merge command example
java com.ibm.pdq.tools.Merge -inputPureQueryXml “c:\Program Files\capture1.pdqxml” capture2.pdqxml capture3.pdqxml -outputPureQueryXml merge.pdqxml |
Use the inputPureQueryXml parameter
to specify the input capture files to be merged.
Enter file names as a space separated list .
You can specify the file names in either absolute or relative format.
Because the space character is used as the list delimiter, if a file name itself contains a space,
you need to enclose that file name in quotes.
The input files are treated as read-only files.
The example in Listing 1 specifies three input files.
Use the outputPureQueryXml parameter
to specify the file name of the merged capture file.
The example in Listing 1 specifies merge.pdqxml as the name
of the merged capture file.
The Resources section of this article
contains a link to the product documentation for the Merge
command.
Figure 3 illustrates the process for merging multiple capture files.
Figure 3. Process for merging multiple capture files
Incrementally capturing to an existing merged capture file
After you have merged multiple input capture files into a single output file, and configured and bound that output file, you may still need to incrementally capture additional SQL. This could be because of changes to the application, or for other reasons. If you were to use the merged capture file as the pureQueryXml for incremental capture, you again risk the possibility of encountering resource contention issues.
To overcome this problem, you can use both pureQueryXml
and outputPureQueryXml files.
When both files are specified and have different names, this is known
as incremental capture.
The pureQueryXml is treated as input,
and only the new SQL statements that are not already included in the pureQueryXml
are captured in the outputPureQueryXml file.
Again, you can use
outputPureQueryXml with $X file suffix syntax to create multiple files,
one for each instance, and merge them using the Merge utility.
Listing 2 shows an example of the property settings for using incremental capture, where App1_base.pdqxml is the name of the merged file.
Listing 2. Example property settings for incremental capture
pdq.pureQueryXml=App1_base.pdqxml pdq.outputPureQueryXml=App1$X.pdqxml pdq.captureMode=ON |
To process only new SQL and not lose existing package names and structures from the
originally captured SQL, you can
identify the existing configured pdqxml file as
a seed file.
The newly captured SQL is then added into the existing pdqxml file.
This scenario is illustrated in Figure 4.
The option to specify a seed file is -baseFile.
To configure and bind the new SQL without losing existing package names and structures,
use the options -cleanConfigure FALSE and bind -differenceOnly TRUE.
Figure 4. Merging multiple capture files with -baseFile option
Note:
An exception to the above behavior is a scenario where a
SELECT statement is captured in the initial capture
and the corresponding position update is captured in the incremental capture mode.
In such a case, pureQuery client optimization needs to move the position update statement to the existing package
(the package where the SELECT statement resides)
to run the position update statement successfully in static mode.
If you move the position update statement to a different package it is necessary to
perform a configure.
If you must retain the previously bound
packages, you may delete the position update
statement from capture file — at
runtime pureQuery client optimization
performs a dynamic update on a static cursor.
If it is not necessary to retain the package, then
you may perform a clean configure after the output file is merged with the input file.
Performing a clean configure ensures that the
SELECT statement and the corresponding
position update statements are in the same package.
After the configure, you can perform a bind to replace the existing package.
The Use incremental capture with the merged capture file section of this article. contains more details about this type of incremental capture scenario
The following scenario is intended to help further describe how to configure and work with client optimization in clustered application server environments. The scenario starts with an example similar to the one described in the first article. However, in this case it is extended to support a cluster environment instead of an environment with a single application server node.
As shown in Figure 5, Application 1 and Application 2 are installed on a cluster that has four members. The four members consist of two nodes with two cluster members per node. DS1, DS2 and DS3 are the data sources set up on the cluster. Each node has its own common file system. Application 1 use all three data sources (DS1, DS2, and DS3), while Application 2 uses only one data source (DS3).
Figure 5.Clustering scenario
Capture SQL in the clustered environment
For this scenario, the application level properties file named pdq.appwide.properties is set to capture the SQL statements for both applications.
Table 1 shows the configuration of data sources for Application 1.
Table 1. Configured data sources for Application 1
| Database name | Configured dataSourceName | Property file name |
|---|---|---|
| DB1 (DB2) | DS1 | pdq.ds1.properties |
| DB2 (DB2) | DS2 | pdq.ds2.properties |
| DB3(non-DB2) | DS3 | pdq.ds3.properties |
Table 2 shows the settings for the pureQuery properties files for Application 1. These settings enable capturing for the application and outputting capture files that are specific to whatever SQL was run against the target data source.
Table 2. pureQuery properties files for Application 1
| File name | Property definition | Affects which data source? |
|---|---|---|
| pdq.appwide.properties | pdq.captureMode=ON pdq.executionMode=DYNAMIC | DS1, DS2, and DS3 |
| pdq.ds1.properties | pdq.outputPureQueryXml= app1_ds1$X.pdqxml | DS1 |
| pdq.ds2.properties | pdq.outputPureQueryXml= app1_ds2$X.pdqxml | DS2 |
| pdq.ds3.properties | pdq.outputPureQueryXml= app1_ds3$X.pdqxml | DS3 |
Table 3 shows the setting for the pureQuery properties file for Application 2, which only accesses data source 3 (DS3).
Table 3. pureQuery properties file for Application 2
| File name | Property definition | Affects which data source? |
|---|---|---|
| pdq.appwide.properties | pdq.captureMode=ON pdq.executionMode=DYNAMIC | DS3 |
| pdq.ds3.properties | pdq.outputPureQueryXml= app2_ds3$X.pdqxml | DS3 |
When the applications are executed in capture mode, $X is
replaced with the timestamp and the object ID of the application classloader.
Each of the clusters creates its own file for capture for each data source.
Table 4 shows the various files that are created when capturing SQL for Application 1.
Tn represents timestamps.
On represents the classloader IDs of the application.
Table 4. Capture files for Application 1
| Data source | Hclus1_1 | Hclus1_2 | Hclus2_1 | Hclus2_2 |
|---|---|---|---|---|
| DS1 | app1_ds1T1O1.pdqxml | app1_ds1T2O2.pdqxml | app1_ds1T3O3.pdqxml | app1_ds1T4O4.pdqxml |
| DS2 | app1_ds2T5O5.pdqxml | app1_ds2T6O6.pdqxml | app1_ds2T7O7.pdqxml | app1_ds2T8O8.pdqxml |
| DS3 | app1_ds3T909.pdqxml | app1_ds3T10O10.pdqxml | app1_ds3T11O11.pdqxml | app1_ds3T12O12.pdqxml |
Table 5 shows the various files created when capturing SQL for Application 2.
Table 5. Capture files for Application 2
| Data source | Hclus1_1 | Hclus1_2 | Hclus2_1 | Hclus2_2 |
|---|---|---|---|---|
| DS3 | app2_ds3T13O13.pdqxml | app2_ds3T14O14.pdqxml | app2_ds3T15O15.pdqxml | app2_ds3T16O16.pdqxml |
Note: You can use the $X
suffix for an application level property file,
but be aware that the application level property file impacts all the data sources.
In this scenario, Application 2 is using only one data source.
Therefore, use of the suffix in the application level property file affects only the one data
source.
Once the capture session is over, you can merge the various files before configuring and binding the files. For this sample scenario, you want to bind the data source specific files to their corresponding database. For example, for Application 1 with DS1, you want to merge the four capture files that correspond to DS1 (refer to Table 4), before configuring and binding the merged file to the corresponding database (DB1).
Listing 3 shows an example of the Merge command you would use to merge the files specific to DS1.
Listing 3. Example
Merge command
java com.ibm.pdq.tools.Merge –inputPureQueryXml app1_ds1T1O1.xml app1_ds1T2O2.xml app1_ds1T3O3.xml app1_ds1T4O4.xml -outputPureQueryXml app1_ds1.pdqxml |
In the example above, app1_ds1.pdqxml is the output file produced by the merge.
Once you have the output file you are ready to configure it and bind it.
You can similarly merge the capture files specific to DS2, and then configure and bind the output file to the database named DB2.
Use incremental capture with the merged capture file
This part of the scenario corresponds to the situation illustrated in Figure 4.
Assume you have discovered that you need to
run incremental capture on Application 1.
This could be because you missed some SQL in the previous capture,
or maybe because some additional logic has been added to the application.
To do this, you would use the merged file (app1_ds1.pdqxml)
as the input pureQueryXml file and specify a
separate file with the $X suffix for the outputPureQueryXml filename.
Listing 4 shows an example of how you would set the properties for pdq.ds1.properties to do the incremental capture on DS1.
Listing 4. Newly captured SQL goes to the file specified as outputPureQueryXML
pdq.pureQueryXml=app1_ds1.pdqxml pdq.outputPureQueryXml=app1_ds1$X.pdqxml |
Note: When you set the outputPureQueryXml, the pureQueryXml file is automatically considered as input and read-only. This example does not have an explicit inputPureQueryXml file. If you do not specify outputPureQueryXml, then pureQueryXml is considered to be both the input and output.
Using the settings shown in Listing 4, when you run the application in capture mode for DS1, you create four files,
one for each cluster member.
You can then use the Merge command line utility to merge these
files into a new pureQueryXML file.
For this scenario, you also want to merge the new SQL with the existing configured pdqxml file without overwriting the settings already configured for pdqxml. To do that, you need to specify the existing configured pdqxml file as the base file, app1_ds1.pdqxml.
Listing 5 shows an example of the Merge command you could use to do this.
Listing 5. Merging files on DS1
java com.ibm.pdq.tools.Merge –inputPureQueryXml app1_ds1T1O1.pdqxml app1_ds1T2O2.pdqxml app1_ds1T3O3.pdqxml app1_ds1T4O4.pdqxml -baseFile app1_ds1.pdqxml -outputPureQueryXml app1_ds1_V2.pdqxml |
The options on the above sample Merge command
specify that you want to put the new SQL from all three input files into an unnamed
statement set after the current contents of the app1_ds1.pdqxml file and
write the new SQL to the app1_ds1_V2.pdqxml output file.
You can then incrementally configure the output file
using the -cleanConfigure FALSE option and then bind it using the -differenceOnly TRUE option.
WebSphere Application Server clustered environments are used for high availability and performance.
Because pureQuery provides performance optimization for WAS,
it is important to understand how pureQuery client optimization can operate in such an environment.
This article reviewed how to use properties and the $X capture file suffix
to enable capturing of SQL across members of a cluster.
These techniques serve to avoid resource contention that might occur if a single capture file was used.
The article also provided an overview and examples of how to use the Merge command
line utility to merge captured SQL files from each member into a single file that you
can then configure and bind
into a database package (or packages) on the target database servers.
We gratefully acknowledge Christopher M. Farrar, Kathryn Zeidenstein, Patrick Titzler, and Kavitha Pullela for their support in reviewing the content of this article.
Learn
-
Read
"No Excuses" Database Programming for Java
to learn more about static SQL and its benefits.
-
Refer to the
Device Manager Information Center for an overview of clustering.
-
For information on capturing SQL in clustered environments, refer to
this topic in the Integrated
Data Management Information Center.
-
For details on the Merge utility, refer to
this topic in the Integrated Data Management Information Center.
- The
"Integrated Data Management: Managing data across its lifecycle" (developerWorks, updated June 2009) article explains both the vision and reality of Integrated Data Management across roles.
- The
Optim
Performance Management Solution demo shows you how one fictional company uses Optim solutions for performance management.
The demo includes a client optimization scenario.
-
Refer to the developerWorks Optim family page to learn more
about Optim solutions.
Find technical documentation, how-to articles, education, downloads, product information, and more.
Get products and technologies
-
Data Studio and Optim
trial and no-charge software, includes a link to Optim Development Studio, which provides you with a 30-day trial for both the development environment and pureQuery Runtime for development use on a single machine.
-
Prerequisites for pureQuery Runtime for Linux, UNIX, and Windows 2.2.0.1 are in
this technote.
-
Prerequisites for pureQuery Runtime for z/OS 2.2 are in
this technote.
Discuss
- Participate in the discussion forum.
- Check out the
Integrated Data Management experts blog and
get involved in the
Integrated Data Management community space,
which has a comprehensive list of resources and downloads.
Charul Kapil is a software engineer in the IBM India Software Labs, where she has worked for three years. Currently she is working in the OPM-Reporting QA team. Previously, she was a member of the pureQuery Client Optimization QA team.
Jaijeet Chakravorty is a software engineer with IBM. He is located in the IBM Silicon Valley Lab, San Jose, California. He works with the Java Common Client JDBC driver, SQLJ, and pureQuery client-side products. Jaijeet is involved in product development and providing L3 support to customers worldwide.
Manoj Sardana is a staff software engineer who has worked in IBM India software labs for more than five years. He is an IBM-certified advance administrator and application developer for DB2 Linux, UNIX, and Windows. Currently he works on the pureQuery development team. Manoj holds a computer engineering degree from NITK Surathkal. In his free time, he enjoys listening to music and playing with children.




