Managing pureQuery client optimization in Web application environments, Part 2: Optimizing applications in clustered environments

Tips for successful deployment

pureQuery client optimization can improve the performance, security, and administration of Java database applications. The first article in this two-part series described how to enable client optimization on a single application server node. This second article uses scenarios to describe how to configure and work with client optimization in clustered application server environments, specifically, clustered WebSphere Application Server environments.

Charul Kapil (chakapil@in.ibm.com), Software Engineer, WSO2 Inc

Charul Kapil is a software engineer in the IBM India Software Labs, where she has worked for three years. Currently she is working in the OPM-Reporting QA team. Previously, she was a member of the pureQuery Client Optimization QA team.



Jaijeet Chakravorty (jaijeet@us.ibm.com), Software Engineer, WSO2 Inc

Jaijeet Chakravorty is a software engineer with IBM. He is located in the IBM Silicon Valley Lab, San Jose, California. He works with the Java Common Client JDBC driver, SQLJ, and pureQuery client-side products. Jaijeet is involved in product development and providing L3 support to customers worldwide.



Manoj Sardana (msardana@in.ibm.com), Software Engineer, EMC

Manoj Sardana is a staff software engineer who has worked in the IBM India Software Labs for more than five years. He is an IBM-certified advanced administrator and application developer for DB2 for Linux, UNIX, and Windows. Currently he works on the pureQuery development team. Manoj holds a computer engineering degree from NITK Surathkal. In his free time, he enjoys listening to music and playing with children.



27 May 2010


Introduction

The first article of this series described how client optimization can improve database application performance and security, and how you can enable client optimization on a single application server node. This second article briefly introduces the concept of clustering, explains how to define a data source and a pureQuery-enabled JDBC provider in a clustered environment, and then describes how to configure and work with client optimization in clustered application server environments, specifically, clustered WebSphere Application Server (WAS) environments.

Review of client optimization

As you may recall, the first article described how client optimization consists of the following main phases:

  • Capture the SQL from the running application.
  • Configure the captured SQL to specify target package names.
  • Bind the configured capture file into DB2 packages, if the target database is DB2.
  • Execute the application with the captured SQL to get desired behavior, such as static execution.

This second article focuses on extensions to the capture phase that enable you to accommodate a clustered environment. The article describes how to configure the JDBC provider, data sources, and the pureQuery properties to enable capturing SQL when multiple WebSphere instances are running the same application in a cluster. In practice, this means that each cluster member creates its own output capture file. The other part of the process that the article describes is how you can merge the captured SQL into a single file that you can then configure and bind to the target data source.

Review of clustering

Clustering, as the name suggests, refers to a logical collection of application server instances running the same application. The purpose of clustering is to provide workload management and high availability. If one instance of the server goes down, another instance can take over the workload. The individual server instances in a cluster environment are referred to as cluster members.

To use clustering with WebSphere Application Server, you need to have the WebSphere Application Server Network Deployment edition. This edition of WebSphere includes a profile called a deployment manager, which is a special WebSphere instance. The deployment manager enables you to create and manage clusters.

Members of a cluster can be from the same or different nodes and can be on the same or different machines. A node is a logical grouping of application server instances that share the same configuration control and are generally associated with one physical installation of WAS. Each node uses an agent to communicate with the deployment manager. The deployment manager profile allows you to manage the configuration of multiple application servers across multiple nodes.

There are three types of WAS clustering environments. The different types vary in regard to how and where the cluster members are located:

  • Horizontal clustering— Various instances of the server or cluster members are on different nodes.
  • Vertical clustering— All cluster members are on the same node.
  • Mixed clustering— This is a mix of both horizontal and vertical clustering.

Figure 1 shows an example of a mixed clustering environment. On Node 1, the cluster members named clus1_1 and clus1_2 are an example of vertical clustering, because both members run on the same node. Similarly, Node 2 has two vertically clustered members named clus2_1 and clus2_2. Because the cluster also spans Node 1 and Node 2, it is horizontally clustered as well, so the environment shown represents mixed clustering.

Figure 1. Example of a mixed clustering environment
Within WAS, the application goes through the mixed clustering environment to the database server.

To learn more about clustering, refer to the Device Manager Information Center link in the Resources section of this article.

When you install an application in a mixed cluster environment, the application is installed on all the cluster members at the same time. You can stop, start, and administer a cluster as a single entity, in the same way you would do so with a single server.

This article assumes that all nodes in the cluster environment are managed by the deployment manager. This means that any property settings made at the node level are propagated to all of the environment's server members. The article also assumes that the Web server is part of the application server.

Creating a pureQuery-enabled JDBC provider and data source on the cluster

When you create a JDBC provider in a clustered environment, you need to make sure that the scope of the JDBC provider is cluster and not the individual instance of the server profile. A scope for a resource (such as the JDBC provider or data source) defines its visibility in the application server environment. For example, a scope defined at the cluster level means all the cluster members can see the resource. Similarly, a scope defined at the node level can be seen by all the server instances in that node. You can define the scope when you create the JDBC provider and data sources.

You create the data source in a clustered environment in the same manner as you do in a single-server environment, except that you define the scope for the data source as cluster. Again, this is so that the data source is visible to all the cluster members. Figure 2 shows how to use the Data sources page of the Integrated Solutions Console to select the scope of a new data source. After you choose to show the scope selection drop-down list, the list shows all the available nodes, clusters, and the application server instances.

Figure 2. Define the data source scope as cluster
Screen shot of Data sources page of Integrated Solutions Console with Cluster selected for the data source scope.

Refer to the first article in this series for instructions on creating the data source.

Once you have created the data source, you can set the data-source-level pdq properties in the same way you do for a single server. The properties are then applicable to the cluster. Everything described in the first article in terms of the properties for a single-server environment applies as well in a clustered environment.


Configuring client optimization in a clustered environment

As described previously, when you install an application on a cluster, it is installed on all the cluster members, and all these cluster members have the same view of the back-end database (data source). When you enable client optimization in such an environment, it increases the possibility of resource contention. This is because of the increased likelihood that multiple instances of the same application from different cluster members will be running in capture mode and may try to write to the same capture file.

To avoid the resource contention problem, a new file name suffix is used. The suffix is $X. During the SQL capture phase, you can provide a capture file name (using the outputPureQueryXml property) that includes the $X suffix. For example: pdq.outputPureQueryXml=App1$X.pdqxml

During the capture process of client optimization, the $X is replaced with the timestamp and the object ID of the application classloader when the file is first created. This ensures creation of a unique file name for each instance of the application in a cluster.
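The actual replacement logic is internal to the pureQuery runtime, but the documented behavior can be sketched as follows. This illustrative Java snippet (the class name, method, and use of `System.identityHashCode` as a stand-in for the classloader object ID are assumptions for illustration only) shows how substituting a timestamp and a classloader-derived ID for $X yields a file name that is unique per application instance:

```java
// Illustrative sketch only -- this mimics the documented behavior of the
// $X suffix: it is replaced with a timestamp plus an ID derived from the
// application classloader, so each cluster member writes to its own file.
public class CaptureFileNamer {

    // Expand a configured name such as "App1$X.pdqxml" into a unique name.
    // The timestamp and identityHashCode here are stand-ins for the values
    // the pureQuery runtime actually uses.
    static String expand(String configuredName, ClassLoader appLoader) {
        long timestamp = System.currentTimeMillis();
        int loaderId = System.identityHashCode(appLoader);
        return configuredName.replace("$X", timestamp + "_" + loaderId);
    }

    public static void main(String[] args) {
        String name = expand("App1$X.pdqxml", CaptureFileNamer.class.getClassLoader());
        // Prints something like App11274918400000_366712642.pdqxml --
        // unique for each application instance in the cluster.
        System.out.println(name);
    }
}
```

Because both the timestamp and the classloader ID differ across cluster members, two members of the same cluster never resolve $X to the same file name, which is what prevents the contention described above.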

You can use the $X suffix only with the outputPureQueryXml file, because that is the file for which multiple copies are created. In a single-node environment, you only need to specify the pureQueryXml property for the initial capture, because there is no resource contention. In a clustered environment, however, it is recommended that you use the $X syntax to avoid resource contention during the initial capture.

You can use the $X suffix in any level of the pdq.properties file (global, data source, or application). The intended use for the suffix is for clustered environments when the same application is running on multiple instances of the server. If you use the suffix in a single-server environment, the result is one file with the timestamp and classloader ID suffix.

Note: A cluster is seen as a single entity and provides a single view of the application. Therefore, it is not possible to have different settings for different members of the cluster.


Merging incremental capture files

When you use the $X suffix in a cluster environment, a separate file is created for each instance/cluster member. Each file contains the SQL statements captured by each cluster member. Some of the SQL statements may be common across multiple files, and some files may contain unique SQL. This depends on the application flow in each instance. Therefore, after the capture is done, you may want to combine all the captured SQL into a single file that you can then configure and bind to the respective data sources for static execution.

To do this, you can use the Merge command line utility. This utility lets you merge two or more capture files into one output pdqxml file. (Alternatively, you can also use Optim Development Studio to merge files.) Listing 1 shows an example of how to use the Merge command.

Listing 1. Merge command example
java com.ibm.pdq.tools.Merge 
-inputPureQueryXml "c:\Program Files\capture1.pdqxml" capture2.pdqxml capture3.pdqxml 
-outputPureQueryXml merge.pdqxml

Use the inputPureQueryXml parameter to specify the input capture files to be merged. Enter the file names as a space-separated list. You can specify the file names in either absolute or relative format. Because the space character is used as the list delimiter, if a file name itself contains a space, you need to enclose that file name in quotes. The input files are treated as read-only files. The example in Listing 1 specifies three input files.

Use the outputPureQueryXml parameter to specify the file name of the merged capture file. The example in Listing 1 specifies merge.pdqxml as the name of the merged capture file.
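The core idea behind the merge can be sketched in plain Java. The real Merge utility operates on pdqxml capture files and preserves statement metadata and package information; this toy version (class and method names are hypothetical) shows only the semantics the article describes, namely that the merged result is the union of the SQL captured by each cluster member, with statements that appear in more than one file collapsed to a single entry:

```java
import java.util.*;

// Hedged sketch of the Merge utility's core semantics: the merged output
// contains each distinct SQL statement once, regardless of how many input
// capture files it appeared in. SQL statements are modeled as plain strings.
public class MergeSketch {

    // Merge the statement lists from several capture files into one list.
    static List<String> merge(List<List<String>> captureFiles) {
        Set<String> merged = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (List<String> file : captureFiles) {
            merged.addAll(file);
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> clus1 = Arrays.asList("SELECT * FROM EMP", "UPDATE EMP SET SAL=?");
        List<String> clus2 = Arrays.asList("SELECT * FROM EMP", "DELETE FROM EMP WHERE ID=?");
        // The common SELECT appears only once in the merged result.
        System.out.println(merge(Arrays.asList(clus1, clus2)));
        // [SELECT * FROM EMP, UPDATE EMP SET SAL=?, DELETE FROM EMP WHERE ID=?]
    }
}
```

This is why the number of cluster members does not inflate the final package contents: overlapping SQL from different members collapses during the merge.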

The Resources section of this article contains a link to the product documentation for the Merge command.

Figure 3 illustrates the process for merging multiple capture files.

Figure 3. Process for merging multiple capture files
Flow chart depicting three input capture files going through the Merge utility and creating a single merged output file.

Incrementally capturing to an existing merged capture file

After you have merged multiple input capture files into a single output file, and configured and bound that output file, you may still need to incrementally capture additional SQL. This could be because of changes to the application, or for other reasons. If you were to use the merged capture file as the pureQueryXml for incremental capture, you again risk the possibility of encountering resource contention issues.

To overcome this problem, you can use both pureQueryXml and outputPureQueryXml files. When both files are specified and have different names, this is known as incremental capture. The pureQueryXml is treated as input, and only the new SQL statements that are not already included in the pureQueryXml are captured in the outputPureQueryXml file. Again, you can use outputPureQueryXml with $X file suffix syntax to create multiple files, one for each instance, and merge them using the Merge utility.
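The filtering rule described above can be made concrete with a small sketch. The real runtime applies this rule to pdqxml files at capture time; this illustration (class and method names are hypothetical, and SQL statements are modeled as plain strings) shows only the behavior: statements already present in the input pureQueryXml file are skipped, and only genuinely new SQL lands in the outputPureQueryXml file:

```java
import java.util.*;

// Illustrative sketch of the incremental-capture rule: SQL already in the
// base (pureQueryXml) file is not re-captured; only new statements go to
// the outputPureQueryXml file.
public class IncrementalCaptureSketch {

    // Return the statements that would be written to the output file.
    static List<String> newlyCaptured(Set<String> baseFile, List<String> observedSql) {
        List<String> output = new ArrayList<>();
        for (String sql : observedSql) {
            if (!baseFile.contains(sql) && !output.contains(sql)) {
                output.add(sql); // new SQL goes to the outputPureQueryXml file
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Set<String> base = new HashSet<>(Arrays.asList("SELECT * FROM EMP"));
        List<String> run = Arrays.asList("SELECT * FROM EMP", "SELECT * FROM DEPT");
        System.out.println(newlyCaptured(base, run)); // [SELECT * FROM DEPT]
    }
}
```

Because the base file's statements are filtered out, the per-member output files stay small and contain only the SQL that was missed in the original capture.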

Listing 2 shows an example of the property settings for using incremental capture, where App1_base.pdqxml is the name of the merged file.

Listing 2. Example property settings for incremental capture
pdq.pureQueryXml=App1_base.pdqxml
pdq.outputPureQueryXml=App1$X.pdqxml
pdq.captureMode=ON

To process only new SQL and not lose existing package names and structures from the originally captured SQL, you can identify the existing configured pdqxml file as a seed file. The newly captured SQL is then added into the existing pdqxml file. This scenario is illustrated in Figure 4. The option to specify a seed file is -baseFile. To configure and bind the new SQL without losing existing package names and structures, use the options -cleanConfigure FALSE and bind -differenceOnly TRUE.

Figure 4. Merging multiple capture files with -baseFile option
Three input capture files going into the Merge utility. The merged output file uses the merged input plus the base seed file.

Note: An exception to the above behavior is a scenario where a SELECT statement is captured in the initial capture and the corresponding positioned update is captured during incremental capture. In that case, pureQuery client optimization needs to move the positioned update statement into the existing package (the package where the SELECT statement resides) so that the positioned update statement can run successfully in static mode. Moving the positioned update statement to a different package requires a configure. If you must retain the previously bound packages, you can delete the positioned update statement from the capture file; at run time, pureQuery client optimization then performs a dynamic update on a static cursor. If you do not need to retain the package, you can perform a clean configure after the output file is merged with the input file. A clean configure ensures that the SELECT statement and the corresponding positioned update statements are in the same package. After the configure, you can perform a bind to replace the existing package.

The Use incremental capture with the merged capture file section of this article contains more details about this type of incremental capture scenario.


A scenario

The following scenario is intended to help further describe how to configure and work with client optimization in clustered application server environments. The scenario starts with an example similar to the one described in the first article. However, in this case it is extended to support a cluster environment instead of an environment with a single application server node.

As shown in Figure 5, Application 1 and Application 2 are installed on a cluster that has four members: two nodes with two cluster members per node. DS1, DS2, and DS3 are the data sources set up on the cluster. Each node has its own common file system. Application 1 uses all three data sources (DS1, DS2, and DS3), while Application 2 uses only one data source (DS3).

Figure 5. Clustering scenario
Client goes to WAS mixed cluster which goes to sources. Each node has a separate file system.

Capture SQL in the clustered environment

For this scenario, the application level properties file named pdq.appwide.properties is set to capture the SQL statements for both applications.

Table 1 shows the configuration of data sources for Application 1.

Table 1. Configured data sources for Application 1
Database name | Configured dataSourceName | Property file name
DB1 (DB2)     | DS1                       | pdq.ds1.properties
DB2 (DB2)     | DS2                       | pdq.ds2.properties
DB3 (non-DB2) | DS3                       | pdq.ds3.properties

Table 2 shows the settings for the pureQuery properties files for Application 1. These settings enable capturing for the application and outputting capture files that are specific to whatever SQL was run against the target data source.

Table 2. pureQuery properties files for Application 1
File name              | Property definition                          | Affects which data source?
pdq.appwide.properties | pdq.captureMode=ON pdq.executionMode=DYNAMIC | DS1, DS2, and DS3
pdq.ds1.properties     | pdq.outputPureQueryXml=app1_ds1$X.pdqxml     | DS1
pdq.ds2.properties     | pdq.outputPureQueryXml=app1_ds2$X.pdqxml     | DS2
pdq.ds3.properties     | pdq.outputPureQueryXml=app1_ds3$X.pdqxml     | DS3

Table 3 shows the setting for the pureQuery properties file for Application 2, which only accesses data source 3 (DS3).

Table 3. pureQuery properties file for Application 2
File name              | Property definition                          | Affects which data source?
pdq.appwide.properties | pdq.captureMode=ON pdq.executionMode=DYNAMIC | DS3
pdq.ds3.properties     | pdq.outputPureQueryXml=app2_ds3$X.pdqxml     | DS3

When the applications are executed in capture mode, $X is replaced with the timestamp and the object ID of the application classloader. Each cluster member creates its own capture file for each data source. Table 4 shows the various files that are created when capturing SQL for Application 1. Tn represents timestamps; On represents the classloader IDs of the application.

Table 4. Capture files for Application 1
Data source | clus1_1             | clus1_2               | clus2_1               | clus2_2
DS1         | app1_ds1T1O1.pdqxml | app1_ds1T2O2.pdqxml   | app1_ds1T3O3.pdqxml   | app1_ds1T4O4.pdqxml
DS2         | app1_ds2T5O5.pdqxml | app1_ds2T6O6.pdqxml   | app1_ds2T7O7.pdqxml   | app1_ds2T8O8.pdqxml
DS3         | app1_ds3T9O9.pdqxml | app1_ds3T10O10.pdqxml | app1_ds3T11O11.pdqxml | app1_ds3T12O12.pdqxml

Table 5 shows the various files created when capturing SQL for Application 2.

Table 5. Capture files for Application 2
Data source | clus1_1               | clus1_2               | clus2_1               | clus2_2
DS3         | app2_ds3T13O13.pdqxml | app2_ds3T14O14.pdqxml | app2_ds3T15O15.pdqxml | app2_ds3T16O16.pdqxml

Note: You can use the $X suffix for an application level property file, but be aware that the application level property file impacts all the data sources. In this scenario, Application 2 is using only one data source. Therefore, use of the suffix in the application level property file affects only the one data source.

Merge the captured files

Once the capture session is over, you can merge the various files before configuring and binding the files. For this sample scenario, you want to bind the data source specific files to their corresponding database. For example, for Application 1 with DS1, you want to merge the four capture files that correspond to DS1 (refer to Table 4), before configuring and binding the merged file to the corresponding database (DB1).

Listing 3 shows an example of the Merge command you would use to merge the files specific to DS1.

Listing 3. Example Merge command
java com.ibm.pdq.tools.Merge 
-inputPureQueryXml app1_ds1T1O1.pdqxml app1_ds1T2O2.pdqxml app1_ds1T3O3.pdqxml app1_ds1T4O4.pdqxml 
-outputPureQueryXml app1_ds1.pdqxml

In the example above, app1_ds1.pdqxml is the output file produced by the merge. Once you have the output file you are ready to configure it and bind it.

You can similarly merge the capture files specific to DS2, and then configure and bind the output file to the database named DB2.

Use incremental capture with the merged capture file

This part of the scenario corresponds to the situation illustrated in Figure 4. Assume you have discovered that you need to run incremental capture on Application 1. This could be because you missed some SQL in the previous capture, or maybe because some additional logic has been added to the application. To do this, you would use the merged file (app1_ds1.pdqxml) as the input pureQueryXml file and specify a separate file with the $X suffix for the outputPureQueryXml filename.

Listing 4 shows an example of how you would set the properties for pdq.ds1.properties to do the incremental capture on DS1.

Listing 4. Newly captured SQL goes to the file specified as outputPureQueryXML
pdq.pureQueryXml=app1_ds1.pdqxml
pdq.outputPureQueryXml=app1_ds1$X.pdqxml

Note: When you set the outputPureQueryXml, the pureQueryXml file is automatically considered as input and read-only. This example does not have an explicit inputPureQueryXml file. If you do not specify outputPureQueryXml, then pureQueryXml is considered to be both the input and output.

Using the settings shown in Listing 4, when you run the application in capture mode for DS1, you create four files, one for each cluster member. You can then use the Merge command line utility to merge these files into a new pureQueryXML file.

For this scenario, you also want to merge the new SQL with the existing configured pdqxml file without overwriting the settings already configured for pdqxml. To do that, you need to specify the existing configured pdqxml file as the base file, app1_ds1.pdqxml.

Listing 5 shows an example of the Merge command you could use to do this.

Listing 5. Merging files on DS1
java com.ibm.pdq.tools.Merge 
-inputPureQueryXml app1_ds1T1O1.pdqxml app1_ds1T2O2.pdqxml 
app1_ds1T3O3.pdqxml app1_ds1T4O4.pdqxml 
-baseFile app1_ds1.pdqxml
-outputPureQueryXml app1_ds1_V2.pdqxml

The options on the above sample Merge command specify that you want to put the new SQL from all four input files into an unnamed statement set after the current contents of the app1_ds1.pdqxml file and write the result to the app1_ds1_V2.pdqxml output file. You can then incrementally configure the output file using the -cleanConfigure FALSE option and then bind it using the -differenceOnly TRUE option.
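The effect of the -baseFile option can be sketched under the same simplification used earlier (SQL as plain strings; the class and method names here are hypothetical): the base file's contents come first, unchanged, and only SQL from the input files that the base file does not already contain is appended after it:

```java
import java.util.*;

// Illustrative sketch of merging with a base (seed) file: the base file's
// statements are preserved in place, and new SQL from the per-member input
// files is appended after them, with duplicates dropped.
public class BaseFileMergeSketch {

    static List<String> mergeWithBase(List<String> baseFile, List<List<String>> inputFiles) {
        List<String> result = new ArrayList<>(baseFile); // base contents are kept as-is
        Set<String> seen = new HashSet<>(baseFile);
        for (List<String> input : inputFiles) {
            for (String sql : input) {
                if (seen.add(sql)) {
                    result.add(sql); // appended after the base file's contents
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> base = Arrays.asList("SELECT * FROM EMP");
        List<List<String>> inputs =
            Arrays.asList(Arrays.asList("SELECT * FROM EMP", "SELECT * FROM DEPT"));
        System.out.println(mergeWithBase(base, inputs));
        // [SELECT * FROM EMP, SELECT * FROM DEPT]
    }
}
```

Because the base file's statements keep their position, the package names and structures already configured for them survive the merge, which is why the subsequent configure can run with -cleanConfigure FALSE.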


Summary

WebSphere Application Server clustered environments are used for high availability and performance. Because pureQuery provides performance optimization for WAS, it is important to understand how pureQuery client optimization can operate in such an environment. This article reviewed how to use properties and the $X capture file suffix to enable capturing of SQL across members of a cluster. These techniques serve to avoid resource contention that might occur if a single capture file was used.

The article also provided an overview and examples of how to use the Merge command line utility to merge captured SQL files from each member into a single file that you can then configure and bind into a database package (or packages) on the target database servers.


Acknowledgements

We gratefully acknowledge Christopher M. Farrar, Kathryn Zeidenstein, Patrick Titzler, and Kavitha Pullela for their support in reviewing the content of this article.

Resources


Get products and technologies

  • Data Studio and Optim trial and no-charge software, includes a link to Optim Development Studio, which provides you with a 30-day trial for both the development environment and pureQuery Runtime for development use on a single machine.
  • Prerequisites for pureQuery Runtime for Linux, UNIX, and Windows 2.2.0.1 are in this technote.
  • Prerequisites for pureQuery Runtime for z/OS 2.2 are in this technote.


