Skip to main content

IBM Data Studio pureQuery Runtime for z/OS Performance

Simon Harris (simond_harris@uk.ibm.com), Senior Software Engineer, IBM
Simon Harris
Simon Harris is a Performance Engineer within the WebSphere Federation Server development team in the Silicon Valley Laboratory. Simon has been working with federated database technology since its inception in IBM in 1995, supporting many customers in both a pre- and post-sales capacity throughout Europe, the Middle-East, and Africa.
Rajesh Sambandhan, Advisory software engineer , IBM
Rajesh Sambandhan is an advisory software engineer on the IBM Information Management team, India Software Labs. He has been in the industry for more than seven years, working primarily in J2EE and Test Driven Development.

Summary:  pureQuery improves application throughput on DB2 for z/OS by making it easy to code and deploy static SQL for Java. The new client optimization feature in Version 1.2 extends the benefits of static SQL to any existing Java application without changing or recompiling Java applications.

Date:  01 Jul 2008
Level:  Intermediate
Activity:  1013 views

Note:The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While IBM may have reviewed each item for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Anyone attempting to adapt these techniques to their own environments does so at their own risk.

Any performance data contained in this document were determined in various controlled laboratory environments and are for reference purposes only. Customers should not adapt these performance numbers to their own environments as system performance standards. The results that may be obtained in other operating environments may vary significantly. Users of this document should verify the applicable data for their specific environment.

IBM recently announced Data Studio pureQuery Runtime for z/OS, which enables organizations to take advantage of pureQuery technology natively on their z/OS systems. Previously, pureQuery could access DB2 for z/OS data only from a non-z/OS system, such as Windows or AIX.

In this article, we’ll provide a brief overview of pureQuery technology, the enhancements announced with Version 1.2, and the results of a performance study comparing pureQuery access methods with a variety of popular Java data access alternatives (including JDBC). The results of the study show the potential for significant throughput improvements and CPU reduction using pureQuery static execution as compared to the alternatives. And, with the new client optimization feature, these benefits can be realized without making changes to the application.

Introducing pureQuery

pureQuery is a high-performance Java data access technology that simplifies the tasks of developing and managing applications that access data. It consists of tools (IBM Data Studio Developer), APIs, and a runtime (IBM Data Studio pureQuery Runtime).

For a good summary of pureQuery, read “The Easy Way to Quick Data Access.” As that article explains, there is a spectrum of choices regarding Java frameworks or persistence mechanisms. Figure 1 shows that pureQuery gives database developers and DBAs greater control over the SQL that is generated and executed. pureQuery provides a good balance of productivity enhancements with the control, performance and security benefits of static SQL.


Figure 1. Where pureQuery fits in the spectrum of Java options.
Figure 1. Where pureQuery fits in the spectrum of Java options.

pureQuery supports two styles of coding: annotated method and inline style. Both styles are designed to be easy for developers to learn and to reduce the amount of code generated. pureQuery tools also support both styles.

With the inline style, the SQL is included directly in the Java code and uses the out-of-the-box data interface. The inline style supports only dynamic execution at this time. With the annotated method style, the SQL is maintained either within the application, or in separate interface files that you define. The interface files make it easier for the actual Java application to remain independent of the persistence approach. The annotated method style supports both dynamic and static execution.

This article focuses on the benefits of using static execution; therefore, our performance study uses the annotated method style.


What’s new in Data Studio Developer and pureQuery V1.2

Client optimization, a key enhancement in the 1.2 release, makes it possible to take advantage of static SQL for existing JDBC applications, without modifying — or even having access to — the source code. This capability also works on Java applications that use a framework for persistence, such as Hibernate. With client optimization, you execute the application while the pureQuery runtime is in “capture” mode, typically as part of a use case test validation that exercises various program execution paths. The DBA can then bind the captured SQL into packages at the database server. Subsequent executions of the application allow the pureQuery runtime to execute the captured SQL statically. In this way, client optimization gives applications the performance and security benefits of static SQL, without the need to modify or recompile the application.

For Java applications, including those that use the pureQuery API or those that have been “client optimized,” Data Studio Developer enables developers or DBAs to quickly locate specific SQL statements in the Java source, down to the specific line number (see Figure 2). This ability to visualize relationships among Java source and SQL is known as the pureQuery outline view. The outline view also provides a database schema perspective, which can be useful when doing impact analysis or problem isolation. For example, if you’re considering dropping a column, you can find out where in the Java program that column is being used and modify the SQL appropriately.


Figure 2. Data Studio Developer pureQuery outline view.
 Figure 2. Data Studio Developer pureQuery outline view.

IBM driver for JDBC and SQLJ (Type 2 and Type 4)

Both JDBC Type 2 and Type 4 drivers are commonly used by applications to access DB2 data. The Type 2 driver requires the DB2 client to be installed on the same machine as the application; whereas the Type 4 driver is 100 percent Java and doesn’t require a DB2 client to be installed (it directly implements the network protocol for the data source). This article presents the performance numbers using both Type 2 and Type 4 connections.

Figure 3 shows the new capability of running the pureQuery runtime natively on z/OS using either Type 2 or Type 4 connections as available with Version 1.2.


Figure 3. Running Data Studio pureQuery Runtime natively on z/OS (new in 1.2)
Figure 3. Running Data Studio pureQuery Runtime natively on z/OS (new in 1.2)

Performance study workload

The performance study used the IBM Relational Warehouse Workload (IRWW) running in WebSphere Application Server V6.1. The workload was executed on DB2 for z/OS V9 running on a three-way Z9 processor.

IRWW is an OLTP application based on an order entry and tracking system. The IRWW models a wholesale supplier managing orders and consists of seven transaction types. The frequency of transactions is set to simulate a realistic scenario; the mix used in the benchmark environment was 47 percent update transactions, 53 percent read-only transactions. The workload is triggered by a Java client program which generates HTTP requests for the required transaction mix. The performance is reported in Normalized Throughput Rate - Transactions Per Second (ITR).

The Internal Throughput Rate (ITR) is the notional throughput rate assuming that the CPUs are 100 percent busy. For example, consider an application with a transaction rate of 200 transactions per second at 75 percent CPU consumption. The ITR for this application would be 200 * 100/75 = 267 tps. This is the notional transaction rate that could be achieved if the CPUs were 100 percent busy.


Performance study scope

A prerelease version of the pureQuery driver was used for the study. The scope of the performance study was to compare normalized throughput and CPU consumption, using both the pureQuery annotated method style (dynamic and static execution) and the new client optimization feature, with the technology alternatives. The IRWW was modified to incorporate each of these technologies, and the performance numbers were extracted by running the workload and monitoring the system. In this article, the scope is limited to the following technologies:

  • pureQuery annotated method style (dynamic execution)
  • pureQuery annotated method style (static execution)
  • JDBC with pureQuery client optimization (static execution)
  • JDBC (dynamic execution)
  • JDBC (dynamic execution)
  • EJB2 (dynamic execution)
  • OpenJPA (dynamic execution).

Note that the IRWW and this test don’t make use of all pureQuery features that can help improve performance. A key new feature in Version 1.2 is heterogeneous batching, which lets you combine several SQL statements, even when targeting different data access objects, into a single network call, which can significantly reduce network overhead. Heterogeneous batch updates are supported for both static and dynamic executions.


Performance results and interpretations

The dynamic statement cache hit ratio is a key factor in determining how much improvement your application can achieve by switching from dynamic to static SQL execution. As the article "No Excuses" Database Programming: Make your Java programs fly with Static SQL and pureQuery explains, much of the overhead for dynamic statements is in preparing the statement to run. However, since statements that have already been prepared can reside in the dynamic statement cache, a repeated execution of that same exact statement would not require another prepare operation. Thus, if every SQL statement the application executes is already in the dynamic statement cache (that is, the hit ratio is 100 percent), the CPU reduction benefits of switching to static SQL won’t be as great when compared with the situation in which only half of the dynamic SQL statements were found in the dynamic statement cache (50 percent hit ratio). During our testing, the dynamic statement cache was sized to achieve a hit ratio of between 70 and 85 percent; which is typical for real-world applications.

Note about hit ratios: Getting exactly the same hit ratio between the different data access APIs used in the study is tricky because the cache is sized in 4K pages; during our tests, small changes in cache size resulted in relatively large fluctuations in hit ratio for the IRWW application. In each of the Type 2 and Type 4 tests, the cache hit ratio for pureQuery dynamic execution was higher than that of native JDBC. Consequently, our results show that pureQuery dynamic method sustains a higher ITR than JDBC. In actual fact, with similar statement cache hit ratios, one would expect pureQuery dynamic execution to have a marginally higher overhead. Similarly, the reverse is true for the documented EJB2 and JPA results because their cache hit ratios with the IRWW application were 100 percent. With similar cache hit ratios to that of pureQuery (using both dynamic and static execution), one would expect the improvement delivered by pureQuery over EJB2 and JPA to be even greater than those documented in this article.

CPU measurements during the benchmark are taken across the z/OS LPAR — there was no zIIP capacity for offload.

Measurements with JDBC Type 2 z/OS driver

In executing the IRWW with the JDBC Type 2 driver, the statement cache hit ratios in Table 1 were observed:


Table 1. Statement cache hit ratio observed in executing the IRWW with JDBC Type 2 driver.
Table 1. Statement cache hit ratio observed in executing the IRWW with JDBC Type 2 driver.

Figure 4 compares the normalized throughput (ITR) of the different APIs when executing the IRWW using the JDBC Type 2 driver.


Figure 4. Results of normalized throughput using JDBC Type 2 driver.
Figure 4. Results of normalized throughput using JDBC Type 2 driver.

The results highlight the fact that static execution can achieve significantly greater throughput than dynamic JDBC. Figure 5 uses these results to normalize the transaction throughput and total system wide CPU consumption compared to that of JDBC.


Figure 5. Normalized comparison of results with JDBC Type 2 driver – positive values represent reduction in CPU.
Figure 5. Normalized comparison of results with JDBC Type 2 driver – positive values represent reduction in CPU.

The charts in Figure 5 show that pureQuery with static execution (both annotated method style and client optimization) achieved a 42 percent reduction in CPU per transaction compared to native dynamic JDBC alone. On the test system, this 42 percent reduction in CPU per transaction resulted in an overall system-wide CPU reduction of 5 to 7 percent. The client optimization results are particularly impressive since no source code had to be modified.

Measurements with JDBC Type 4 z/OS driver

With the JDBC Type 4 driver, the statement cache hit ratios in Table 2 were observed during the tests:


Table 2. Statement cache hit ratios with the JDBC Type 4 driver.
Table 2. Statement cache hit ratios with the JDBC Type 4 driver.

Figure 6 shows the normalized throughput (ITR) of the different APIs when executing the IRWW using the JDBC Type 4 driver.


Figure 6. Results of Normalized throughput using the JDBC Type 4 driver.
Figure 6. Results of Normalized throughput using the JDBC Type 4 driver.

The results with the Type 4 driver show that, as with the Type 2 driver, static execution (whether with pureQuery annotated method style or with client optimization) can achieve significantly greater throughput than JDBC.


Figure 7. Normalized comparison of results with JDBC Type 4 driver – positive values represent reduction in CPU.
Figure 7. Normalized comparison of results with JDBC Type 4 driver – positive values represent reduction in CPU.

The normalized charts in Figure 7 show that with the IRWW, pureQuery with static execution achieved a 25 percent reduction in CPU per transaction over using native dynamic JDBC alone. On the test system, this 25 percent reduction in CPU per transaction resulted in an overall, system wide CPU reduction of 6 percent.

The client optimization results in Figure 7 show that this new feature provided the IRWW application with a 15 percent reduction in CPU per transaction, while giving a 5 percent reduction in overall CPU consumption across the LPAR, compared to native JDBC. This is a significant improvement considering the fact that this particular test did not change a single line of source code in (or recompile) the IRWW application.

During each test, the CPU consumption of both the IRWW application and DB2 were individually monitored. With both the Type 2 and 4 drivers, moving from dynamic JDBC to pureQuery static, the proportion of CPU consumed by DB2 decreased, while the proportion of CPU consumed by the application increased. The application CPU increases because it is able to process more transactions within the same unit of time, while the DB2 CPU decreases because it no longer has to prepare the SQL statement.

Both tests also illustrate that with the IRWW workload pureQuery dynamic execution can be 5 to 10 percent more efficient on CPU consumption than JPA, and 10 to 15 percent more efficient than EJB2. In fact, this performance advantage is understated due to the much higher initial statement cache hit ratio for JPA and EJB2.


Summary

The results of the performance study show that static execution of SQL using the Data Studio pureQuery Runtime for z/OS can significantly reduce the CPU cost per transaction of an application. In the benchmark environment employed for this study, this approach resulted in a 6 percent reduction in total LPAR CPU consumption using the Type 4 driver and a 7 percent reduction when using the Type 2 driver, compared to native JDBC. Furthermore, with the new client optimization feature in version 1.2, there’s no need to change a single line of source code within your application, or even to have access to the source code, a fact that greatly reducing the cost of implementing pureQuery runtime

The reduction in CPU and increased throughput when using pureQuery static API and Client Optimization results because the SQL statements no longer have to be dynamically prepared and executed. If the JDBC dynamic statement cache hit ratio for the IRWW workload was nearer 100 percent, then the performance benefit of introducing the pureQuery API with static execution or using client optimization would be reduced. Similarly, if the cache hit ratio was less than those quoted during the benchmark, the benefit would be even greater.

This study also demonstrates the performance advantage of using JDBC and pureQuery over JPA or EJB2. Using the Type 2 driver on the IRWW application, pureQuery static can deliver a staggering 73 percent reduction in CPU per transaction over EJB2, and a 58 percent reduction over JPA. With the Type 4 driver, pureQuery static can deliver a 60 percent reduction in CPU per transaction over EJB2 and a 39 percent reduction over JPA. The overall CPU savings across the LPAR range from 7 to 23 percent when switching from EJB2/JPA APIs to pureQuery static.


Should you use pureQuery

In our recommendations, we attempt to not just focus on the use of pureQuery alone, but also indicate those cases in which pureQuery technology (such as client optimization) can be used to help with real world environments.

New Java data application development

For data-centric applications, from a performance and security perspective, static SQL provides significant benefit, including the potential for greater application growth before requiring purchase of additional hardware. pureQuery annotated method style provides ease of use and diagnostic benefits for developers as well the ability to deploy statically with no change to the application.

If your developers require object management or are reluctant to move from their existing framework, then you can still use client optimization to gain the benefits of static SQL for those applications.

Existing JDBC applications and frameworks

If the environment for your existing JDBC applications is already expertly tuned (that is, you are closer to 100% hit ratio in your statement cache), then it is unlikely you would want to convert those existing applications to pureQuery for performance reasons alone, but you will still benefit from the improved reliability of plan stability and the improved security and governance model of static SQL.

Consider using client optimization to test the advantages of static SQL and to provide developers and DBAs with the outline view, which lets you more closely pinpoint the location of a problematic SQL statement in the Java source.

Existing SQLJ applications

If you are using SQLJ you are already getting the performance benefits of static SQL. There is no need to convert existing applications to pureQuery unless you wish to take advantage of the new API features in 1.2 such as heterogeneous batch updates, which let you combine several SQL statements into a single network call, even across multiple data objects or API calls. If you like the benefits of pureQuery from a development and administrative perspective, it might be worth considering doing new development on pureQuery. For example, in addition to the developer tooling and enhanced diagnostic capabilities with pureQuery, it is simpler to bind and deploy with pureQuery than with SQLJ.


About the authors

Simon Harris

Simon Harris is a Performance Engineer within the WebSphere Federation Server development team in the Silicon Valley Laboratory. Simon has been working with federated database technology since its inception in IBM in 1995, supporting many customers in both a pre- and post-sales capacity throughout Europe, the Middle-East, and Africa.

Rajesh Sambandhan is an advisory software engineer on the IBM Information Management team, India Software Labs. He has been in the industry for more than seven years, working primarily in J2EE and Test Driven Development.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=386564
ArticleTitle=IBM Data Studio pureQuery Runtime for z/OS Performance
publish-date=07012008
author1-email=simond_harris@uk.ibm.com
author1-email-cc=
author2-email=rasamban@in.ibm.com
author2-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers