The IBM Data Studio family made its debut in 2007, with the goal of increasing organizational productivity and effectiveness while improving the quality of service, cost of ownership, and governance of diverse data, databases, and data-driven applications.
A key component of that vision is the goal of increasing development productivity while optimizing the performance of Java data access development. In The Easy Way to Quick Data Access, the authors illustrate how the rather disconnected worlds of Java development and database access can be brought together with groundbreaking technology called pureQuery, a high performance Java data access platform delivered in IBM Data Studio Developer and IBM Data Studio pureQuery Runtime.
pureQuery makes it easy to develop Java data access applications that can switch between static and dynamic execution without changing a line of code. IBM is bullish on static SQL for Java; after reading this article, you may be bullish on it as well.
What do we mean when we talk about Static SQL?
If you search forstatic SQL on the Web, you'll find lots of definitions and discussion, but most of it isn't very clear. In the context of DB2, when we talk about static or dynamic SQL, we're really referring to the point in time in which an access plan the server's strategy for executing the statement is determined and the code to execute that plan is generated. There are other differences, such as using parameter markers for dynamic SQL instead of host variables for static SQL, but for the purposes of this article, a s tatic SQL statement is embedded within an application program and is precompiled and bound into a packagebefore the application is executed. This gives the data server the opportunity to lock in the access plan before the program runs. In contrast, dynamic statements are prepared and an access plan selected while the program is running.
One simple way to think of this is that static vs. dynamic SQL is similar to compiled vs. interpretive applications in that the executable code is generated before execution time rather than during execution time. Because there is a CPU cost associated with analyzing an SQL statement and creating an access plan (see in Figure 1), dynamic SQL can hinder application performance.
Figure 1. Dynamic SQL has more execution overhead
Let's step back and take a look at how IBM's definition of static SQL is different (and differentiated) from what is sometimes used in the industry. Many vendors simply define static SQL as meaning the SQL statement is embedded in the program and doesn't change. What's different about their meaning of static SQL is that there is no concept of selecting the access path prior to executing the program. So, in essence, their treatment of static SQL is the same as DB2's treatment of dynamic SQL.
Many vendors, including IBM, work hard to minimize the overhead of preparing the access path at runtime by using prepared statement caches to make dynamic SQL execute as fast as possible. However, achieving high cache hit ratios typically requires system tuning of cache sizes and also requires that your SQL statement in JDBC is coded correctly. In typical client environments, this optimization doesn't occur. Therefore, in many cases, static SQL can provide faster performance, more consistent response times, and other benefits as well.
Compare static and dynamic behind the scenes
Figure 2 is a high level look behind the scenes at the back and forth between the application and the data server for both dynamically executed SQL in Java (JDBC, or pureQuery deployed in dynamic mode) vs. statically executed (SQLJ, described below, or pureQuery deployed in static mode).
Figure 2 . Dynamic vs. static execution.
With dynamic execution, the SQL statement is completely ignored at program preparation time. So when you write a program using dynamic SQL, at runtime the application has to communicate more information with the database server before it executes the statement. Invoking the JDBC Connection.prepareStatement method for example, sends a message for prepare and for describing the output, and then sends another message for describing the input. Note that the network message for a prepare could be quite large since the entire statement string is sent over the network. Although the JDBC driver works hard to optimize the number of network messages, it's highly possible that you will have multiple messages going back and forth, and that those messages will be larger.
There may also be issues with catalog contention because during the prepare (compilation), all the objects referenced in the statement must be resolved against the same set of system catalogs.
So what does this have to do with Java?
Now let's examine how static and dynamic SQL are used in Java. With Java, you have historically had a couple of different options for data access:
- JDBC, the dominant Java data access API. The hassle of coding with JDBC has led to the development of frameworks, such as Hibernate or Spring or EJBs, to help ease the coding burden. It doesn't matter which of these frameworks is used, because at the moment of truth, the program executes as dynamic SQL (that is, it is prepared when the program runs). Sometimes the dynamic SQL generated by such frameworks is less than ideal from a data access-performance perspective.
- SQLJ, a standard API for enabling embedded SQL for Java programs . As one of the authors of this standard, IBM believes that the performance benefits of static SQL should also be extended to Java programs and provides customizations to enable static execution as we have defined it here. (Note that other vendors who support SQLJ do not provide this customization.)
In reality, SQLJ has not enjoyed nearly the adoption rate of JDBC. Its use has been primarily limited to DB2 for z/OS and Oracle environments. DB2 for z/OS DBAs are in general the most familiar with the benefits of static SQL. They've historically been the most sensitive to how much resource is being used because their CPU costs translate directly to dollars. They are required to forecast and predict performance costs, and the mission-critical applications they run are often extremely response-time sensitive and must be highly secure. Nowadays, mission-critical applications run on all platforms, and the need to be aware of database performance and security is not restricted to mainframe applications.
Most Java data access applications today use dynamic SQL. IBM wants to change this, by making it easy for Java applications to use static SQL and obtain all of its benefits. This is why IBM developed the pureQuery technology it delivers in Data Studio Developer and Data Studio pureQuery Runtime.
Stable (and better) performance without extensive tuning
Highly tuned dynamic systems developed by experts can achieve comparable performance to SQLJ or pureQuery. As we mentioned, there are optimizations you can use to mitigate the overhead of prepares with dynamic SQL, including caching the catalog and by leveraging the dynamic statement cache. The dynamic statement cache keeps the prepared statement in memory so that the next time it's executed, it can use the cached, already prepared version.
If you have the memory to spare, using the statement cache is a pretty good option when you have the same statement executing over and over again, perhaps with just the parameter values changing. However, some applications aren't able to get high cache hit ratios because they execute many different statements or they execute in the context of mixed workloads. We find that most customers average a 70 to 80% cache hit ratio. Coding issues can also de-optimize your caching strategy. For example, developers would need to be very careful to avoid using literals in place of parameter markers. Each time you use a new literal, that is treated as a new statement from a caching perspective, which means it is more likely that you will need to do a full prepare.
IBM Data Studio Developer leverages pureQuery to help you achieve expert-equivalent performance for Java data access. In essence, pureQuery static execution helps developers automatically produce better results to deliver good performance:
- The statements are all bound into the package, so you don't get "cache misses" for an individual query.
- You don't have a way to concatenate literals into your query, so you will naturally use host variables instead.
Unpredictability is rarely desired for application performance. Service-level agreements require a certain level of predictable performance, and you don't want to sign up for something that you can't deliver. An access path that changes unexpectedly in a production environment can lead to unpleasant surprises. The advantage of static SQL is that the access path is locked in. If you have infrequently executed applications for which you want better performance or if you want to guarantee more consistent performance over time, you may want to look again at static SQL as an option.
When static is not an option
Static SQL is not the answer to every problem. If you can't know the SQL in advance, you really need to use dynamic access. Also, if you need to change the host variable type, static SQL is not always feasible. Fortunately, static vs. dynamic is not an either/or decision. It's possible for your program to use some dynamic statements as well as some packages that contain your static statements. This is even easier to do with Data Studio Developer because you are using the same environment and the same pureQuery APIs for both static and dynamic development. See the sidebar for more information about how pureQuery can help improve dynamic execution performance as well.
New in Version 1.2
pureQuery introduces support for heterogeneous batch updates. A heterogeneous batch update lets you combine several SQL statements, even when targeting different data access objects, into a single network call, which can significantly reduce network overhead. Heterogeneous batch updates are supported for both static and dynamic executions.
Expectations and results
The goal with static SQL is to reduce the number of bytes sent on the network, reduce the number and size of network messages, and eliminate the full statement preparation during application execution. The benefit when compared with dynamic JDBC will depend heavily on how much network optimization the JDBC driver can do, how often your dynamic statement already exists in the cache, and how optimized your existing JDBC code already is.
Even more dramatic results might be seen when comparing with container-managed persistence. Comparing pureQuery static with EJB CMP 2.X in a strictly controlled lab environment, IBM internal testing showed reduction in CPU of at least 60%.
Not convinced? Think about security!
OK, let's assume you live in a perfect world where all your developers are JDBC coding experts, your system is stable and well-tuned, and your out-of- the-box JDBC applications perform well in your environment. Aside from being able to take vacation whenever you want, all this talk of performance and consistent response times may not resonate with you. Nevertheless, we live in a world where people do bad things, so there is another important reason to consider static SQL securing the data from unauthorized access. We strongly recommend that you read Enterprise Data Management: Governing for Data Security for a good overview of security options and considerations. For the purposes of this article, we focus on the aspect of table security.
With dynamic SQL, the DB2 DBA has to give appropriate table or view access privileges to the program's end users. This may not seem like a problem, but it really means that you have given permission to a user, who, if they find another way to access that table, could execute any SQL statement which their table-level authority allows. A malicious user could attempt to use this permission to retrieve or modify table data in unexpected ways.
Static SQL cuts the risk associated with giving end users table access. With static SQL, the end users don't need any table privileges. They simply require EXECUTE privilege on the package that contains the SQL for a given application. That restricts the user to only executing SQL in the context of the application logic. The only person who gets table access is the package owner (who is specified on the BIND. Because you only have to grant table privileges to this OWNER not to the multitudes that will end up executing the program, users will never be able to access tables using any other statements than those defined in the program (See Figure 3).
Figure 3. Dynamic vs. static SQL table privilege administration.
Another security vulnerability that can occur with dynamic SQL is known as SQL injection, which means that an application that is creating the SQL string does not properly sanitize the input (such as cleaning up escape characters), which can lead to all kinds of interesting vulnerabilities. With static SQL, you simply don't have that problem because the application is not building up a SQL statement for execution at run time. The statement is known at bind time, so the data server can perform type checking and validation prior to execution, preventing unexpected problems at runtime.
Static SQL improves dependency management and forecasting
- Because DB2 stores detailed information about static packages and their dependencies in its catalog, it's easier for administrators using tools such as DB2 Change Management Expert or DB2 Administration Tool for z/OS to query the catalog to understand the impact of dropping objects.
- For z/OS, DB2 provides accounting information on a package basis; therefore, it for static applications that have their own packages, it is easier to extrapolate from the cost today of running that package what the effect will be of adding 100 or 1000 users next week.
Enhanced problem diagnosis
Let's say you're working with a developer to do system testing for a new JDBC application that's going online soon. Unfortunately, something somewhere is grinding to a near halt. All you can really do is tell the developer that the SQL in his or her application is causing the problem. It can be quite difficult to narrow down which SQL statement is causing the problem because JDBC applications all run with a generic package ID. You and the DBA might need to go through a painful process to narrow down the problem to find the problem SQL.
Determining the application source of a problem statement would be easier with static SQL. With static SQL, there are unique package names for each application. Usually there is a simple one to one mapping between a Java interface and a package name. If long package names are used, it can be even easier to identify the source application. Since package names are identified by monitoring tools, you should be able to quickly narrow down the source of the problem statement.
No more excuses--try static SQL!
If you're a DB2 developer or DBA, what's your excuse for not trying static SQL? Now's the time to give it a shot. Data Studio Developer tools and pureQuery technology make it a snap, and here's why:
Transparent transition to static. You can code an application and test its execution dynamically and statically, without changing a line of code. pureQuery supports two styles of coding annotated method and inline style. The "annotated method" name is rooted in the way SQL statements are defined in the application. In an annotated method pureQuery application, the SQL string is defined as an element of a Java 5.0, pureQuery Annotation , such as @Select, @Update and @Call. pureQuery tooling automatically generates an optimized implementation for the annotated method. The annotated method style is designed specifically to be simple, quick to code, and capable of supporting both static and dynamic execution.
To actually execute statically, a statement must be bound into a static package at the server. However, with Data Studio, it only takes a single click in Data Studio to bind the application (see Figure 4).
Figure 4. Binding a pureQuery application.
Then, simply set the execution mode property to STATIC. Figure 5 shows one way to do this by passing it as a JVM system property.
Figure 5. Executing statically.
With Data Studio Developer and pureQuery Runtime, developers don't need to change a line of code to develop a data access application and execute it using either dynamic or static SQL, making it very easy to follow the recommendation to try both static and dynamic SQL and see which works (or performs) best for you.
If your application requires some dynamic statements, developers can use the pureQuery inline style for those. For more information on the different styles of coding, see these articles on annotated method and inline style.
Java programs statically
With the 1.2 release of Data Studio Developer and Data Studio pureQuery Runtime, you can try the benefits of static SQL execution on existing JDBC applications, whether developed in-house or delivered as part of a packaged application.
This capability is made possible by the 1.2 release's client optimization technology, the first example of a new approach to performance optimization that focuses on how to optimize database access from the database client rather than only within the database engine. The client optimization for JDBC access in the new release includes these features and functions:
- Captures SQL for Java-based applications, including those that leverage a framework for persistence.
- Enables developers or administrators to selectively bind the captured SQL for static execution without changing a single line of application code.
- Provides the ability to define at runtime whether to execute captured SQL statically or dynamically. SQL that hasn't been captured previously can be executed dynamically or (optionally) be prevented from executing.
- Runs either interactive or in batch mode.
Rich developer tools. Data Studio Developer tools make it quick and easy to develop a data access Java application with familiar Eclipse-based tools, such as SQL assist integrated right into the Java source editor.
New in 1.2: Enhanced, built-in problem determination
Developers can spend considerable time isolating performance issues: first to a specific SQL statement, then to the source application, then to the originating code. Three-tier architectures and popular frameworks make this isolation more difficult as the developer may never see the SQL generated by the framework. Whether you use the pureQuery API or client optimization to capture SQL, Data Studio Developer 1.2 makes it much easier to find problematic SQL by providing an outline view that traces SQL statements back to the originating line in the source application. (See Figure 6.)
The Java perspective in the outline view of Data Studio Developer.This visualization can aid developers and DBAs working together for impact analysis as well, by additionally correlating the tables that are accessed making it easy to answer questions like Where is this column used in my application?, "What tables does this application access?", or "What SQL does that application issue against this table?"
Simple deployment model. Before pureQuery, it could be complicated to deal with the binding process and all of its artifacts. Happily, the model for pureQuery applications does not require DBRMs, .bnd file, or serialized profiles. (And if you've never heard of these, don't worry!)
We hope we've given you some food for thought about static SQL for DB2. Data Studio Developer makes it easy to develop and deploy static SQL for DB2. Static SQL has proven to be an effective way to improve performance, save money, and tighten security for many applications. Now Java applications have that same capability packaged in a way that is friendly for Java developers and for DBAs, too. Think about pureQuery and static SQL when developing your next Java data access application.