DB2DD: Your focus lately has been JavaTM performance with DB2®, which is the main topic of this interview, but could you first give us a little bit of history about yourself and your experience with DB2?
John: I have been working in IT for some 26 years, including experience gained as an IBM customer and external IT consultant. I have over 17 years experience working with DB2 and relational database systems, working in a variety of technical roles. I currently report to the Director of Development, DB2 UDB for OS/390. In this role I consult for our customers on on a variety of technical issues, including strategy, design, performance, and implementation issues. I provide that same service to IBM Development and the ITSO organization as well (the people who write Redbooks).
I specialize in design for high performance and availability, performance benchmarking, and data sharing and parallel sysplex implementation. I also seem to always be flying around the world talking at conferences. IDUG, DB2 technical conferences, and so on.
DB2DD: Your focus has moved from the "guts" of the system (like data sharing) to more of an application focus with the focus on JavaTM technology in DB2. What is the significance of that shift in focus?
John: Enabling applications on DB2 and getting those applications to run well is absolutely key to the future of DB2. DB2 already enjoys a reputation as an outstanding database engine in terms of performance and availability, particularly for mission-critical OLTP applications. Now, as business moves to the Web, we have to help our customers enable new e-business applications so that they can be successful. Our challenge has been to make DB2 web-enabled without losing the qualities of performance and reliability that our customers rely on.
The initial focus in DB2 was in supporting JDBC function, which has a very different performance paradigm than what many of our traditional users were used to with, say, CICS®-DB2 applications. Some of the early field experience indicated that path lengths were not that great. Many of the customers adopting Java-based development were pushing for price performance improvements. Given my strong background in applications and performance analysis, and the experience gained supporting customers pioneering with Java-based development on 390, it was natural fit for me to lead the performance team effort to focus on analysing Java performance and looking for ways to improve it.
DB2DD: In your experience, are many DB2 for z/OS developers using Java?
John: Use of Java-based application development has started to take off. Increasing acceptance of IBM WebSphere® is indicative of this. Most of the development work is taking place on the Windows platform, with production deployment on either UNIX® or zSeries®. There is stiff competition between IBM and other vendors to provide the platform to support the deployment of these e-business applications. It is critical to our success to be able to influence these developers on the value of our solution. For DB2, SQLJ is key to our success. The DB2 implementation using true static SQL provides a clear differentiation and advantage over competitive database systems. Althrough SQLJ can deliver on price performance, we are also working hard to keep the JDBC driver up-to-date in terms of function for SQLJ and JDBC, because there are so many applications demanding JDBC functionality.
DB2DD: Can you briefly explain the difference between JDBC and SQLJ? And how do they fit into the bigger picture of J2EE?
John: DB2 provides a JDBC driver to support the basic functions of Java applications wanting to process relational data: establish a connection to a database, execute SQL requests, and process the results. The JDBC API consists of a set of classes and interfaces written in Java that provide a standard API for Java application developers to implement database applications. This API provides for portability across platforms and database systems. The value of the JDBC API is that an application can run on any platform with a Java Virtual Machine (JVM). The JDBC API only supports dynamic SQL, which means that its performance can benefit greatly from dynamic statement caching in the database engine.
JDBC is defined as part of the J2EE platform or programming model.
The JDBC driver for DB2 also supports SQLJ. SQLJ, in contrast to JDBC, lets you write embedded SQL when developing Java applications. SQLJ was initially developed by a consortium of database vendors and has received wide database vendor acceptance: IBM, Oracle, Sybase, Informix, and Compaq (Tandem). Most importantly, the DB2 implementation of SQLJ provides support for true static SQL execution, which plays to the strengths of DB2 in terms of performance and security. Figure 1 highlights the processing flow for SQLJ, which shows how prebound packages can be used at runtime to avoid extensive runtime checking:
Figure 1. SQLJ processing flow
Like JDBC, SQLJ also provides for application portability across platforms and database systems, and SQLJ has been accepted as a standard by both ANSI and ISO. Currently, SQLJ is not part of the J2EE platform. IBM will be pushing for SQLJ to be included in the J2EE platform and the JDK. SQLJ parallels the JDBC model and covers all the basic functionality.
DB2DD: What are the advantages/disadvantages of using SQLJ vs. JDBC?
John: There are strong compelling reasons for using SQLJ when developing enterprise class applications. For example, SQLJ is more concise and is less complex to code relative to JDBC. For example, take a look at the syntax differences here:
|Syntax differences between JDBC and SQLJ|
Another advantage is that all the SQL checking in terms of syntax and type mapping can be performed during program preparation instead of at run time (see Figure 2). And, if DB2 has a customized version of the SQLJ profile, SQLJ provides for static SQL execution, which delivers better performance in terms of reduced CPU resource consumption. This is because with static execution you avoid the full cost of prepare and runtime checking.
Figure 2. Static processing is faster than dynamic
SQLJ also delivers a tighter security authorization model because users are authorized to run programs and not to access the underlying tables.
As matter of fact, stronger security is the primary reason for customers in my 'social circle' to use SQLJ, followed closely by performance. As we continue to optimize the SQLJ implementation for improved price performance, the performance advantage of SQLJ over JDBC will become even more significant for our customers. In terms of positioning, SQLJ is the strategic way to access relational data from Java applications.
Of course, an application doesn't always have to choose exclusively between SQLJ and JDBC. The JDBC driver let you "mix and match" SQLJ and JDBC in the same application. SQLJ and JDBC can share the same JDBC connection. And JDBC result sets can be turned into SQLJ iterators, and vice versa.
There are some cases where use of JDBC provides a better option. Some applications require the flexibility to dynamically build a SQL request at run time. For example, an application GUI may provide for selection from a large number of options. Coding a single SQLJ request to pull the qualifying rows may be lengthy, complex and may not optimize very well because the values of the host variables are not known until run time. It might be possible to code multiple different SQLJ requests, each of which can be executed on a best fit basis. However, this may lead to complex coding, and mileage may be limited. In such cases, the use of JDBC and dynamic SQL, and the ability to dynamically build up the SQL syntax may be a better option.
DB2DD: What are the most common mistakes that you have seen customers make when developing Java applications?
John: Well, I see a lot of these, and they can be difficult to detect if you are not looking for them. For example, if the SQLJ serialized profile is not customized, or if the customized profile is overwritten by a subsequent pass of SQLJ translation, the runtime will see an uncustomized SQLJ profile and will quietly perform JDBC-based dynamic SQL processing rather than SQLJ-based static SQL processing. It does not matter that a previous iteration with customization may have generated DBRMs and those were bound into the DB2 plan. This plan will be overridden by the JDBC plan. Some other things I see are:
- If an application using SQLJ forgets to use a specific ConnectionContext on some SQL statements, processing will quietly occur on a different (default) connection than the processing which explicitly specifies the ConnectionContext. More than one SQLJ profile will be generated, and more than one ConnectionContext class will be involved.
- Another thing that developers sometimes forget to do is to close and release resources when they are no longer being used. The JDBC driver maintains its own links to resources, and the resources are only released when the resources are closed or when the connection is closed. For this reason:
- Close ResultSets when they are done being used. If you don't do this, JVM garbage collection cannot reclaim the objects, and eventually the application may run out of JDBC resources or, even worse, run out of memory.
- Close PreparedStatements that have ResultSets as soon as they are done being used. Closing the ResultSet is not enough to release the underlying cursor resource. If you don't close the PreparedStatement, the cursor resource is tied up for the life of the PreparedStatement.
- Close CallableStatements when they are done being used, too, or else the application may run out of call sections.
DB2DD: In terms of performance, what are the top 3 things that application developers can do to improve the performance of their Java-based applications that access DB2?
John: It's tough to have to limit it to three, but I would recommend the following:
- Disable auto commit for JDBC connections, as follows:
The default is true (on), which forces a commit after each and every single SQL request.
- Make sure that you select and update only those columns that you need. Column processing is one of the major factors in CPU resource consumption. The two primary reasons for this are:
- Strings for character string columns must be converted between Unicode (Java) and EBCDIC/ASCII (DB2 engine).
- A Java object is created per column per row for those data types that are not primitive types in Java, such as character string columns.
- If you are using SQLJ, customize the SQLJ profile and request online checking. If the SQLJ serialized profile is not customized, there will be no messages and the Java application will execute dynamically using JDBC.
The online checker accesses the DB2 catalog can be invoked from the UNIX System Services command line as follows:
db2profc ... -online=<db2_location_name>
The online checker checks for JDBC/SQLJ-supported compatibility and conversion processing, and to determine the length of character string columns. There is no concept of length for character strings in Java. In order to have the predicates considered for index access, the information stored in the generated DBRM must match the definition in the DB2 catalog in terms of data type and length. Online checking adds this information to the generated DBRM.
DB2DD: And what are the top 3 things that system administrators can do to improve performance for Java applications accessing DB2?
John: For system administrators, the top three things are:
- Tune the JVM heap size. In a Java database workload using either JDBC or SQLJ to access relational data, a lot of Java objects are created and then destroyed. The JVM heap size plays an important role in overall Java application performance. The default initial heap size is 1 MB, and the default maximum heap size is 8 MB. In almost all cases these default sizes are not sufficient and lead to poor performance. The heap size studies that we've run at the IBM Silicon Valley Lab and at customer installations has shown that setting the heap size and the maximum heap size to an equal large value increases the throughput dramatically. This is because scanning for garbage collection is not triggered so often, thereby reducing the repeated scanning of long living objects.
- If price performance is important (and when isn't it?) then keep current with JDK releases and with upgrades to the JDBC driver. Quite rightly, the initial development emphasis was on delivering JDBC function. However, the last 12 months has seen dramatic improvements in CPU performance by reducing column processing overheads. This has been delivered through improvements to JDBC driver and the JDK/JVM. The JDBC 2.0 driver delivered with DB2 Version 7 has been base for delivering these performance enhancements.
- Turn on DB2 dynamic SQL statement caching with CACHEDYN=YES in your subsystem parameters (DSNZPARM). This helps to avoid the full cost of preparing dynamic SQL, and brings the CPU resource consumption for dynamic SQL closer to that of static SQL. It is strongly recommended for applications using JDBC, and also for SQLJ where cursor controlled updates are performed from a different object to the object where the cursor was opened and the row fetched.
DB2DD: Any final thoughts that you would like to leave with our readers?
John: I'd like to focus a bit on skills. There are many people with SQL skills, and there are many people with Java skill; however, very few people have both skills, and cross training is difficult. Even advanced education classes for Java tend to skip over performance. I guess I can never say it enough: If you don't develop and enforce design guidelines for database access from Java applications, you may have some performance issues to deal with down the line. At that point, you can't do much to solve those problems without opening up the application. Good JDBC performance requires a bit of care and feeding ahead of time.
That being said, I must stress that I've been quite happy with the improvements I've seen taken place in JDBC/SQLJ performance over the last 12 months, and I'm even more pleased that more improvements are in the pipeline. In terms of performance and scalability, all but the most high volume applications can now be written in Java and process relational data.