Boost JDBC application performance using the IBM Data Server Driver for JDBC and SQLJ

Diagnose performance problems with your JDBC applications, find bottlenecks, and tune performance

Developing high-performing JDBC applications is not an easy task. This article helps you gain a better understanding of the factors that contribute to your JDBC application performance when using the IBM® Data Server Driver for JDBC and SQLJ to access DB2® and Informix®. Learn to identify these issues and to find and alleviate client-side bottlenecks.

Babita Sonavane (babita.sonavane@in.ibm.com), Software Developer, IBM

Babita Sonavane is a software developer at IBM India Lab, Bangalore. Babita has more than five years of IT experience. She works actively toward maintaining and developing various features in the Informix JDBC and DB2 JDBC drivers.



Smitha Pambalath (smitha.pambalath@in.ibm.com), Software Engineer, IBM

Smitha Pambalath is a software engineer with IBM India Software Labs, working to improve the quality of the IBM Data Server Driver for JDBC and SQLJ on Windows, z/OS, and Linux platforms for DB2, and also of the JDBC drivers for Informix.



01 November 2012

Introduction

When you are developing a relational database application using Java™, the IBM Data Server Driver for JDBC and SQLJ does not throw exceptions to tell you when your Java code is running too slowly. In this kind of application, you may be moving complex data from back-end servers (DB2 or Informix) to the client side using the IBM Data Server Driver for JDBC and SQLJ. This provides good performance if the data size is fairly small. However, if a large amount of data must be transmitted from the server to a client, performance can decrease noticeably. The challenge is to find the bottlenecks and identify a solution.

This article presents some general guidelines for diagnosing performance problems with your JDBC applications, finding the bottlenecks from the client side, and implementing techniques to improve performance.


Note: The IBM Data Server Driver for JDBC and SQLJ (type 4 and type 2 connectivity) is shipped with DB2 for Linux, UNIX, and Windows, and with DB2 for z/OS® Version 8 or later. IBM Data Server Driver for JDBC and SQLJ type 4 connectivity is shipped with IBM Informix and is supported with Informix Version 11 and later.


Managing connections

A database connection is the means by which client applications talk to the database server software. Creating a connection is an expensive affair, as it may involve allocation and management of resources across multiple clients that are trying to connect. Here are some simple tips that you can try for improving the driver's performance.

Getting better performance with the DB2 JDBC drivers

  • Type 2 driver: Local applications

    When your JDBC or SQLJ application runs locally most of the time, it performs better with type 2 connectivity. Type 2 drivers convert JDBC API calls into database-specific API calls, making them best suited for local environments. Using a DataSource, this is easy to implement, as shown in Listing 1:

    Listing 1. Example of using a type 2 driver in local applications
    javax.sql.DataSource ds = new com.ibm.db2.jcc.DB2SimpleDataSource();
    ((com.ibm.db2.jcc.DB2BaseDataSource) ds).setDriverType(2);
  • Type 4 driver: Distributed application

    When your JDBC or SQLJ application runs remotely most of the time, it performs better with type 4 connectivity. The type 4 driver, being a pure Java driver, does not need any special client software to translate the database API calls, and thus is best suited for distributed environments. Use a DataSource to implement the type 4 driver, as shown in Listing 2:

    Listing 2. Example showing how to use Type 4 driver in distributed applications
    javax.sql.DataSource ds = new com.ibm.db2.jcc.DB2SimpleDataSource();
    ((com.ibm.db2.jcc.DB2BaseDataSource) ds).setDriverType(4);
  • Connection pooling

    Use connection pooling when multiple connections need to connect to the database concurrently. Connection pooling caches connections so that they can be reused. When the application requests a new connection to the database server, the driver searches the connection pool for a cached connection; a new connection is created only if no match is found. When a pooled connection is closed, it is not destroyed but is returned to the pool for future reuse. (A minimal sketch of this pattern appears after this list.)

    For more on this topic, see the IBM DB2 Information Center.

  • Connection.close()

    Call Connection.close() whenever you exit your application. Whenever a connection is established to a database, the database reserves resources for requests that may come in the near future. If the connection is left open after the application has finished with it, those resources may not be released and can be held for a long time, even though no more activity is taking place. Closing the connection ensures that these reserved resources are released. Close the statement objects first, which in turn closes their ResultSet objects, and then close the connection.

    Listing 3. Code for closing connection
    Connection con = ds.getConnection(); // perform the actions with this connection
    con.close();        // before exiting
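
The pooling pattern itself comes from the standard javax.sql interfaces. The following is a minimal sketch, assuming the com.ibm.db2.jcc.DB2ConnectionPoolDataSource class is available and that the pool is ultimately managed by a pool manager or application server; the server name, port, database name, and credentials are placeholders.

javax.sql.ConnectionPoolDataSource cpds = new com.ibm.db2.jcc.DB2ConnectionPoolDataSource();
// configure the target database (placeholder values)
((com.ibm.db2.jcc.DB2BaseDataSource) cpds).setServerName("localhost");
((com.ibm.db2.jcc.DB2BaseDataSource) cpds).setPortNumber(50000);
((com.ibm.db2.jcc.DB2BaseDataSource) cpds).setDatabaseName("SAMPLE");
((com.ibm.db2.jcc.DB2BaseDataSource) cpds).setDriverType(4);
javax.sql.PooledConnection pooled = cpds.getPooledConnection("dbuser", "dbpassword");
java.sql.Connection con = pooled.getConnection(); // logical connection handed to the application
// ... perform SQL work ...
con.close(); // returns the connection for reuse instead of dropping the physical connection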

Using programmatic updates

IBM Data Server Driver for JDBC and SQLJ result set objects provide a way for the developer to update the data without executing complex SQL statements.

Note: The result set needs to be updatable in order to use this technique.

Here are the steps to follow:

  1. Supply the column in the result set that needs to be updated.
  2. Supply the data that needs to be changed.

To update the database in this case, you need to call the updateRow() method before moving the cursor to the next row in the result set.

In the following code listing, the value of the Name column of the result set object "rs" is retrieved using the method getString(), and the method updateString() is used to update the column with a string value "Nisha".

The updateRow() method is called to update the row in the database that contains the changed value.

Listing 4. Example showing programmatic updates
Statement stmt = con.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE,
                                     ResultSet.CONCUR_UPDATABLE);
ResultSet rs = stmt.executeQuery("SELECT Name FROM EmployeeList");
rs.next();
String name = rs.getString("Name");
rs.updateString("Name", "Nisha");
rs.updateRow();

Note: You can also use a column index instead of a column name in the get and update methods here.

This example makes the application easier to maintain and improves performance. You might be thinking, how?

Consider the following points:

  1. While the SELECT statement is still open, the DB2 server is already positioned on that row, so the time that would otherwise be required to locate the row, which can degrade performance, is avoided.
  2. If the row does need to be located, the server has an internal pointer to the row available (ROWID).

The above points are true only if you need to retrieve the data just for update.

Using getBestRowIdentifier()

getBestRowIdentifier() gives the optimal set of columns to use in the WHERE clause for updating data. This provides the fastest access to the data, although it depends upon the application requirement.

For example, suppose that in one of our applications we build the update's WHERE clause from all the result columns, or call getPrimaryKeys or getIndexInfo to find unique index columns. This works nicely but results in complex SQL queries.

Consider the following example of a complex query:

Listing 5. Example of a complex query for fetching and updating the data
ResultSet rs = stmt.executeQuery("SELECT f_name, l_name, ssn, city, state FROM employee");
PreparedStatement ps = con.prepareStatement("UPDATE employee SET city = ? " +
    "WHERE f_name = ? AND l_name = ? AND ssn = ? AND city = ? AND state = ?");

Instead, the developer can call getBestRowIdentifier() to retrieve the pseudo-column (a pointer to the exact location of the record) that identifies the specific record.

DB2 supports special columns that are not explicitly defined by the user but are hidden columns of every table (for example, ROWID, coltype, and schema name). These columns are not part of the defined table and are not returned by the getColumns method, but they provide faster access to the data.

Consider the previous example from Listing 5.

Listing 6. Example of using the getBestRowIdentifier() method
ResultSet rs = meta.getBestRowIdentifier("DBTEST", "DBCERT", "employee", 0, true);
PreparedStatement ps = con.prepareStatement("UPDATE employee SET date = ? WHERE coltype = ?");

Selecting and using database metadata methods

Database metadata methods can degrade system performance, because each call generates its own result set and these methods are slow compared to other JDBC methods.


Minimizing the use of database metadata methods

Application developers can design their applications to cache the information returned in the result sets generated by database metadata methods. The best way is to call getTypeInfo once in the application and cache the application-dependent elements of the result set. It is unlikely that any application would use all the elements of the result set generated by a database metadata method, so the application should restrict the cache to the columns that are needed; retrieving unnecessary columns is bad for performance.
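
As an illustration of that caching approach, here is a minimal sketch; the cache layout (a simple map from type name to JDBC type code) is an assumption for the example, and only the columns the application actually depends on are kept.

java.util.Map<String, Integer> typeCache = new java.util.HashMap<>();
DatabaseMetaData meta = con.getMetaData();
ResultSet typeInfo = meta.getTypeInfo(); // call once per data source
while (typeInfo.next()) {
    // keep only the columns the application depends on
    typeCache.put(typeInfo.getString("TYPE_NAME"), typeInfo.getInt("DATA_TYPE"));
}
typeInfo.close();
// later lookups read the in-memory map instead of re-driving the metadata query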

Avoid using null arguments in database metadata method calls

Using null arguments or search patterns in database metadata method calls results in time-consuming queries. Always specify as many non-null arguments as possible in the database metadata methods that generate result sets; this reduces the network traffic caused by unwanted results. Also, some methods require certain arguments to be non-null in order to return without an error message. Because database metadata methods are slow, applications should invoke them as efficiently as possible.

Consider the following example in Listing 7:

Listing 7. Example using null arguments
String[] types = {"TABLE" };
DatabaseMetaData meta = con.getMetaData();
ResultSet rs = meta.getTables(null, null,"Table1", null);
ResultSetMetaData rsmd = rs.getMetaData();

It should be written as shown in Listing 8.

Listing 8. Example using non-null arguments
String[] types = {"TABLE" };
DatabaseMetaData meta = con.getMetaData();
ResultSet rs = meta.getTables("Test_Cat", "employee", "Table1", types);
ResultSetMetaData rsmd = rs.getMetaData();

In the first getTables() call, the application is looking to see if the table Table1 exists. Here the JDBC driver takes the request as: return all tables, views, system tables, synonyms, temporary tables, or aliases that exist in any database schema inside any database catalog that are named "Table1."

The second call to getTables() more accurately reflects what the application is looking for. A JDBC driver interprets this request as: return all tables that exist in the "employee" schema in the current catalog "Test_Cat" where the name is "Table1." Clearly, the JDBC driver can process the second request much more efficiently than it can process the first request.

To improve performance and reliability, it's important for the application developer to design the application so that it provides as much information as possible about the target object when using database metadata methods.

Never call getColumns method to determine table characteristics

Applications can avoid using getColumns() to determine the characteristics of a table. Instead, use a dummy query with getMetaData().

For example, consider an application that allows the user to determine the columns that will be selected.

Let's see which of these two approaches is more suitable here:

  1. Using the getColumns method.
  2. Using a dummy query with the getMetaData method.

getColumns method

Consider the example shown in Listing 9. It generates a result set by running queries that are prepared and executed on the server side.

Listing 9. Example using getColumns() method
ResultSet rs = meta.getColumns(null, null, "TSDCLOB", null);
rs.next();

getMetaData method

Consider the example shown in Listing 10. It obtains the result metadata from a dummy query that is prepared but never executed on the server.

Listing 10. Example using getMetaData() method
PreparedStatement ps = con.prepareStatement("SELECT * from TSDCLOB WHERE 1 = 0");
ResultSetMetaData rsmd=ps.getMetaData();
int numcols = rsmd.getColumnCount();
...
int column_type = rsmd.getColumnType(n);//result column information has now been obtained.
...

In both cases, a query is sent to the server. But in case 1, the query must be prepared and executed, the result description information must be generated, and a result set of rows must be sent to the client.

In case 2, only a simple query must be prepared and only the result description information must be generated. Clearly, case 2 is the better way.

In summary, always use result set metadata to retrieve table column information such as column names, column data types, and column precision and scale. Only use getColumns() when the requested information cannot be obtained from result set metadata (such as table column default values).


Retrieving the result set efficiently

Selecting the right cursor

This section discusses various performance issues of three types of cursors and provides guidelines for selecting the appropriate type of cursor for your application.

Forward only cursor

The forward-only cursor is used for sequential reads of all the rows in a table. This is the fastest way to retrieve the result rows, but it is not useful for non-sequential reads.

Insensitive cursor

Use insensitive cursors in situations where you need high levels of concurrency on the DB2 server and require the ability to scroll forward or backward while going through result sets.

Here is how that processing happens:

  1. The first request fetches all of the rows and stores them on the client. (If the driver uses "lazy" fetching, it fetches many rows instead of all rows and stores those.) This first request is therefore very slow, especially in the case of long data.
  2. Subsequent requests read from the cached rows, so they do not require much processing time and save network traffic.

You should not use an insensitive cursor for a single request of one row.
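
For reference, a minimal sketch of an insensitive scrollable cursor follows; the EMPLOYEE table and EMPID column are borrowed from the later examples, and the result set can be scrolled in both directions but does not see updates made after the rows are fetched.

Statement stmt = con.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
                                     ResultSet.CONCUR_READ_ONLY);
ResultSet rs = stmt.executeQuery("SELECT EMPID FROM EMPLOYEE");
rs.afterLast();          // position after the last cached row
while (rs.previous()) {  // scroll backward through the cached rows
    System.out.println("Employee ID = " + rs.getString("EMPID"));
}
rs.close();
stmt.close();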

Sensitive cursor

A sensitive cursor produces a result set that is sensitive to updates made to the underlying data while the cursor is open. Keyset-driven and dynamic cursors are always sensitive.

Some applications want to avoid the buffering of data that an insensitive cursor performs. In such situations, a sensitive cursor is the best-suited scrollable cursor model.

Scrolling through the entire result set and retrieving the data generates a lot of network traffic for sequential rows, and performance can be very slow in such cases. But when you are fetching non-sequential rows, performance can be better, as shown in Listing 11.

Listing 11. Example using a scrollable cursor
String str;
String query;
Connection conn;
Statement stmt;
ResultSet rs;
stmt = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE,
                            ResultSet.CONCUR_UPDATABLE);

query = "SELECT EMPID, ZIPCODE FROM EMPLOYEE " +
        "FOR UPDATE OF ZIPCODE";
rs = stmt.executeQuery(query);
rs.afterLast();
while (rs.previous()) {
    str = rs.getString("EMPID");
    System.out.println("Employee ID = " + str);

    if (str.compareTo("000101") == 0) {
        rs.updateString("ZIPCODE", "560043");
        rs.updateRow();
    }
}
rs.close();
stmt.close();

Using get methods effectively

IBM DB2 JDBC provides a variety of methods for retrieving data from a result set, such as getInt(), getString(), and getObject(). The getObject() method is the most generic and provides the worst performance when the non-default mappings are specified. This is because the JDBC driver must do extra processing in order to determine the type of the value being retrieved and to generate the appropriate mapping. Always use the specific method for the data type.

Always provide the column index of the column being retrieved rather than the column name. The following example shows the performance improvement over using the column name.

Suppose in an application you use the getString method and provide the column name as a parameter, for example getString("Disha"). Here the IBM Data Server Driver for JDBC and SQLJ might have to convert the column name "Disha" to uppercase, and then compare "Disha" with all the column names in the column list.

Instead, if we use column index 30, for example getString(30), it will directly go to the 30th column and retrieve the necessary result, saving significant time.

We can use several other get methods with the column index as shown here in Listing 12:

Listing 12. Get methods
getString(1),
getLong(2),
getInt(3), ... and so on

Retrieving auto-generated keys

DB2 has hidden columns (called pseudo-columns) that represent a unique key over every row in a table. Typically, using these types of columns in a query is the fastest way to access a row because the pseudo-columns usually represent the physical disk address of the data.

Prior to the JDBC 3.0 specification, an application could retrieve the value of the pseudo-columns only by executing a SELECT statement immediately after inserting the data, as in the following code listing.

Listing 13. Retrieving hidden columns using SELECT statement
int rowcount = stmt.executeUpdate ("insert into EmployeeList (name) values ('Disha')");
ResultSet rs = stmt.executeQuery ("select rowid from EmployeeList where name = 'Disha'");

There are two major drawbacks to retrieving pseudo-columns in this way:

  • Retrieving the pseudo-column requires a separate query to be sent over the network and executed on the DB2 Connect server.
  • The search condition of the query may be unable to uniquely identify the row if a primary key doesn't exist on the table.

With the IBM Data Server Driver for JDBC and SQLJ, you can use JDBC 3.0 methods to retrieve the keys that are automatically generated when you execute an INSERT statement to insert a row into a table.

In the following example, the INSERT statement itself returns the automatically generated key, so no separate query is needed and there is no ambiguity about which value belongs to the most recently inserted row.

Listing 14. Retrieving auto-generated keys
int rowcount = stmt.executeUpdate ("insert into EmployeeList (name) values ('Disha')",
Statement.RETURN_GENERATED_KEYS);
ResultSet rs = stmt.getGeneratedKeys ();

Now, the application contains a value that can be used in a search condition to provide the fastest access to the row and a value that uniquely identifies the row, even when a primary key doesn't exist on the table.
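
For completeness, a hedged continuation of Listing 14 shows how the generated key value can be read back; the assumption here is that the key is returned as the first column of the generated-keys result set.

if (rs.next()) {
    String generatedKey = rs.getString(1); // the generated key for the inserted row
    // the key can now be used in a WHERE clause for the fastest access to that row
}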

The ability to retrieve auto-generated keys provides flexibility to the DB2 JDBC developer and creates a performance boost when accessing data.


Avoiding the movement of extra data that is not required at that point in time

This section provides detailed guidelines for optimizing system performance when retrieving data efficiently with IBM Data Server Driver for JDBC and SQLJ applications.

Choosing the right data type

A DB2 JDBC application developer should design a schema and select data types that can be processed quickly by the database protocol. Retrieving and sending certain data types can be costly.

Floating-point data is usually defined according to internal database-specific formats, mostly in a compressed form. You should use integer data whenever floating-point precision is not required, as it is processed faster than floating-point or BigDecimal data.

Reducing the size of data retrieved

To reduce network traffic and improve performance, the IBM Data Server Driver for JDBC and SQLJ provides several methods that limit the amount of data being retrieved: setMaxRows(), setMaxFieldSize(), and setFetchSize(). You can also reduce the size of the data being retrieved by decreasing the column size.

Return only the rows and columns you need. If you return ten columns when you only need five, performance is decreased, especially if the unnecessary columns include long data.
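
The following is a minimal sketch of those limits; the numbers are illustrative only, and the EMPLOYEE query is borrowed from the other examples.

Statement stmt = con.createStatement();
stmt.setMaxRows(500);        // never return more than 500 rows to the client
stmt.setMaxFieldSize(4096);  // truncate character and binary columns to 4 KB
stmt.setFetchSize(50);       // hint: transfer rows from the server in blocks of 50
ResultSet rs = stmt.executeQuery("SELECT EMPID, ZIPCODE FROM EMPLOYEE");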

LOB locator support

In conditions where the database server does not support progressive streaming but you still have to retrieve LOB data, the fullyMaterializeLobData property is useful. When it is set to false, the IBM Data Server Driver for JDBC and SQLJ uses LOB locators internally to retrieve LOB data in chunks on an on-demand basis. It is highly recommended that you set this property to "false" when you retrieve LOBs that contain large amounts of data.
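
A minimal sketch of setting the property by name is shown below; the type 4 URL, database name, and credentials are placeholders, and passing the property through a java.util.Properties object avoids relying on a driver-specific setter.

java.util.Properties props = new java.util.Properties();
props.setProperty("user", "dbuser");
props.setProperty("password", "dbpassword");
props.setProperty("fullyMaterializeLobData", "false"); // retrieve LOBs through locators on demand
java.sql.Connection con = java.sql.DriverManager.getConnection(
        "jdbc:db2://localhost:50000/SAMPLE", props);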

When the database server supports progressive streaming, the JDBC driver ignores the value of fullyMaterializeLobData, regardless of how the progressiveStreaming property is set.

Note: You cannot use a single LOB locator to move data between two different databases. To move LOB data between two databases, you need to materialize the LOB data when you retrieve it from a table in the first database, and then insert that data into the table in the second database.


Using the right type of statement object in the right manner

Using the right type of statement object to perform an action is as important as choosing the right-fitting tire for your automobile. The type of statement to use maps directly to what you want to do. Statement, PreparedStatement, CallableStatement, and ResultSet objects are the components of most JDBC applications.

Let's see when to use what statement object.

Statement object

The Statement object acts as a carrier of SQL requests to the database. When the SQL request makes changes to the database in any way, use the stmt.executeUpdate() method; for example, use this method for INSERT, UPDATE, or DELETE. When the SQL request is a query, use the stmt.executeQuery() method, which returns a ResultSet object that holds the outcome of the query.
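
A minimal sketch of the two calls, assuming the EMPLOYEE table used in the other examples:

Statement stmt = con.createStatement();
// data change: executeUpdate() returns the number of affected rows
int updated = stmt.executeUpdate(
        "UPDATE EMPLOYEE SET ZIPCODE = '560043' WHERE EMPID = '000101'");
// query: executeQuery() returns a ResultSet holding the rows
ResultSet rs = stmt.executeQuery("SELECT EMPID, ZIPCODE FROM EMPLOYEE");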

PreparedStatement object

When you have an SQL statement that is going to be sent to the server repeatedly, PreparedStatement is the best choice. When the PreparedStatement is created, the SQL statement is sent to the server, an access plan is generated for that SQL, and the statement is cached. When a similar request comes in the future, the SQL does not have to be prepared from scratch; the cached prepared statement is used instead.

For instance, in order to insert 100 values into a table TAB1, instead of running stmt.executeUpdate("INSERT INTO TAB1 VALUES (10)") 100 times in a loop, a single prepared statement is created like this:

Listing 15. PreparedStatement example
PreparedStatement pstmt = con.prepareStatement("INSERT INTO TAB1 VALUES (?)");
for (int j = 0; j < 100; j++) {
    pstmt.setInt(1, j);
    pstmt.addBatch();
}
pstmt.executeBatch();

The above code improves performance, as the SQL is prepared only once while 100 values are inserted.

Consider the following code for a statement object:

Listing 16. Statement object
String str = ("insert into t1 values (100)");
s.executeUpdate(str);

One network trip is made to the server here.

Listing 17. Example for a PreparedStatement object
java.sql.PreparedStatement ps = con.prepareStatement("insert into t1 values (?)");
for (int j = 0; j < 100; j++) {
    ps.setInt(1, j);
    ps.addBatch();
}
ps.executeBatch();

This shows that in one network trip the PreparedStatement has inserted 100 values into the table, while the statement object has inserted just one.

CallableStatement object

Callable statements are used to access stored procedures, for example: CallableStatement cstmt = con.prepareCall("{call storedProc1(?, ?, ?)}");. Use the braces wisely. The braces are the JDBC escape syntax for calling stored procedures and can be used with any database, but if they are not needed, that is, if you are sure of the requirements of the DBMS that you are using and you do not need the escape syntax, then avoid the braces, since processing them consumes extra resources.
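
A minimal sketch of a callable statement follows; the procedure name and parameter types are illustrative only, and the call omits the escape braces as discussed above.

// assumes a stored procedure STOREDPROC1 with one IN and one OUT parameter exists
CallableStatement cstmt = con.prepareCall("CALL STOREDPROC1(?, ?)");
cstmt.setInt(1, 42);                                   // IN parameter
cstmt.registerOutParameter(2, java.sql.Types.VARCHAR); // OUT parameter
cstmt.execute();
String result = cstmt.getString(2);
cstmt.close();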

ResultSet

ResultSet is used to hold data returned by the SQL query. Using smarter result sets, you can get only the data you want. Processing a complete result set throughout its length and breadth is not useful when you have performance in mind. So to have smarter result sets, you need smarter statement objects.

The following example gives you the flexibility to scroll through the result set and dynamically access the record you need, as opposed to the default behavior where you can only move forward through the result set and once it is exhausted you cannot read from it any further.

Listing 18. ResultSet example
Statement st = conn.createStatement(
ResultSet.TYPE_SCROLL_SENSITIVE,
ResultSet.CONCUR_READ_ONLY);

Here are some easy ways to dynamically move the cursor around the ResultSet. (The best way to improve performance is to structure your query with the correct WHERE clauses, so that you are only retrieving the data that you need. Dynamically positioned cursors can lead to different values being fetched when an ORDER BY clause is used in the query, because the row order can change as the underlying data changes.)

Listing 19. ResultSet with 30 rows fetched, to directly go to the 25th row and make updates
rs.absolute(25); // cursor is moved to the 25th row
rs.updateString("Col1", "BANANA"); // updates the  25th   column                    
rs.updateRow(); // updates the row
Listing 20. ResultSet with 30 rows fetched, to move from the 25th to the 28th row
rs.absolute(25);
rs.relative(3); // cursor will move 3 rows ahead
Listing 21. ResultSet with 30 rows fetched, to move from the 25th back to the 22nd row
rs.absolute(25);
rs.relative(-3);  // cursor will move 3 rows back
Listing 22. ResultSet with 30 rows fetched, to fetch the last 5 rows when you don't know how many rows are returned
// the last row is represented as -1, the 2nd-to-last as -2, and so on
if (rs.absolute(-5)) {     // position on the 5th row from the end
    do {
        // process the row
    } while (rs.next());   // then move forward through the last 5 rows
}

Handling data efficiently

The IBM Data Server Driver for JDBC and SQLJ provides quick options to send data to the DB2 server, as discussed in the subsequent sections.

Streaming using streamBufferSize for LOB and XML data

XML data used in the real world can be huge and can require large chunks of memory to save it before sending it to the database. This incurs high memory costs and affects performance. You can use progressive streaming by setting the stream buffer size to specify the size in bytes that the DB2 JDBC driver will buffer for chunking LOB or XML data.

The JDBC driver uses the streamBufferSize value regardless of whether progressive streaming is in use. It can be set as shown in Listing 23:

Listing 23. Example that shows setting streamBufferSize value
((com.ibm.db2.jcc.DB2BaseDataSource) ds).setStreamBufferSize(10000000);
//the buffer is set to  10000000 bytes.

So if the data being sent is larger than 10000000 bytes, the data is not materialized but streamed. If the data size is less than 10000000 bytes, it gets materialized before being sent.

Using deferPrepares and sendDataAsIs

deferPrepares delays the prepare request until execution time. This saves network trips as the prepare and execute both occur on a single trip.

sendDataAsIs, when set to false (the default), lets the driver determine and convert the data types of input parameters; when set to true, the data is sent as-is without that conversion. For best performance results, set deferPrepares to true and sendDataAsIs to true.

To check the number of network trips and other system monitoring data, the DB2 JDBC driver provides the class com.ibm.db2.jcc.DB2SystemMonitor, as shown in Listing 24:

Listing 24. Example to check the number of network trips and system monitoring data
((com.ibm.db2.jcc.DB2BaseDataSource) ds).setDeferPrepares(true);
// the prepare operation will be deferred
((com.ibm.db2.jcc.DB2BaseDataSource) ds).setSendDataAsIs(true);
...
// to monitor the performance after these properties are set, the following can be tried
DB2SystemMonitor systemMonitor = ((DB2Connection) conn).getDB2SystemMonitor();
// this returns the system monitor object
systemMonitor.enable(true);
systemMonitor.start(DB2SystemMonitor.RESET_TIMES);
java.sql.PreparedStatement ps = conn.prepareStatement("insert into t1 values (?)");
for (int j = 0; j < 100; j++) {  // iterating through the loop for 100 values
    ps.setInt(1, j);
    ps.addBatch();
}
ps.executeBatch();
systemMonitor.stop();
System.out.println("NUMBER of NETWORK_TRIPS = "
    + systemMonitor.moreData(DB2SystemMonitor.NUMBER_NETWORK_TRIPS));

Conclusion

There are many methods that you can use to help tune DB2 for optimal performance when writing JDBC and SQLJ applications. This article described some easy-to-implement programming techniques you can use to enhance the performance of the JDBC driver. Follow the simple techniques we've described to get a good head start on building smarter, more robust, high-performing database applications.

