High performance inserts using JDBC Type 4 in a constrained environment

Leverage DB2 declared global temporary tables

Learn how to handle mass inserts of data using Java™ technology and IBM® DB2® Universal Driver Type 4 connection only, an efficient alternative for situations where the application code does not have access to a DB2 client, or where the use of import/load utilities do not apply. The innovative use of DB2 declared global temporary tables (DGTTs) is the key to the improved performance.

Share:

Bob Calio (bobcalio@us.ibm.com), Software Engineer, IBM

Bob Calio is a software engineer working at at the IBM Thomas J. Watson Research Center, Yorktown, NY. Bob has worked 23 years for IBM, with over 18 years of DB2 application development experience. His current areas of interest and research include leveraging DB2 and Websphere in the VLSI design tools area.



Izzy Bendrihem (igb@us.ibm.com), Hardware Engineer, IBM

Izzy Bendrihem is a senior hardware engineer working in the VLSI Design Department at the IBM Thomas J. Watson Research Center, Yorktown, NY. Izzy has worked 15 years for IBM on VLSI design tool issues for microprocessor design. He has contributed to the development of high-performance microprocessors used in game consoles, IBM mainframe, and UNIX systems. His current areas of research include multi-site design infrastructure issues for VLSI development and database-based VLSI design tools for high-performance microprocessors.



23 August 2007

Also available in Russian

Problem

Inserting rows of data into one or more relational database tables is a common task in virtually all modern client-server or Web-based applications.

In distributed, two-tiered client server environments that directly insert data into relational tables, there are two key areas of concern:

  1. Performance: The duration of the transaction from the client perspective. Remote database transactions can take many minutes to complete if the rows of data to be inserted number in the thousands. Additional penalties will be incurred with the presence of any defined triggers, indices, and so on.
  2. Concurrency: How many clients can simultaneously access the target table in read mode, write mode, or both. Other users could possibly be denied access to the target table (deadlock) if the application fails to commit often enough or holds an exclusive lock on the target table for an extended period of time.

If the number of rows to be inserted from the client to the server is relatively small (say tens of rows), the problem can be easily handled by the direct insert method with virtually no performance or concurrency issues.

If, however, there are many rows to insert (say thousands or tens of thousands), then the current known solutions all have limitations. One solution is to have each client install the DB2 client code locally and execute load/import utilities to insert the data into the target table.

In this case, performance is generally not an issue, however, concurrency would suffer greatly with this approach. In fact, this solution is problematic in several ways, including authority, concurrency, and error handling:

  • Authority - In DB2, executing load utilities requires higher authority than what is usually granted to or required by general users. This may introduce undesirable side effects, including security or other issues.
  • Concurrency - The load utility could acquire an exclusive lock on the target table. An exclusive lock on a table prevents any level of access (including read only) to the table for the duration of the load, reducing concurrency to one.
  • Error handling - A very important aspect of an online, transactional system is the ability to recover from errors. If a load utility fails, the data are not inserted into the table, and, in some cases, tablespace can be left in an unavailable state, requiring manual intervention by the DBA to clear the error. This situation will impact not only the immediate client (and table), but also any other clients trying to access the target table (or tables in that tablespace) , reducing concurrency to zero.

In addition to the added burden of installing DB2 code on each machine, the load utility solution falls short in the key areas of concurrency, security, and error handling.

The challenge is to achieve maximum table concurrency and insert performance with minimal client authority in a distributed, thin client (no database client code) environment, with error retry capabilities.


Solution

At the core of this solution is an innovative approach of using declared global temporary tables (DGTT) and Java Database Connectivity (JDBC) to insert mass amounts of data into a relational database table from remote, distributed thin clients. This approach has many advantages over known solutions, including but not limited to:

  • Minimal/no table locking during actual insert of data into target table
  • Transaction activity is limited to the DGTT, and therefore not logged until all data are complete, validated, and inserted into target table
  • Retry of failed insert transaction without having to reload data
  • Transaction framing to allow for pre/post processing of data, without impact to target table
  • Requires only Insert authority, not special privileges
  • No impact to other users while transferring data from client to database
  • Ability to leverage SQL/DML, functions, and store procedures to cleanse or prepare data in the DGTT, prior to insert into target table
  • Ability to leverage SQL/DML to join the data in the DGTT with other tables in the database during the insert into target table

How it works

What is a declared global temporary table?

The complete description can be found in the DB2 Information Center.

From the DB2 Information Center (IBM, June 2007):
"The DECLARE GLOBAL TEMPORARY TABLE statement defines a temporary table for the current session. The declared temporary table description does not appear in the system catalog. It is not persistent and cannot be shared with other sessions. Each session that defines a declared global temporary table of the same name has its own unique description of the temporary table. When the session terminates, the rows of the table are deleted, and the description of the temporary table is dropped."

A declared global temporary table (DGTT) is created at runtime as exact or modified structural copy of the target table, acting as a private buffer and effectively giving each user their own sandbox to work in. No data is copied from the source table, only the table structure.

Unlike regular tables, DGTT are not registered in the database system catalog, saving time on the create step as well as making the DGTT "invisible" to other users because the DGTT cannot be accessed by any other user/application other than the original owning application. The transactions (inserts, updates, deletes) performed on DGTT are also not logged in the transaction log. Since each DGTT is isolated from all other users, there is no concurrency issue when manipulating the data at this point. All of this adds up to a very fast and efficient mechanism to insert rows of data into a database table.

After the client has completely loaded the DGTT (from a flatfile or in core data), the application is ready to move the data from DGTT to the target table. At this point, the data in the DGTT can easily be "cleansed" or prepared any way that the application chooses to do so in complete isolation (no exposure to deadlock) and with no transaction logging. The actual insert from DGTT to target table is made with a simple (or complex) SQL/DML statement, and the work is done on the server with little or no network traffic involved. Since this work is done locally on the server, it's faster than remote transactions, thereby minimizing the chances of locking the table.

If the insert from DGTT to target table fails for any reason, the entire target table transaction (insert) can be rolled back and retried any number of times. As long as the application does not destroy the DGTT, the data still reside on the server and does not need to be transferred from the client again. Simply retry the SQL insert statement, checking for errors.

Upon successful movement of data from the DGTT to the target table, the application can drop the DGTT, or it will be destroyed when the application disconnects from the database.

Figure 1 shows a block diagram of the process:

Figure 1. Insert flow
Insert flow
  1. The process starts with the initialization of the application to get all required information (target table name, file, and so on).
  2. Make the database connection.
  3. The declared global temporary table (DGTT) is created as either a modified or exact copy of the target table.
  4. Data are read in from file or created in core and prepared for insert.
  5. Data are inserted Into the DGTT.
  6. Do pre-processing.
  7. (Cleansing or joining of data if needed); Insert into target table.
  8. Check for any errors and perform error/retry logic.
  9. Do post-processing of the data in the target table (if any).
  10. Commit the changes, and exit.

Figure 2 illustrates the data flow:

Figure 2. Data flow
data flow
  1. Get connection information, and connect to database. The transaction-specific parameters can be read from command line switches or created on the fly.
  2. Declare a global temporary table (DGTT) as an exact or modified copy of the target table.
  3. Data to be inserted can be read from a file, from a pipe (stdin), or created on the fly. Insert all rows from the input data stream into the DGTT using a prepared statement (no TRX logging here). To speed processing even further, make sure you use the "batch" processing feature of JDBC to execute multiple insert statements at a time. If error, do error processing. (Target table is still untouched.) If success, commit to save the rows in the DGTT. Do any work on data in DGTT (sort, calculations, and so on).
  4. Execute an insert SQL to move all rows from the DGTT to the target table (logging occurs here). All activity is done on the server, with minimal network traffic. If an error occurs here, you can retry the insert, as the DGTT still has the data. Do any post-processing; in other words, fire off application event / DB2 trigger.
  5. Commit, drop DGTT, and exit.

Pseudo code listing

Listing 1. Pseudo code
//Start the processing, get a DB2 connection

Connection connection = getConnection();
//change table name
String targetTable = " Your Table";
//this will work as is and will generate unique table name
String tempTable = "SESSION.T"+System.currentTimeMillis();

//Assumes that you have relatively stable insert statement with
//a fixed number of columns.  If you need a dynamic insert statement,
//you would build that on the fly.
//You need to customize this statement to match
//your input and db2 table columns

String insertSql = "insert into "+tempTable+" (column1,column2,column3)
                    values(?,?,?)";

String createDGTT = "DECLARE GLOBAL TEMPORARY TABLE "+tempTable +
      			" LIKE  " + targetTable +
                        " INCLUDING IDENTITY COLUMN ATTRIBUTES " +
                        " INCLUDING COLUMN DEFAULTS  NOT LOGGED  " +
                        " ON COMMIT PRESERVE ROWS in USERTEMP" ;
try {
	connection.setAutoCommit(false);//manage commit yourself
	statement = connection.createStatement();
	//Create a DGTT table as an exact copy of the target table
	statement.execute(createDGTT);
	connection.commit();
	//get a prepared statement
	ps = connection.prepareStatement(insertSql);


     	Do Insert Processing here
		int c = 1;
		ps.setObject(c++,value1);
		ps.setObject(c++,value2);
 		....
		ps.setObject(c++,valueN);
		ps.execute();
		connection.commit();//all rows inserted to DGTT

		try {
                  boolean done = false;
                  for (int retries=0;!done " " retries < 10 ; retries++ )
                  {
                       //now copy all rows from Temp-->target tabl
                       statement.executeUpdate(" insert into "+targetTable+
                                               " select * from "+tempTable);
                       connection.commit();
                  }
		} catch (SQLException e) {
               //could rollback and do retry logic here,
               //since all rows are still in Temp Table
	         try {connection.rollback();} catch (Exception e1) {}
	         if (e.getErrorCode()==-911 ||
                     e.getErrorCode()==-964){// deadlock, try again
	             done=false;
	             long s = random.nextInt(120);//0-120 seconds
	             Thread.sleep(s*1000);//sleep for ? seconds and try again
	         }else {
	             e.printStackTrace();
               }
		}

	//cleanup
	statement.execute("DROP TABLE "+tempTable);//optional, but good practice

} catch (Exception e) {
	e.printStackTrace();
      try{connection.rollback();} catch(Exception e){}
} finally {
	try{ps.close();} catch(Exception e){}
	try{statement.close();} catch(Exception e){}
	try{connection.commit();} catch(Exception e){}
	try{connection.close();} catch(Exception e){}
}

System requirements

In order to use declared global temporary tables in DB2, ensure that the following conditions are met:

DB2 server requirements

Tuning the application control heap size

From the DB2 Information Center (IBM, October 2006):
"Initially, start with the default value. You might have to set the value higher if you are running complex applications, if you have a system that contains a large number of database partitions, or if you use declared temporary tables. The amount of memory needed increases with the number of concurrently active declared temporary tables. A declared temporary table with many columns has a larger table descriptor size than a table with few columns, so having a large number of columns in an application's declared temporary tables also increases the demand on the application control heap."

  1. DB2 8.2 or later, running on Windows®, Linux®, or UNIX®. DB2 9.x is preferred.
  2. An existing database. If you do not have one, create one.
  3. Modify the database to include the following changes:
    1. Create a user temp tablespace. (See " Create Tablespace Command" in the DB2 Information Center for complete command syntax.)
      • CREATE USER TEMPORARY TABLESPACE USERTEMP MANAGED BY SYSTE PAGESIZE 4096 USING ('/db2data/usertemp') ...
    2. Grant use of that tablespace to all users that need access:
      • GRANT USE OF TABLESPACE USERTEMP TO USER db2user1
    3. Increase the max application control heap size (APP_CTL_HEAP_SZ. See " Application control heap size configuration parameter" in the DB2 Information Center for complete command syntax.)
      • UPDATE DB CFG USING APP_CTL_HEAP_SZ xxx

Client requirements

A Java code implementation of this technique can run on Windows, Linux, or UNIX clients with the following:

  1. Java Runtime Environment (JRE) 1.4 or later
  2. CLASSPATH modified to include the following jar files, downloaded from DB2:
    • db2jcc.jar
    • db2jcc_license_cu.jar

See the Resources section on how to download DB2, Java technology, or both.


Conclusion

This article has explored a technique that leverages declared global temporary tables to improve the performance of inserting rows of data using thin client database applications written in Java/JDBC. You could also use this technique to do SQL query, merge, and delete commands by loading data from a flat file into a DGTT and joining with a permanent table in the database.


Acknowledgements

Thanks to Tom Bucelot and Carlos Fonseca for their help and support in the writing of this article.

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=250116
ArticleTitle=High performance inserts using JDBC Type 4 in a constrained environment
publish-date=08232007