Buffered inserts in partitioned database environments

A buffered insert is an insert statement that takes advantage of table queues to buffer the rows being inserted, thereby gaining a significant performance improvement. To use a buffered insert, an application must be prepared or bound with the INSERT BUF option.

Buffered inserts can result in substantial performance improvement in applications that perform inserts. Typically, you can use a buffered insert in applications where a single insert statement (and no other database modification statement) is used within a loop to insert many rows and where the source of the data is a VALUES clause in the INSERT statement. Typically the INSERT statement is referencing one or more host variables that change their values during successive executions of the loop. The VALUES clause can specify a single row or multiple rows.

Typical decision support applications require the loading and periodic insertion of new data. This data could be hundreds of thousands of rows. You can prepare and bind applications to use buffered inserts when loading tables.

To cause an application to use buffered inserts, use the PREP command to process the application program source file, or use the BIND command on the resulting bind file. In both situations, you must specify the INSERT BUF option.

Note: Buffered inserts cause the following steps to occur:
  1. The database manager opens one 4 KB buffer for each database partition on which the table resides.
  2. The INSERT statement with the VALUES clause issued by the application causes the row (or rows) to be placed into the appropriate buffer (or buffers).
  3. The database manager returns control to the application.
  4. The rows in the buffer are sent to the partition when the buffer becomes full, or an event occurs that causes the rows in a partially filled buffer to be sent. A partially filled buffer is flushed when one of the following occurs:
    • The application issues a COMMIT (either explicitly, or implicitly through application termination) or ROLLBACK.
    • The application issues another statement that causes a savepoint to be taken. OPEN, FETCH, and CLOSE cursor statements do not cause a savepoint to be taken, nor do they close an open buffered insert.
      The following SQL statements will close an open buffered insert:
      • BEGIN COMPOUND SQL
      • COMMIT
      • DDL
      • DELETE
      • END COMPOUND SQL
      • EXECUTE IMMEDIATE
      • GRANT
      • INSERT to a different table
      • OPEN CURSOR for a full-select of a data change statement
      • PREPARE of the same dynamic statement (by name) doing buffered inserts
      • REDISTRIBUTE DATABASE PARTITION GROUP
      • RELEASE SAVEPOINT
      • REORG
      • REVOKE
      • ROLLBACK
      • ROLLBACK TO SAVEPOINT
      • RUNSTATS
      • SAVEPOINT
      • SELECT INTO
      • UPDATE
      • Execution of any other statement, but not another (looping) execution of the buffered INSERT
      • End of application
      The following APIs will close an open buffered insert:
      • BIND (API)
      • REBIND (API)
      • RUNSTATS (API)
      • REORG (API)
      • REDISTRIBUTE (API)

    In any of these situations where another statement closes the buffered insert, the coordinator partition waits until every database partition receives the buffers and the rows are inserted. It then executes the other statement (the one closing the buffered insert), provided all the rows were successfully inserted.

The standard interface in a partitioned database environment, (without a buffered insert) loads one row at a time doing the following steps (assuming that the application is running locally on one of the database partitions):
  1. The coordinator partition passes the row to the database manager that is on the same partition.
  2. The database manager uses indirect hashing to determine the database partition where the row should be placed:
    • The target partition receives the row.
    • The target partition inserts the row locally.
    • The target partition sends a response to the coordinator partition.
  3. The coordinator partition receives the response from the target partition.
  4. The coordinator partition gives the response to the application.

    The insertion is not committed until the application issues a COMMIT.

  5. Any INSERT statement containing the VALUES clause is a candidate for buffered insert, regardless of the number of rows or the type of elements in the rows. That is, the elements can be constants, special registers, host variables, expressions, functions and so on.
For a given INSERT statement with the VALUES clause, the Db2® SQL compiler might not buffer the insert based on semantic, performance, or implementation considerations. If you prepare or bind your application with the INSERT BUF option, ensure that it is not dependent on a buffered insert. This means:
  • Errors can be reported asynchronously for buffered inserts, or synchronously for regular inserts. If reported asynchronously, an insert error might be reported on a subsequent insert within the buffer, or on the other statement that closes the buffer. The statement that reports the error is not executed. For example, consider using a COMMIT statement to close a buffered insert loop. The commit reports an SQLCODE -803 (SQLSTATE 23505) due to a duplicate key from an earlier insert. In this scenario, the commit is not executed. If you want your application to really commit, for example, some updates that are performed before it enters the buffered insert loop, you must reissue the COMMIT statement.
  • Rows inserted can be immediately visible through a SELECT statement using a cursor without a buffered insert. With a buffered insert, the rows will not be immediately visible. Do not write your application to depend on these cursor-selected rows if you precompile or bind it with the INSERT BUF option.
Buffered inserts result in the following performance advantages:
  • Only one message is sent from the target partition to the coordinator partition for each buffer received by the target partition.
  • A buffer can contain a large number of rows, especially if the rows are small.
  • Parallel processing occurs as insertions are being done across partitions while the coordinator partition is receiving new rows.

An application that is bound with INSERT BUF should be written so that the same INSERT statement with VALUES clause is iterated repeatedly before any statement or API that closes a buffered insert is issued.

Note: You should do periodic commits to prevent the buffered inserts from filling the transaction log.