Techniques for improving online data cleanup in a WebSphere Commerce DB2 database

This article presents tools and techniques to improve the performance of online data cleanups and to minimize the impact on the store. Implementing best practices for SQL tuning, tailoring the statements to your specific requirements, and ensuring that the database is properly tuned can help you implement cleanup scripts that do not disrupt the live site.


Andres Voldman (voldman@ca.ibm.com), Software Engineer, IBM

Andres Voldman is a Software Engineer at the IBM Toronto Lab, Canada. He is a senior member of the WebSphere Commerce Support team and a subject matter expert on performance and stability of WebSphere Commerce sites. Andres has also achieved the IBM developerWorks Contributing Author level, recognizing his valued contributions as an IBM author.



Meng Fu (mengfu@ca.ibm.com), Software Developer, WSO2 Inc

Meng Fu is a Software Developer with the WebSphere Commerce support organization. As a member of the WebSphere Commerce Support team, he is responsible for providing technical assistance for the WebSphere Commerce family of products. Meng graduated with distinction in 2006 from the University of Western Ontario (London, Ontario) with a Bachelor’s degree in Software Engineering.



November 2008 (First published 20 August 2008)

Introduction

To support the day-to-day operations of a WebSphere® Commerce site, database and data maintenance needs to be performed on a daily or weekly basis. Data maintenance refers to the process of removing stale data from the database. As the database grows, the performance and maintainability of the site are affected, making the maintenance process necessary.

A sound data maintenance strategy requires planning and involves multiple tasks, such as identifying the fastest growing objects, defining policies in accordance with the business, and implementing scripts to enforce them. In this article, we will focus on the latter part of the data maintenance strategy. You will learn techniques to implement, test, and tune cleanup scripts, and to minimize the disruption to the site while executing the scripts.

Role of data cleanup in the maintenance cycle

Before we look at any tools or begin tuning, you need to understand the role of data cleanup and the cleanup strategies that are available. Data cleanup is one of the maintenance tasks that are done during the maintenance window. Other operations include backups and database maintenance. Database maintenance refers to the tasks that are needed for the database to perform efficiently. This includes updating the statistics, which are used by the optimizer when determining access paths to the data (runstats), compacting and reorganizing the physical data (reorg), and rebinding static packages to make use of updated statistics (rebind).
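For reference, a typical maintenance pass over a single table might look like the following sketch (the database, schema, and table names are illustrative):

db2 connect to mall
db2 "reorg table db2inst1.orders"
db2 "reorg indexes all for table db2inst1.orders"
db2 "runstats on table db2inst1.orders with distribution and detailed indexes all"
db2rbind mall -l rbind.log all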

As shown in Figure 1, database maintenance is usually done after the data cleanup. This is because the statistics need to be updated to reflect the new state of the data. The deletes also leave free space inside the tables, which you can compact and return to the file system with reorg.

For details about the maintenance flow, see the following technote, Maintaining a WebSphere Commerce DB2 database.

Figure 1. Diagram of the maintenance flow

Data cleanup strategies

Most businesses cannot afford an outage to perform database maintenance. Therefore, they need to perform the cleanup while the store is receiving live traffic. If the database is maintained regularly, then the cleanup process will be brief. On the other hand, if the database contains hundreds of thousands or millions of records of stale data, then this first cleanup is challenging.

When planning the cleanup of a database that contains months or years worth of stale data, you will need to evaluate the different alternatives you can use for the first cleanup process.

Online strategy

An online strategy is used when the cleanup is executed while the site is receiving traffic. Before choosing this approach for a database that contains a large amount of stale data, you need to consider the following factors:

  • Time it will take for the delete to execute.
  • Impact on the live site.
  • Tuning effort required.

Tuning the delete statements in a database that has never been cleaned is complicated. Operations such as SORTs that would normally take no time become I/O and CPU intensive when millions of rows match the delete criteria. With millions of rows, an out-of-the-box SQL statement that would otherwise run perfectly fine may require a few days to tune.

During the first cleanup, the deletes might need to run for days or weeks to catch up with the existing data. Still, if this process runs without disrupting the site, then it is the preferred mechanism.

Offline strategy

If your database has a large amount of stale data and the deletes disrupt the operation of the site, then you might need to rely on a one-time offline deletion. Having no traffic on the site gives the deletes more CPU and I/O resources and also eliminates the lock contention issues that impact both the shoppers and the cleanup process.

If you are using an offline strategy, there are options other than delete statements that you can use for data cleanups, such as exporting the rows that need to be preserved and using the load utility with the replace option. Other options include temporarily disabling foreign keys and disabling logging. Keep in mind that these are dangerous alternatives and that they require careful planning.
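For illustration, the following is a heavily simplified sketch of the export and load-with-replace approach (the table, file name, and retention rule are placeholders). Note that a LOAD ... REPLACE against a table with referential constraints leaves the table and its descendants in set integrity pending state, so a SET INTEGRITY step is needed for every affected table:

db2 connect to mall
db2 "export to orders_keep.ixf of ixf
     select * from orders where lastupdate >= current timestamp - 90 days"
db2 "load from orders_keep.ixf of ixf replace into orders"
db2 "set integrity for orders immediate checked"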

Another drawback of the offline strategy is that it might be difficult to predict how much time is required for the deletion to complete if you do not have an environment with similar characteristics to production to validate the process.


Tools for implementing deletes

In this section, you will learn about the different tools that are available to implement data cleanup.

WebSphere Commerce cleanup utility (DBClean)

WebSphere Commerce includes a cleanup utility called DBClean that removes stale content from the database. The advantage of this tool is that it provides a predefined set of configurations that cleans up common WebSphere Commerce objects, such as CACHEIVL, STAGLOG, USERS, and ORDERS.

In addition to the existing objects, you can configure the DBClean utility to add new objects or modify the existing ones.
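For example, a new cleanup object is registered by inserting a row into the CLEANCONF table. The following sketch is illustrative only (the column list is abbreviated, and NAMEARG and DAYSARG indicate whether the statement expects the -name and -days arguments; verify the CLEANCONF schema and the statement against your own environment before using it):

INSERT INTO cleanconf (objectname, type, statement, namearg, daysarg, sequence)
VALUES ('cacheivl', 'custom',
        'DELETE FROM cacheivl
          WHERE (DAYS(CURRENT TIMESTAMP) - DAYS(inserttime)) >= ?',
        'no', 'yes', 0);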

Custom cleanup programs

Implementing a custom program for the deletion is also possible and relatively straightforward. This is a good option if you need more flexibility to integrate the cleanup with other maintenance scripts that are executed in the database.

Using Java and JDBC

Java™ with JDBC is often the language of preference for implementing cleanup programs. The following example shows how to define a cursor to delete from a table:

// Requires java.sql.* imports; COMMIT_COUNT is a constant defined elsewhere.
Connection con = getConnection();
con.setAutoCommit(false);

int commit = COMMIT_COUNT;

Statement stmt = con.createStatement( ResultSet.TYPE_FORWARD_ONLY,
                                      ResultSet.CONCUR_UPDATABLE );
ResultSet rs   = stmt.executeQuery( "SELECT 1 FROM table1 "
                                  + "FETCH FIRST 10000 ROWS ONLY "
                                  + "FOR UPDATE" );

String currentOfSQL  = "DELETE FROM table1 WHERE CURRENT OF "
                       + rs.getCursorName();

PreparedStatement currentOfStmt = con.prepareStatement( currentOfSQL );

while ( rs.next() )  {
   currentOfStmt.executeUpdate();
   commit--;

   // DB2's default result set holdability keeps the cursor open across commits
   if ( commit == 0 ) {
      con.commit();
      commit = COMMIT_COUNT;
   }
}
con.commit();  // commit the remaining deletes

Notice that in the above example, we used the WHERE CURRENT OF clause to delete the row from the table instead of using the deleteRow() API. This is because of a DB2 limitation where a null pointer exception can occur if the resultset contains only constants.

Using database stored procedures

Stored procedures are a popular option with database administrators (DBAs). They are fast, and in contrast with DBClean or other Java utilities, you can invoke them from the database server machine or from any other machine that can open a connection to the database.

Although DB2 supports multiple languages for implementing stored procedures, the two most popular options are Java and SQL.

SQL stored procedures

The following example shows how to implement a simple stored procedure to delete from a table:

-- db2 -td@ -vf cleanup_table1.sql
CREATE PROCEDURE cleanup_table1 ()
LANGUAGE SQL
BEGIN

   DECLARE v_commit          INTEGER DEFAULT 1000;
   DECLARE v_rcount          INTEGER;
   DECLARE v_max_num_deletes INTEGER DEFAULT 100;
   DECLARE v_counter         INTEGER DEFAULT 0;

   DECLARE v_sqltext         VARCHAR(1000);
   DECLARE v_deletestmt      STATEMENT;

   SET v_rcount  =  v_commit;

   SET v_sqltext = 'DELETE FROM ( SELECT 1 FROM table1 FETCH FIRST ' || 
     CHAR(v_commit) || ' ROWS ONLY ) WITH UR';
   PREPARE v_deletestmt FROM v_sqltext;

   WHILE ( v_counter < v_max_num_deletes AND v_rcount = v_commit )
   DO

      EXECUTE v_deletestmt;

      GET DIAGNOSTICS v_rcount = ROW_COUNT;
      COMMIT;

      SET v_counter = v_counter + 1;

   END WHILE;

END
@

Java stored procedures

If you are implementing a solution where you need more flexibility, you can consider using Java to implement the stored procedure. Java stored procedures are similar to regular Java programs that use JDBC to access the database.

The following example shows Java code that creates a stored procedure. Note the use of jdbc:default:connection when creating the connection.

package sample;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class SampleCleanUp {

   static final int COMMIT_VALUE    = 1000;
   static final int MAX_NUM_DELETES = 5;
   static final String SQL = "DELETE FROM ( SELECT 1 FROM table1 "
                           + "FETCH FIRST " + COMMIT_VALUE + " ROWS ONLY ) "
                           + "WITH UR";

   public static void cleanTable1 () throws Exception {
      // jdbc:default:connection reuses the caller's connection inside the server
      Connection con = DriverManager.getConnection("jdbc:default:connection");
      PreparedStatement stmt = con.prepareStatement( SQL );
      for ( int i=0; i < MAX_NUM_DELETES; i++ ) {
         int lastExecute = stmt.executeUpdate();
         con.commit();
         // fewer rows than the batch size means the table is clean
         if ( lastExecute < COMMIT_VALUE ) break;
      }
   }
}

To execute the Java code as a regular stored procedure, you first need to install the JAR file that contains the class on the server:

db2 "call sqlj.install_jar('file:c:\temp\cleanup.jar', 'DB2INST1.CLEANUPJAR', 0)"
db2 "call sqlj.refresh_classes()"

Once the code is available in the server, you must define the stored procedure and specify which Java class it is mapped to:

CREATE PROCEDURE table1CleanUp ( )
	NOT DETERMINISTIC
	LANGUAGE Java
	EXTERNAL NAME 'DB2INST1.CLEANUPJAR:sample.SampleCleanUp.cleanTable1'
	FENCED
	THREADSAFE
	PARAMETER STYLE JAVA;

If you plan on creating Java stored procedures, we recommend that you download the DB2 9 Developer Workbench or use the DB2 Development Center. These tools make it easier to define and deploy stored procedures and functions.

Although there are many options for implementing the deletes, the decision on a particular tool depends on maintainability and flexibility with other processes and business requirements.


Techniques for deletes

To allow the deletion of thousands of records in an online environment, you need to use a technique that sets a commit frequency. The option to set a commit frequency is important because it allows you to release locks on tables and free resources in the buffer pools. Note that varying the commit frequency can improve the performance of the delete significantly, but it can also have an adverse effect. Determining the optimal commit frequency requires testing through trial and error because the optimal value is different for each environment.

There are two main techniques that allow you to implement a commit frequency: deleting from a subselect or fullselect and deleting using cursors.

Deleting from a subselect or fullselect

The number of records affected by a DELETE statement can be limited by using the FETCH FIRST {number} ROWS ONLY clause. The DELETE statement does not support the FETCH FIRST {number} ROWS ONLY clause directly, but you can implement it by using a subselect or fullselect.

The following statement uses an IN predicate and a subselect to implement the FETCH FIRST {number} ROWS ONLY clause. The subselect returns the primary keys of the rows to delete, and the outer statement matches them against the table's primary key.

DELETE FROM table 
 WHERE pk IN ( SELECT pk
                 FROM table
                WHERE ... 
                FETCH FIRST 1000 ROWS ONLY)

Starting with version 8.1 Fix Pack 4, DB2 supports deletions from a fullselect. In contrast with the subselect, this technique allows you to delete directly over the result set:

DELETE FROM ( SELECT 1 
                FROM table 
               WHERE ... 
               FETCH FIRST 1000 ROWS ONLY )

Deleting from cursors

Cursors allow you to traverse and process the results of a query. When a transaction is committed, the database manager closes all the cursors except those declared WITH HOLD. By defining a cursor WITH HOLD, you can commit the transaction and continue processing the results of the cursor.
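The following SQL PL sketch combines WITH HOLD with the FOR UPDATE clause explained next (the procedure and table names are illustrative): the intermediate commits release locks, but the cursor stays open.

-- db2 -td@ -vf cleanup_cursor.sql
CREATE PROCEDURE cleanup_cursor ()
LANGUAGE SQL
BEGIN

   DECLARE v_dummy   INTEGER;
   DECLARE v_pending INTEGER DEFAULT 0;
   DECLARE v_at_end  INTEGER DEFAULT 0;

   DECLARE c1 CURSOR WITH HOLD FOR
      SELECT 1 FROM table1 FOR UPDATE;

   DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_at_end = 1;

   OPEN c1;
   FETCH c1 INTO v_dummy;

   WHILE ( v_at_end = 0 )
   DO
      DELETE FROM table1 WHERE CURRENT OF c1;
      SET v_pending = v_pending + 1;

      IF ( v_pending >= 1000 ) THEN
         COMMIT;            -- c1 remains open because it was declared WITH HOLD
         SET v_pending = 0;
      END IF;

      FETCH c1 INTO v_dummy;
   END WHILE;

   CLOSE c1;
   COMMIT;
END
@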

Using the FOR UPDATE clause

You can define cursors as "read only" or "for update". Using "for update" signals to the database manager that you intend to update the results of the cursor.

When "for update" is specified, the connection acquires update (U) locks over the result set. This prevents other connections from updating the rows in the result set, but not from reading them, which helps prevent deadlocks during the deletion. Also, when "for update" is specified, you will find that the query tends to use a table scan. This is because DB2 avoids certain indexes to prevent what is known as the "Halloween problem", where an update moves a key further down the index, causing a row to be processed more than once during an index scan.

Ambiguous cursors, those which do not specify "for update" or "for read only", are treated as read only if the BLOCKING ALL option is specified when precompiling or binding. Otherwise, the cursor is considered updatable.

If you attempt a delete over a cursor that is not updatable, DB2 throws the following error:

SQL0510N UPDATE or DELETE is not allowed against the specified cursor.

Performance considerations between cursors and deletes

Cursors and deletes perform differently. Cursors have the advantage that the delete statement is prepared only once, but for each row that is fetched, the program has to instruct the database manager to delete it (DELETE WHERE CURRENT OF), which can be expensive.

On the other hand, using a delete requires the delete statement to be executed multiple times, depending on the value of the commit parameter:

Num_Executions = ceil( Total_Rows / Commit )
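For example, deleting 1,000,000 rows with a commit count of 5,000 requires 200 executions of the delete statement.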

Figure 2 compares the two techniques and illustrates the activities involved with a delete.

Figure 2. Activity diagram for deletes vs. cursors
Activity diagram for deletes vs. cursors

The performance of the two techniques varies depending on how well tuned the statement is and the commit frequency used. Lower commit frequencies tend to impact delete statements much more than they do cursors, especially when the statement is not tuned. Still, the performance of cursors also suffers if the commits are performed too frequently because each commit causes a synchronous log write, which results in more I/O.

Tuning the commit frequency is explained in more detail later. In our experience, the delete technique outperforms the use of cursors if the SQL is relatively tuned and a reasonable commit frequency is used (1,000 or more).


Customizing deletes for business requirements

In database tuning, it is hard to find a solution that fits all. The out-of-the-box delete statements are generic statements intended to cover all of the WebSphere Commerce business models and configurations. Although most of these pre-defined delete configurations provided with the DBClean utility may work well in your environment, there is room for optimization by tuning the delete to meet specific business requirements. By tailoring the statements to match your requirements, you can find opportunities for improving performance, such as eliminating controls that are not relevant to your environment.

Next, you are presented with two of the most used DBClean configurations, ORDERS and USERS, and learn how you can adapt them to a particular scenario.

Out-of-the-box ORDERS and MEMBER deletes

The following are the delete statements as they are executed by DBClean when the default options are used. Notice that the statements are not the same as the ones in the CLEANCONF table. This is because the tool needs to alter them to include the FETCH FIRST n ROWS ONLY clause to support committing after every n records.

The out-of-the-box deletes used in this example were taken from a WebSphere Commerce V6.0.0.2 environment. If the deletes you find are different, refer to this technote, DBClean for Guest Users does not perform well when there are many Guest Users in the database.

Orders delete

This query is used for removing orders. It checks for the following conditions:

  • The order has a status of P, I, W or N.
  • It was placed N number of days ago.
  • It was placed by a guest (G) user.
  • There is no inventory allocated for any of the items in the order.
  • There are no returns for any of the items in the order.
Object: order
Type: stale_guest
DELETE FROM orders
 WHERE orders_id IN ( 
   SELECT orders_id
     FROM orders
    WHERE (status = 'P' OR status = 'I' 
     	 OR status = 'W' OR status = 'N')
      AND (DAYS(CURRENT TIMESTAMP) - DAYS(lastupdate)) >= ?
      AND orders.member_id IN 
      	( SELECT DISTINCT users_id
             FROM users
            WHERE registertype = 'G')
      AND orders_id NOT IN 
      	( SELECT DISTINCT orders_id
             FROM orderitems
            WHERE inventorystatus != 'NALC'
              AND inventorystatus IS NOT NULL)
      AND orders_id NOT IN 
      	( SELECT orders_id
             FROM orderitems
            WHERE orderitems_id
              IN ( SELECT orderitems_id
                     FROM rmaitem
                    WHERE orderitems_id IS NOT NULL))
    FETCH FIRST ${COMMIT} ROWS ONLY )

Member delete

The following query is used to delete guest users. It checks for the following conditions:

  • The member is a guest (G) user.
  • It accessed the site N number of days ago.
  • The user does not have any orders other than type Q.
Object: users
Type: guest
DELETE FROM member
 WHERE member_id IN ( 
   SELECT member_id
     FROM member
    WHERE member_id IN (
      SELECT users_id
        FROM users t1
       WHERE registertype='G'
         AND (DAYS(CURRENT TIMESTAMP) - DAYS(prevlastsession)) >= ?
         AND NOT EXISTS (
            SELECT 1
              FROM orders
             WHERE orders.member_id = t1.users_id
               AND status != 'Q')
         AND (users_id > 0))
    FETCH FIRST ${COMMIT} ROWS ONLY )

Customizing ORDERS and MEMBER deletes

Let's assume that after evaluating the out-of-the-box statements and analyzing the business logic, you realize that you can customize the deletes to fit more closely to your business requirements.

Keep in mind that the new statements are provided as a customization example only and they might not suit your particular business requirements.

Orders delete

The following changes were made to the out-of-the-box statement for orders to better suit the requirements of a particular business:

  • You have decided that this delete applies to both registered and guest shoppers, so the guest shopper condition was removed.
  • Your store does not use quotations or approvals, so the new delete statement only checks for pending orders (P).
  • Your store does not use the ATP inventory model so the inventory condition was excluded.
  • The store does not ship items on pending orders, so the rmaitem sub-query was also removed. (Remember that in some configurations, the order and the items might have different statuses.)

The new simplified version of the out-of-the-box order delete statement now looks as follows:

Object: order
Type: stale
DELETE FROM (
  SELECT 1
    FROM orders
   WHERE status ='P'
     AND CURRENT TIMESTAMP - ? DAYS >= lastupdate 
   FETCH FIRST {number} ROWS ONLY )
    WITH UR;

Member delete

You will also customize the out-of-the-box delete statement. As your store does not use the quick-order profile (order type Q), the statement was simplified to remove users who do not have any orders.

You will also assume that the guest users will not have any orders because the order delete statement above was executed first to remove stale pending orders. However, if your system is creating thousands of guest users with no associated orders, review the implementation to validate the need for non-generic commands, as explained in this technote, Thousands of guest users created without associated orders.

We will also assume that all the guest shoppers have a PREVLASTSESSION value in the USERS table. If a user does not have a PREVLASTSESSION value set, you cannot delete it. The technote, User records cannot be cleaned using DBClean, contains more information about how to identify and resolve this problem.

The new simplified version of the out-of-the-box member delete statement now looks as follows:

Object: users
Type: guest
DELETE FROM (
  SELECT 1
    FROM member m
   WHERE m.member_id > 0 
     AND EXISTS ( SELECT 'x'
                    FROM users u
                   WHERE u.users_id = m.member_id
                     AND u.registertype = 'G'
                     AND u.prevlastsession < ( CURRENT TIMESTAMP - ? DAYS ) )
     AND NOT EXISTS ( SELECT 1
                        FROM orders o 
                       WHERE o.member_id = m.member_id )
   FETCH FIRST {number} ROWS ONLY )
    WITH UR;

Tuning delete statements

In this section, you will learn multiple techniques that improve the performance of delete statements.

While trying to improve the performance of the deletes, keep in mind that your database needs to be tuned for the store traffic and not for the cleanup operation. It is tempting to make configuration changes based on the testing, but consider how these changes can impact the normal operation of the site. This is especially true when you are considering new indexes.


Using explain plans

When evaluating delete performance, you need to explain the statements to understand the access path that DB2 uses to reach the data. You can obtain explains by using the db2exfmt command or the DB2 Command Center.

The following example shows how to create an explain plan using the db2exfmt command. For this example to work, you need to have previously created the explain tables. If you notice that the explain is taking a long time to complete, refer to the technote db2exfmt takes very long time to complete for a solution or workaround for this issue.

db2 connect to mall
db2 set current explain mode explain
db2 -tf delete.sql
db2 set current explain mode no
db2exfmt -d mall -g TIC -n % -s % -w -1 -# 0 
    -o delete.explained -u db2inst1 password

Note that even when one explain plan shows a smaller cost than another in the same environment, the difference is not always reflected at execution time.

For example, let's analyze the following delete statement:

DELETE FROM ( 
     SELECT 1 FROM table WHERE created <= CURRENT TIMESTAMP - ? DAYS )

When DB2 explains the delete, it estimates how many rows will be affected by the where condition. The problem is that at prepare time, there are a number of unknowns that prevent the optimizer from making a good estimate. In the above example, these are:

  • Parameter markers: Because the number of days is not known at prepare time, DB2 cannot use the distribution statistics.
  • CURRENT TIMESTAMP special register: The time is not evaluated at prepare time and thus the current timestamp is also an unknown.

Given these unknowns, DB2 will not know if the query affects a large percentage of the table, or just a few rows. If DB2 estimates that it will only delete a few rows, but the resultset to be deleted is large, then this might result in a bad plan and poor performance.

The following explain plan snippet for the query above illustrates how DB2 makes a poor estimate of the number of rows that will be returned. DB2 assumes that the where condition filters out almost all the rows from the table and that only 0.2% of the table cardinality is returned.

In a simple plan like this one, the filter factor will not influence the way that DB2 resolves the query, but it might be a factor when dealing with more complex queries, such as those for orders and users.

4) Sargable Predicate
   Relational Operator:       Less Than or Equal (<=)
   Subquery Input Required:   No
   Filter Factor:          0.00207089

   Predicate Text:
   --------------
   (Q1.CREATED <= -(CURRENT TIMESTAMP, :?, 3))

If you believe that the unknown variables are causing DB2 to create a poor plan, remove the unknowns and compare the performance. You can either update the statement or use optimization profiles to set the REOPT ONCE option. The REOPT ONCE option delays the compilation of the statements until the values are set for the first time.
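For example, a quick way to measure the impact of the unknowns is to temporarily hard-code the boundary (the timestamp literal below is illustrative):

DELETE FROM ( SELECT 1
                FROM orders
               WHERE lastupdate <= TIMESTAMP('2008-06-01-00.00.00.000000')
                 AND status = 'P'
               FETCH FIRST 1000 ROWS ONLY )
  WITH UR

With both the parameter marker and the CURRENT TIMESTAMP register removed, the optimizer can apply the distribution statistics for the LASTUPDATE column when it estimates the filter factor.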

The following example shows how to define an optimization profile for a delete statement to delay the compilation until the unknown variables are set. If you decide to use this approach, you can use the db2pd tool with the -reopt option to list the values with which the statement was compiled.

<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE VERSION="8.2.2">
   <STMTPROFILE ID="FORCING REOPT IN DELETE">
      <STMTKEY>
         <![CDATA[
         DELETE FROM (
            SELECT 1
              FROM orders
             WHERE CURRENT TIMESTAMP - ? DAYS >= lastupdate
               AND status ='P'
             FETCH FIRST 100 ROWS ONLY)
              WITH UR
         ]]>
      </STMTKEY>
      <OPTGUIDELINES>
         <REOPT VALUE="ONCE"/>
      </OPTGUIDELINES>
   </STMTPROFILE>
</OPTPROFILE>
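Note that the profile is not picked up automatically. It must be stored in the SYSTOOLS.OPT_PROFILE table and activated for the session. The following is a hedged sketch, where profiledata is a DEL file containing the single line "DB2INST1","DELPROF","profile.xml" and profile.xml holds the XML shown above (the profile name is illustrative, and the OPT_PROFILE table must already exist; see the DB2 documentation on optimization profiles for the table DDL):

db2 connect to mall
db2 "import from profiledata of del modified by lobsinfile
     insert into systools.opt_profile"
db2 "flush optimization profile cache"
db2 set current optimization profile = DB2INST1.DELPROF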

Even if a plan shows a smaller cost, always benchmark the new and old SQLs to ensure the change is reflected in the execution time. Also keep in mind that the explains you get in your test environment might not be the same as what you will get in production, especially if the environment or the database does not have the same characteristics. The developerWorks article, Recreate optimizer access plans using db2look, provides good insight into how to deal with this situation.

Tuning the commit frequency

Using different commit frequencies can have a significant impact, not only on the performance and resources needed by the cleanup operation (such as I/O and CPU), but also on the number of locks that are acquired and the contention that this can create with shoppers accessing the site.

When deleting thousands of records, it is a best practice to subdivide the statements into multiple units of work or commit points. If thousands of records are deleted in the same transaction, the database is affected by a number of issues:

  • The transaction can exhaust the locklist, which will lead to lock escalations and prevent other connections from using the tables. This can effectively lock up the whole site and prevent people from browsing and shopping.
  • The deletes might be locking key pieces of data that are used by the store front. Other connections might timeout waiting for the locks to be released.
  • The transaction can fill up the DB2 logs, which will make the deletion fail with a message similar to the following:
    SQL0964C: The transaction log for the database is full.  SQLSTATE=57011

When implementing a commit frequency, find a number that is optimal for your environment. The impact will be different depending on the delete technique used.

If the deletions are done using a delete statement, lowering the frequency means the delete needs to be executed more times, which is not efficient, especially if the access plan includes expensive scan operations. Also, the commit operation requires synchronous I/O to the logs, which is costly and affects deletions done with both cursors and deletes.

If you start with a low commit frequency (such as 1 to 100), increasing the frequency can improve performance severalfold, especially with delete statements. Still, keep in mind that if the commits are too infrequent, your system might experience the resource and lock contention issues listed above, such as lock escalations, which can interrupt the operation of the site.

When choosing a commit frequency, keep in mind the kind of data you are deleting. In general, you can use a much larger commit frequency on tables that do not have children tables, such as STAGLOG or CACHEIVL, than in tables where the delete cascades to other tables, such as CATENTRY, ORDERS or MEMBER.

If you are using the WebSphere Commerce cleanup utility (DBClean) and the -commit parameter is not specified, a value of 1,000 is used. This is a safe starting point. In our testing, we have successfully run deletions using commit counts from 5,000 to 25,000, depending on the table.
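For example, an invocation that overrides the default commit count might look like the following (verify the exact parameter syntax against the DBClean documentation for your release):

dbclean.sh -db mall -object order -type stale_guest -days 30 -commit 5000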

Evaluating subselects and fullselects

To set a commit frequency, DELETES are usually re-written using a subselect or a fullselect and the FETCH FIRST X ROWS ONLY clause as was previously discussed in Deleting from a subselect or fullselect.

When the DB2 optimizer prepares the statement, it creates an equivalent statement, which is optimized. For this reason, you often find that, regardless of the technique used, subselect or fullselect, the optimized version will be the same. This results in the same performance when the statements are executed.

However, there are circumstances where the optimized statements are not the same. In the following case, the fullselect technique outperformed the subselect, although this is not true in all cases.

As an example, in one environment we found that the following SQL performed much better when the fullselect was used. This was also evident in the explain plan.

DELETE FROM staglog 
 WHERE stgrfnbr IN ( SELECT stgrfnbr 
                       FROM staglog 
                      WHERE stgprocessed = 1 
                        AND stgstmp < CURRENT TIMESTAMP - ? DAYS 
                      FETCH FIRST 10000 ROWS ONLY ) 
 WITH UR

Figure 3 shows the access plan for the delete statement above. This delete uses the standard approach and you can see that the IN is resolved using a temporary table.

Figure 3. Explain plan for STAGLOG using subselect
Explain plan for STAGLOG using subselect

The delete statement was then rewritten to use a fullselect without the IN clause:

DELETE FROM ( SELECT 1 
                FROM staglog 
               WHERE stgprocessed = 1 
                  AND stgstmp < CURRENT TIMESTAMP - ? DAYS 
                FETCH FIRST 10000 ROWS ONLY ) 
   WITH UR

Comparing the explain plan from Figure 3 to that of Figure 4, we can see that the plan has been simplified to only use an index scan.

Figure 4. Explain plan for STAGLOG using fullselect
Explain plan for STAGLOG using fullselect

Keep in mind that deleting from a fullselect is not the default behavior when using the DBClean utility. This option was added with APAR IZ22196. You can refer to the database cleanup utility topic in the Information Center or to the technote, Implementing deletions with DBClean using fullselects, for details about how to enable it.

Evaluating IN and EXISTS predicates

IN and EXISTS are similar, and most of the time statements can be written using either predicate. As with fullselects and subselects, the DB2 query optimizer rewrites the query into what it considers the most efficient form. The optimized statement can use IN, EXISTS, or JOINs.

If you suspect that DB2 is not resolving an IN or EXISTS predicate in an efficient way, you can use explain plans to compare the performance of both alternatives.

Semantic differences of NOT EXISTS and NOT IN

Although in most cases an IN predicate is translated to an EXISTS predicate, be aware of the semantic differences when the results are negated (NOT), which can lead to disparity in the result set.

Consider the following tables:

Table A          Table B
col1             col1
------           ------
1                1
NULL             2

The first SQL uses a NOT IN predicate to list all the rows in Table B that do not have corresponding rows in Table A.

db2 => SELECT * FROM b WHERE b.col1 NOT IN (SELECT col1 FROM a)

COL1
--------------------

  0 record(s) selected.

The second SQL uses a NOT EXISTS predicate instead. As you can see from the number of records selected, the two SQLs yield different results.

db2 => SELECT * FROM b WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.col1 = b.col1)

COL1
--------------------
                   2

  1 record(s) selected.

The difference lies in the way that NULL values are handled by the NOT IN predicate:

  • The result of the predicate is "true" if the fullselect returns no values or if the specified relationship is true for every value returned by the fullselect.
  • The result is "false" if the specified relationship is false for at least one value returned by the fullselect.
  • The result is "unknown" if the specified relationship is not false for any values returned by the fullselect and at least one comparison is unknown because of the null value.

In this case, the existence of NULL values in the fullselect makes the SQL return an unknown result and you find that no rows are selected.

To ensure the NOT IN clause gives the desired result, you can eliminate the NULL values from the fullselect as follows:

db2 => SELECT * FROM b WHERE b.col1 NOT IN (SELECT col1 FROM a WHERE col1 IS NOT NULL)

COL1
--------------------
                   2

  1 record(s) selected.

Comparing Explain plans for NOT EXISTS and NOT IN

The following custom query uses the NOT IN clause. Note that because the column is nullable, we need to check for NULLs in the subselect. Otherwise, the result is empty.

DELETE FROM (
   SELECT 1 
     FROM ordsource 
    WHERE ordsource_id NOT IN ( SELECT sourceid FROM orders WHERE sourceid IS NOT NULL ) 
    FETCH FIRST 10000 ROWS ONLY )
    WITH UR

Figure 5 shows the explain plan that corresponds to the previous query. The explain plan shows that to satisfy the NOT IN clause, DB2 creates a temporary table with the matching IDs.

Figure 5. Explain plan for ORDSOURCE using NOT IN
Explain plan for ORDSOURCE using NOT IN

The following SQL is a rewritten version that uses a NOT EXISTS clause instead:

DELETE FROM ( 
    SELECT 1 
      FROM ordsource 
     WHERE NOT EXISTS ( SELECT 1 
                          FROM orders 
                         WHERE sourceid = ordsource_id ) 
      FETCH FIRST 10000 ROWS ONLY ) 
      WITH UR

As you see in the explain plan in Figure 6, when NOT EXISTS is used, DB2 does an index scan and does not need to create the temporary table. In our testing for this particular query, the NOT EXISTS version performed better than the original statement, which used the NOT IN predicate.

Figure 6. Explain plan for ORDSOURCE using NOT EXISTS
Explain plan for ORDSOURCE using NOT EXISTS

Ensuring predicates are supported by indexes

When tuning the deletes, ensure that the statements are supported by the necessary indexes. Indexes not only make the SQL run faster, they also reduce the locking that occurs during the deletion.

Always tune the site for live traffic and not for deletions. If you need to make DDL changes to support a delete, evaluate how this is going to impact the live site in terms of performance, locking, or storage requirements.

Finding the right indexes might not be straightforward. You need to analyze how all the queries and deletes being implemented will access the data. This eliminates redundancies where each delete ends up defining its own index. For example, you might want to create an index that includes the PREVLASTSESSION column of the USERS table, and optionally include the REGISTERTYPE column on it.

When an index includes multiple columns, the order in which they are specified will depend on how the data is accessed and its distribution.

Consider the following SQL:

DELETE FROM (
  SELECT 1
    FROM member m
   WHERE m.member_id > 0 
     AND EXISTS ( SELECT 'x'
                    FROM users u
                   WHERE u.users_id = m.member_id
                     AND u.registertype = 'G'
                     AND u.prevlastsession < ( CURRENT TIMESTAMP - 15 DAYS ) )
     AND NOT EXISTS ( SELECT 1
                        FROM orders o 
                       WHERE o.member_id = m.member_id )
   FETCH FIRST {number} ROWS ONLY )
    WITH UR;

When joining multiple tables, consider including the joined column in the index. For the SQL above, we would create an index on the USERS table that includes the USERS_ID and PREVLASTSESSION columns. Specifying the USERS_ID column first eliminates a SORT that might otherwise be needed before the table can be joined to the ORDERS table by users_id. You can verify this with an explain plan.
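A sketch of the index discussed above follows (the index name is illustrative, and as noted earlier, weigh the cost to the live site before adding any DDL):

-- USERS_ID first avoids a SORT before the join; if the explain shows it
-- helps, REGISTERTYPE can be appended as a third column
CREATE INDEX ix_users_cleanup
   ON users ( users_id, prevlastsession );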

Avoiding expressions over columns in local predicates

Applying expressions over columns in local predicates might prevent the optimizer from using an index and also requires extra processing at execution time.

For example, the following expression applies a function, DAYS, over the PREVLASTSESSION column, which is indexed.

(DAYS(CURRENT TIMESTAMP) - DAYS(prevlastsession)) >= ?

You can rewrite the previous expression as follows with similar results:

CURRENT TIMESTAMP - ? DAYS >= prevlastsession

The difference between the two approaches is that the first one counts calendar days, while the second one deducts 24-hour periods from the current time. For example, if a row has a timestamp of 6:00 p.m. and the cleanup runs at 6:00 a.m. the next day, the first delete will include the row, but the second one will not.

Avoiding expressions over columns in local predicates is always recommended. However, if the expression includes unknowns, as outlined in the previous section, Using explain plans, the estimates can vary widely. For example, when the DAYS function is used in this query, DB2 estimates that the Less Than or Equal To condition will return one-third of the table.

Sargable Predicate
Relational Operator:       Less Than or Equal (<=)
Subquery Input Required:   No
Filter Factor:             0.333333

Predicate Text:
--------------
(:? <= (DAYS(CURRENT TIMESTAMP) - DAYS(Q5.LASTUPDATE)))

On the other hand, if the DAYS function is omitted, the estimation is completely different. In this case, DB2 assumes the where clause filters out almost all rows, and estimates that it returns a small result set.

Sargable Predicate
Relational Operator:       Less Than or Equal (<=)
Subquery Input Required:   No
Filter Factor:             0.00207089

Predicate Text:
--------------
(Q5.LASTUPDATE <= -(CURRENT TIMESTAMP, :?, 3))



    14496.2       <-- estimated rows returned is 14,496
    IXSCAN            (7000000 * 0.00207089 is approx. 14,496)
    (  10)        <-- operation number
    41683.9       <-- cost
    18154.5       <-- IO cost
      |
    7000000       <-- Table has 7,000,000 rows
INDEX: DB2INST1
 ORDERS_CIDX1

The difference in the estimation can lead DB2 to use a plan that is not efficient. If that is the case, removing the functions over the local predicates can hurt performance. If you are running into a situation like this, we recommend that you eliminate the unknowns, both the DAYS function and the CURRENT TIMESTAMP register, and compare the performance.

Using uncommitted read

If the delete statements are executed with uncommitted read (WITH UR), DB2 does not acquire read (next-key) locks while scanning for the rows that match the criteria in the subquery. This helps minimize the impact on a live site.

Deleting by partition

When deleting from a database for the first time, there may be millions of records to clean up. To help improve the performance of deletes on the first cleanup, you can use the following techniques to manage the amount of data to be processed on each execution of the delete statement.

Fetch first X rows only clause

This technique is used for implementing a commit frequency with deletes. Although the database manager makes optimizations when it knows the result set is restricted, most of the time the optimizer is only able to apply this filter late in the retrieval of the data, for example, after SORTs and JOINs have been completed. For this reason, do not expect good performance simply because the statement affects only a small number of rows.

Time and range boxing

This technique refers to reducing the number of rows in the result set by specifying a range. For example, you can include a condition in the delete statement to limit the number of users.

DELETE FROM ( 
   SELECT 1
     FROM member
    WHERE member_id IN (
      SELECT users_id
        FROM users t1
       WHERE registertype='G'
         AND ( users_id > 20000000  AND users_id <= 21000000 )
         AND NOT EXISTS ( SELECT 1 FROM orders t2 WHERE t2.member_id = t1.users_id))
      FETCH FIRST 1000 ROWS ONLY )

This also gives you a chance to do additional optimizations. If you know that all the guest users in that range have a PREVLASTSESSION value that is older than a specific reference date, you can remove the date check from the query.

For time-boxing to be effective, you need to analyze the explain plans and ensure the database manager does the filtering early in the execution of the query.

Using marked for delete

You can perform complex deletions in multiple steps. One technique consists of extracting the select part of the delete and executing it first. You can then use the data returned by the select to populate a temporary table, or run an update statement that flags the rows as marked for delete.

If the delete only looks for the marked-for-delete rows, the statement is much simpler and the FETCH FIRST x ROWS ONLY clause effectively limits the number of rows that are processed. This technique also has the benefit of making the deletes faster, reducing the number of locks acquired, and releasing the locks more quickly.
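A hedged two-step sketch follows. The MARKFORDELETE column below is a placeholder for whatever flag your schema provides (a separate work table serves the same purpose), and the criteria are illustrative:

-- Step 1: run the expensive criteria once, off-peak, to flag the candidates
UPDATE member m
   SET markfordelete = 1
 WHERE m.member_id > 0
   AND ... ;

-- Step 2: the recurring delete only needs to test the flag
DELETE FROM ( SELECT 1
                FROM member
               WHERE markfordelete = 1
               FETCH FIRST 1000 ROWS ONLY )
  WITH UR;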

Deleting child rows first

Because all the foreign keys in WebSphere Commerce use cascade deletes, you can either delete from the parent table and let the delete cascade to all the child tables, or you can selectively delete from the child tables first.

Most of the time, you will apply this technique when the data retention rules are shorter for the child tables, for example, removing orders months before the user itself is deleted.

This technique can give you mixed results. In general, it is better to do all the operations in one step because it gives the database more opportunities to optimize the execution. Remember that DB2 will always check for data in the child tables, regardless of whether or not you cleaned those tables first.


On the other hand, the advantage of this technique, as with the marked-for-delete flag, is that it allows you to write simpler statements. Instead of having to check for complex rules, you can simply query to ensure that the object has no dependent data in a certain table. For example:

DELETE FROM ( 
   SELECT 1
     FROM member
    WHERE member_id IN (
      SELECT users_id
        FROM users t1
       WHERE registertype='G'
         AND ( users_id > 20000000  AND users_id <= 21000000 )
         AND NOT EXISTS ( SELECT 1 FROM orders t2 WHERE t2.member_id = t1.users_id))
      FETCH FIRST 1000 ROWS ONLY )

Testing and monitoring deletes

There are two aspects of testing: functional and non-functional. The first one consists of ensuring that only the data that you intended to delete was removed. This kind of testing is always validated with the business users. The second kind of testing is about the non-functional requirements, which is to ensure that the site remains operable while the cleanup is executed, and that the cleanup completes inside the maintenance window that was allocated for it.

Validating the functional requirements

It is important to ensure that the cleanup operation only removes the intended data. Business users need to validate the delete operations and the parameters used, for example, number of days. Given that WebSphere Commerce uses foreign keys with the cascade delete option, it can be tricky to find all the data that was removed with a delete. The following techniques will help you list the tables where the data was removed.

Determining all affected tables

WebSphere Commerce makes extensive use of foreign keys and cascade deletes. If you are not aware of the extent of the relationships involved in a delete, you can inadvertently delete data that you need to preserve.

For example, if you look at the ORDERITEMS table, you will find that it references the OFFER table, in addition to many others. If the offer is deleted, all the rows in the ORDERITEMS table that have a reference to it will be deleted as well.

You can use the following SQL to identify all the tables to which a delete can cascade. To use it, change the value of tabname to the table you are interested in:

WITH tmpsql ( tabschema, tabname, tabtree, level ) AS (
   SELECT tabschema, tabname, CAST( tabname AS VARCHAR(4096)) tabtree, 0 level
     FROM syscat.tables
    WHERE tabschema = (SELECT MIN(tabschema) FROM syscat.tables WHERE tabname = 'MEMBER')
      AND tabname   = 'OFFER'
    UNION ALL
   SELECT r.tabschema, r.tabname tabname, CAST ( t.tabtree || ':' || r.tabname || 
     '(' || r.deleterule || ')' AS VARCHAR(4096))
          tabtree, t.level + 1 level
     FROM tmpsql            t,
          syscat.references r
    WHERE r.reftabschema =  t.tabschema
      AND r.reftabname   =  t.tabname
      AND r.reftabname  <>  r.tabname
      AND LOCATE( ':' || r.tabname || '(' , t.tabtree )= 0 
) SELECT DISTINCT CAST( tabtree AS VARCHAR(256)) tabtree
    FROM tmpsql t
   WHERE level <= 3 -- level allows you set the depth of the tree you want to see 
    ORDER BY 1 WITH UR;

If the query above is executed against an out-of-the-box (OOTB) schema, the output is similar to this:

OFFER
OFFER:DKPDCOFFERREL(C)
OFFER:OFFERDESC(C)
OFFER:OFFERPRICE(C)
OFFER:ORDERITEMS(C)
OFFER:ORDERITEMS(C):BKORDALLOC(C)
OFFER:ORDERITEMS(C):BKORDITEM(C)
OFFER:ORDERITEMS(C):CPITMAP(C)
OFFER:ORDERITEMS(C):OICOMPLIST(C)
OFFER:ORDERITEMS(C):OICOMPLIST(C):ORDITEMCONF(C)
OFFER:ORDERITEMS(C):OICOMPREL(C)
OFFER:ORDERITEMS(C):OICOMPREL(C):OICOMPLIST(C)
OFFER:ORDERITEMS(C):ORDIADJUST(C)
OFFER:ORDERITEMS(C):ORDICALCD(C)
OFFER:ORDERITEMS(C):ORDIMEEXTN(C)
OFFER:ORDERITEMS(C):ORDIOFFER(C)
OFFER:ORDERITEMS(C):ORDIPROF(C)
OFFER:ORDERITEMS(C):ORDISTAT(C)
OFFER:ORDERITEMS(C):ORDITAX(C)
OFFER:ORDERITEMS(C):ORDITEMCONF(C)
OFFER:ORDERITEMS(C):ORDITPROMOCD(C)
OFFER:ORDERITEMS(C):ORDITRD(C)
OFFER:ORDERITEMS(C):ORDPICKHST(C)
OFFER:ORDERITEMS(C):ORDSHIPHST(C)
OFFER:ORDERITEMS(C):PATTRVALUE(C)
OFFER:ORDERITEMS(C):PATTRVALUE(C):RFQRSPPARL(C)
OFFER:ORDERITEMS(C):RABACKALLO(C)
OFFER:ORDERITEMS(C):RMAITEM(C)
OFFER:ORDERITEMS(C):RMAITEM(C):RMACHARGE(C)
OFFER:ORDERITEMS(C):RMAITEM(C):RMAIADJCRD(C)
OFFER:ORDERITEMS(C):RMAITEM(C):RMAIDNYRSN(C)
OFFER:ORDERITEMS(C):RMAITEM(C):RMAITEMCMP(C)
OFFER:ORDERITEMS(C):RMAITEM(C):RMAITEMSERIAL(C)
OFFER:ORDERITEMS(C):TCPITMAP(C)
OFFER:ORDERITEMS(C):TRDDEPAMT(C)
OFFER:ORDERITEMS(C):TRDPURAMT(C)
OFFER:ORDIOFFER(C)

Determining the number of rows deleted

There are a number of techniques to determine the number of rows deleted, but these require that you are the only user in the system.

Counting rows in all tables

  1. Create a script named prepareCount.sql with the following content:
    SELECT 'select CAST( ''' || tabschema || '.' || tabname || ''' as VARCHAR(40)),
                   COUNT(*) total from ' || tabschema || '.' || tabname || ';' 
     FROM syscat.tables 
     WHERE tabschema = (SELECT MAX(tabschema) 
                          FROM syscat.tables 
                         WHERE tabname = 'MEMBER') 
     GROUP BY tabschema, tabname 
     ORDER BY 1 asc 
    WITH UR;
  2. Before and after the delete runs, execute the script and save the results. Then use the diff command to compare differences. For example:
    db2 connect to mall
    db2 -xtf prepareCount.sql > count.sql
    db2 -xtf count.sql > count_before.txt
    db2 -tvf delete.sql
    db2 -xtf count.sql > count_after.txt
    diff count_before.txt count_after.txt

    This produces an output similar to the following. For example, you can see that one row was removed from the member table and one from the users table.
    795c795
    < DB2INST1.MEMBER                                  278
    ---
    > DB2INST1.MEMBER                                  277
    1479c1479
    < DB2INST1.USERS                                    84
    ---
    > DB2INST1.USERS                                    83

Using snapshots for tables

You can also use snapshots for tables to find the number of rows deleted.

  1. Reset the monitor switches.
    db2 reset monitor all
  2. Execute the delete.
  3. Obtain a table snapshot.
    db2 get snapshot for tables on mall
  4. The snapshot includes all the tables that were accessed. Look at the "Rows Written" counter to find the number of deletes:
    Table Schema        = DB2INST1
    Table Name          = ORDERS
    Table Type          = User
    Data Object Pages   = 769705
    Index Object Pages  = 213116
    Rows Read           = 0
    Rows Written        = 3332
    Overflows           = 0
    Page Reorgs         = 0

Validating non-functional requirements

Although database cleanup is done during off-peak hours, traffic will still flow to the site while the deletes are executing. The purpose of this testing is to ensure the site remains operational while the deletes execute.

To validate non-functional requirements, run the cleanup scripts in a stress environment while simulating the load that the live site is expected to receive at the time the deletes are executed. For example, run them daily at midnight.

Also, it is important that you use a database that is equivalent to production, not only in its hardware characteristics, but also in the size of the database. This is to ensure that you get similar benchmark results when the statements are executed in the production environment.

The cleanup operations will add some stress on the system that you can see in the form of:

  • High CPU on the database server
  • Decrease in overall site response times
  • High number of database lock waits and lock contention (time-outs, lock-waits)

The following tools and techniques are helpful in validating the delete performance.

DB2 benchmark tool

DB2 includes a benchmark tool called db2batch to measure the performance of SQL statements from the command line. Besides the timing and CPU information, you can configure the tool to report different levels of detail. The example below, which uses level 3, returns a summary of the monitor information.

$ db2batch -d dbclean -f p-orders.sql -c off -o p 3

Running in Embedded Dynamic mode.

---------------------------------------------

Statement number: 1

DELETE FROM orders WHERE orders_id IN ( SELECT orders_id FROM orders o 
WHERE o.status = 'P' AND o.lastupdate < ( CURRENT TIMESTAMP
 - 30 DAYS ) AND o.member_id IN ( SELECT users_id  FROM users WHERE 
registertype = 'G'  AND users_id > 0 ) FETCH FIRST 5000 ROWS ONLY )


Elapsed Time is:           178.652    seconds

Locks held currently                         = 10029
Lock escalations                             = 0
Total sorts                                  = 3
Total sort time (ms)                         = 41622
Sort overflows                               = 0
Buffer pool data logical reads               = 965901
Buffer pool data physical reads              = 950726
Buffer pool data writes                      = 12176
Buffer pool index logical reads              = 741964
Buffer pool index physical reads             = 14238
Buffer pool index writes                     = 6496
Total buffer pool read time (ms)             = 905732
Total buffer pool write time (ms)            = 109205
Asynchronous pool data page reads            = 871744
Asynchronous pool data page writes           = 12010
Asynchronous pool index page reads           = 3032
Asynchronous pool index page writes          = 6496
Total elapsed asynchronous read time         = 806991
Total elapsed asynchronous write time        = 107541
Asynchronous read requests                   = 27394
LSN Gap cleaner triggers                     = 0
Dirty page steal cleaner triggers            = 0
Dirty page threshold cleaner triggers        = 0
Direct reads                                 = 978
Direct writes                                = 0
Direct read requests                         = 238
Direct write requests                        = 0
Direct read elapsed time (ms)                = 1236
Direct write elapsed time (ms)               = 0
Rows selected                                = 7
Log pages read                               = 0
Log pages written                            = 2990
Catalog cache lookups                        = 242
Catalog cache inserts                        = 80
Buffer pool data pages copied to ext storage    = 0
Buffer pool index pages copied to ext storage   = 0
Buffer pool data pages copied from ext storage  = 0
Buffer pool index pages copied from ext storage = 0
Total Agent CPU Time (seconds)               = 15.9622
Post threshold sorts                         = 0
Piped sorts requested                        = 3
Piped sorts accepted                         = 3
---------------------------------------------

Summary of Results
==================
                Elapsed             Agent CPU         Rows      Rows
Statement #     Time (s)            Time (s)          Fetched   Printed
1                   178.652              15.962         0         0

Arith. mean     178.652             16
Geom.  mean     178.652             16

Snapshots and statement event monitors

Snapshots and statement event monitors are also great tools that help you understand how an SQL statement is executing and find the amount of resources the statement is consuming. Snapshots allow you to capture information about the database and the connected applications at a specific point in time. Event monitors, on the other hand, are used to collect information over a period of time.

Finding the application ID

Instead of running snapshots or event monitors against the whole database, you can enable the monitors only for the connection that is executing the deletes.

If the deletes are executed from the active connection, you can call the application_id() function to print the application ID:

select application_id() as appl_id from sysibm.sysdummy1

If you do not know the application ID, you can use the list applications show detail command. The output includes the client application and machine, which can help you deduce which connection is executing the deletes. If this does not help, you can invoke the get snapshot for applications command. The report includes the last statement executed by each application.

Snapshots

Snapshots can provide visibility into how the delete statement is performing. When taking snapshots, specify the application ID that you are interested in to avoid the overhead of collecting data for the whole database. For example:

db2 get snapshot for application applid '*LOCAL.DB2.080715032837'

Note that when using "snapshot for applications", the snapshot needs to be taken while the script is executing.

You can also query the snapshot information using SQL statements by using the SNAPSHOT_STATEMENT and SNAPSHOT_APPL_INFO functions as shown below:

SELECT ai.agent_id, ai.appl_id, ai.appl_name, ai.client_nname, 
       st.stmt_operation, st.stmt_text
  FROM (SELECT agent_id, stmt_operation, stmt_text 
          FROM TABLE(SYSPROC.SNAPSHOT_STATEMENT('',-1)) s ) AS st,
       (SELECT agent_id, appl_id, appl_name, client_nname 
          FROM TABLE(SYSPROC.SNAPSHOT_APPL_INFO('',-1)) s ) AS ai
 WHERE st.agent_id = ai.agent_id

Statement event monitors

You can enable statement event monitors for one application in particular, as shown in this example:

db2 "create event monitor delmon for statements where (APPL_ID =
 '*LOCAL.DB2.080715032837')  write to file '/tmp/dbclean'"
db2 "set event monitor delmon STATE=1"
call_dbclean.sh
db2 "set event monitor delmon STATE=0"
db2evmon -db mall -evm delmon > /tmp/dbclean/delmon.out

The following example snippet shows the details for the execution. Note that the operation type is "Execute Immediate":

31) Statement Event ...
  Appl Handle: 1126
  Appl Id: *LOCAL.db2inst1.080715201821
  Appl Seq number: 0008
	
  Record is the result of a flush: FALSE
  -------------------------------------------
  Type     : Dynamic
  Operation: Execute Immediate
  Section  : 203
  Creator  : NULLID
  Package  : SQLC2E07
  Consistency Token  : AAAAAcEU
  Package Version ID  :
  Cursor   :
  Cursor was blocking: FALSE
  Text     : DELETE FROM orders WHERE orders_id IN 
( SELECT orders_id FROM orders WHERE (status = 'P' OR 
status = 'I' OR status = 'W' OR status = 'N') AND (DAYS(CU
RRENT TIMESTAMP) - DAYS(lastupdate)) >= 90 AND 
orders.member_id IN ( SELECT DISTINCT users_id FROM users 
WHERE registertype = 'G') AND orders_id NOT IN ( SELECT D
ISTINCT orders_id FROM orderitems WHERE inventorystatus 
!= 'NALC' AND inventorystatus IS NOT NULL) AND orders_id NOT IN 
( SELECT orders_id FROM orderitems WHERE orderitems_id IN 
( SELECT orderitems_id FROM rmaitem WHERE orderitems_id IS 
NOT NULL)) FETCH FIRST 1000 ROWS ONLY )
  -------------------------------------------
  Start Time: 07/15/2008 16:20:43.668257
  Stop Time:  07/15/2008 16:22:39.718519
  Elapsed Execution Time:  116.050262 seconds
  Number of Agents created: 1
  User CPU: 25.674601 seconds
  System CPU: 3.553437 seconds
  Fetch Count: 0
  Sorts: 2
  Total sort time: 2
  Sort overflows: 0
  Rows read: 10111543
  Rows written: 85849
  Internal rows deleted: 1000
  Internal rows updated: 0
  Internal rows inserted: 0
  Bufferpool data logical reads: 3522264
  Bufferpool data physical reads: 2180
  Bufferpool temporary data logical reads: 41217
  Bufferpool temporary data physical reads: 0
  Bufferpool index logical reads: 269850
  Bufferpool index physical reads: 120529
  Bufferpool temporary index logical reads: 0
  Bufferpool temporary index physical reads: 0
  SQLCA:
   sqlcode: 0
   sqlstate: 00000

Monitoring lock waits

Lock contention between the cleanup process and the store can cause a severe decrease in the site's response times for all operations that depend on the set of tables being cleaned.

To test for lock contention in a stress environment, start the cleanup process and drive load to the store that is representative of what you expect at the time the deletes will be executed.

While the test is running, look for connections that are in the lock wait state. Lock waits occur if the generated load creates locks that are not compatible with those acquired by the delete. To find the connections in lock-wait, you can use the list applications show detail command, or get snapshots.

The following example shows how to create a script that reports lock waits using the snapshot table functions:

  1. Ensure that DFT_MON_LOCK is enabled at the database manager level.
    db2 update dbm cfg using DFT_MON_LOCK ON
  2. Create the lockwait.sql script.
    SELECT CAST( TABLE_NAME AS VARCHAR(20)) TABNAME, 
       	COUNT(*) WAITERS 
      FROM TABLE(SYSPROC.SNAPSHOT_LOCKWAIT('', -1)) AS LW 
     GROUP BY CAST( TABLE_NAME AS VARCHAR(20)) 
     ORDER BY 2 DESC;
  3. Create lockwait.sh that invokes lockwait.sql.
    #!/bin/ksh
    db2 connect to mall
    while true; do
       print "["`date`"]" 
       db2 -tf lockwait.sql
       sleep 30
    done
  4. Run the script while the delete is executing. The following is an example output when deleting orderitems:
    [Fri Aug 8 11:15:00 EDT 2008]
    
    TABNAME              WAITERS
    -------------------- -----------
    RMAITEM                        1
    
      1 record(s) selected.

If you find that the cleanup script is creating lock contention, use DB2 tools, such as lock snapshots, to investigate the root cause of the locking.
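For example, a lock snapshot lists each lock that is held or requested, along with the table name, the lock mode, and the application involved:

db2 get snapshot for locks on mall

In the next section, we will explore techniques for avoiding lock contention.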


Preventing locking

Locking is the main concern during online cleanups. If there are locking problems, the site will be unusable while the deletes are executing. This section describes common locking problems and techniques to avoid them.

Lock escalations

Lock escalation is the process of replacing row locks with table locks to reduce the number of locks in the lock list. This can have a severe impact on concurrency. Escalations occur when the locklist runs out of space or when a connection reaches the maximum number of locks allowed (maxlocks).

If read locks are promoted, the connection is granted a shared table lock. If the connection is holding write locks, then an exclusive lock escalation occurs. This is the most restrictive scenario and prevents other connections from either reading or writing to the table.

When testing the cleanup scripts, ensure that no escalations occur. You can do this by monitoring the db2diag.log file or taking snapshots of the database.
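
For example, the database snapshot includes the "Lock escalations" and "Exclusive lock escalations" counters, which should stay at zero while the cleanup runs:

db2 get snapshot for database on mall | grep -i escalations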

If the deletes are causing lock escalations, you can start by decreasing the commit count. All the locks held by a connection are released when the transaction is committed, so by committing more often, the connection holds fewer locks at any point in time.
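
The following is a minimal sketch of this approach; the table, status predicate, and batch size are placeholders to be replaced with your own cleanup criteria:

#!/bin/ksh
# Delete in small batches. Each db2 invocation runs with auto-commit on,
# so the connection releases all of its locks after every batch.
db2 connect to mall
while :; do
    db2 "DELETE FROM orders WHERE orders_id IN (SELECT orders_id FROM orders WHERE status = 'X' FETCH FIRST 500 ROWS ONLY)"
    # The CLP returns 1 (SQLCODE +100) when no rows qualified: the cleanup is done.
    [ $? -ne 0 ] && break
done
db2 connect reset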

Also, you can try increasing the locklist or maxlocks parameters. A small maxlocks value is appropriate for the SQL statements executed from the front-end store, but it might not be sufficient for batch delete operations. Keep in mind that if you allow a single connection to grab a large percentage of the locklist, the locklist can become exhausted, which itself leads to lock escalations.
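
For example, with illustrative values (locklist is specified in 4 KB pages; maxlocks is the percentage of the locklist one application can hold before escalation):

db2 update db cfg for mall using LOCKLIST 8192
db2 update db cfg for mall using MAXLOCKS 30

Depending on your DB2 level, the new values might only take effect after the database is deactivated and reactivated.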

Locks on cascade deletes

When deleting from a table that is referenced by referential integrity (RI) rules, DB2 needs to scan the child tables to locate the rows that reference the data being deleted. If the scan is not done using indexes, DB2 locks the child tables, which causes contention problems.

DB2 has recently released APAR IZ16942, "Optimizer not choosing correct index during referential integrity checking", that provides several enhancements for cascade delete scenarios.

When the APAR is enabled by setting the DB2_REDUCED_OPTIMIZATION=JULIE registry variable, the indexes supporting the foreign keys are used even if the optimizer finds a cheaper plan that does not use them.

This APAR also reduces locking when the index contains columns in addition to the ones referenced by the foreign key. For example, if the foreign key is on the orders_id column, but the index contains orders_id and orderitems_id, before the APAR, DB2 needed to use a repeatable read (RR) scan on the index. This requirement no longer applies after the APAR is enabled.

As WebSphere Commerce makes extensive use of referential integrity, it is common to find that the data cleanup operation locks the site because of the cascade delete locking. Installing APAR IZ16942 can help you prevent these problems.

Testing for referential integrity locking

You only need two connections to test for locking. For example, find two orders that you can delete and start two consoles with auto-commit off. If the second connection blocks against the first one, then you might need to resolve the RI locking issue.

Console A:
  db2 +c
  connect to rmall
  DELETE FROM orders WHERE orders_id = 10000001

Console B:
  db2 +c
  connect to rmall
  DELETE FROM orders WHERE orders_id = 10000002

Lock contention with other applications

Some locking is inevitable. The delete statements might block while scanning or deleting rows that other connections are reading or writing. In this situation, increasing the time that the connection waits for a lock might help.

DBClean currently uses the legacy type 2 driver. You can increase the timeout that DBClean uses by creating a new database alias and setting the LockTimeout CLI configuration keyword for it. Alternatively, you can execute the SET CURRENT LOCK TIMEOUT statement from your tool.
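
As a sketch of both approaches (the alias name malldel and the 120-second value are illustrative, and the location of db2cli.ini varies by installation), first create the alias:

db2 catalog database mall as malldel

Then add the keyword under the alias section of db2cli.ini:

[MALLDEL]
LockTimeout=120

Alternatively, from an interactive session:

db2 "SET CURRENT LOCK TIMEOUT 120"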

DB2 registry variables that help with locking

The following DB2 registry variables are popular among WebSphere Commerce customers and help reduce lock contention issues. More information about these variables is available in the DB2 8 Information Center; an example of setting them follows the list.

  • DB2_EVALUNCOMMITTED=ON: Allows table or index access scans to defer or avoid row locking until a data record is known to satisfy a predicate evaluation.
  • DB2_SKIPINSERTED=ON: Allows statements using either Cursor Stability or Read Stability to skip uncommitted inserted rows as if they had not been inserted.
  • DB2_SKIPDELETED=ON: Allows statements using either Cursor Stability or Read Stability to unconditionally skip deleted keys during index access and deleted rows during table access.
  • DB2_REDUCED_OPTIMIZATION=JULIE: As described earlier, this registry setting helps minimize the locking during cascade deletes. It is available by installing APAR IZ16942.
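
Registry variables are set at the instance level with the db2set command and typically take effect only after the instance is restarted. For example:

db2set DB2_EVALUNCOMMITTED=ON
db2set DB2_SKIPINSERTED=ON
db2set DB2_SKIPDELETED=ON
db2set DB2_REDUCED_OPTIMIZATION=JULIE
db2stop
db2start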

Conclusion

This article presented different tools and techniques to improve the performance of online data cleanups and to minimize the impact to the store. Implementing best practices for SQL tuning, tailoring the statements to your specific requirements, and ensuring the database is properly tuned can help you implement cleanup scripts that do not disrupt the live site.
