Online roll-out with table partitioning in InfoSphere Warehouse

Improve the availability of your warehouse data

Starting with IBM® DB2® for Linux®, UNIX®, and Windows® V9.7.1, the table partitioning feature is enhanced with support for online roll-out using detach. With online roll-out, queries continue to access the partitioned table while one or more partitions of the table are being detached using the ALTER TABLE DETACH PARTITION command. Learn how online detach improves the availability of the warehouse and discover some best practices for leveraging and monitoring the new behavior. A stored procedure is provided to help database administrators to script the post-detach processing on the target table. The article also demonstrates how the same procedure can be used to simulate synchronous detach behavior in certain application scenarios. We conclude with frequently asked questions related to detach.

Naresh Chainani (naresh@us.ibm.com), Software Developer, IBM

Author Photo: Naresh ChainaniNaresh Chainani is a software developer and has worked in DB2 for Linux, UNIX, and Windows development in IBM Beaverton for the past 10 years. His expertise spans several areas of DB2 with a particular emphasis on table partitioning and decimal floating-point. Recently, he led the development of several enhancements to the table partitioning feature in DB2. Prior to that, he led the development of the IEEE-compliant DECFLOAT data type. He frequently interacts with customers either to present table partitioning or provide consulting assistance related to table partitioning, as well as to seek valuable stakeholder feedback.



Liping Zhang (liping@us.ibm.com), Senior Software Engineer, IBM

Liping Zhang photoLiping Zhang is a Senior Software Engineer based at the IBM Beaverton Lab. He has been working on the table partitioning in DB2 for Linux, UNIX, and Windows since version 9.1 and has led the feature design since version 9.7. He has also been working with customers to help them understand and adopt the table partitioning feature. Liping has authored IDUG presentations and frequently contributed to DB2 papers, articles and books on table partitioning.



Steve Pearson (stevpear@us.ibm.com), Senior Software Engineer, IBM

Steve PearsonSteve Pearson is a senior software engineer based at the IBM Beaverton lab. He has worked on DB2 for Linux, UNIX, and Windows for nine years, primarily leading work on the High Availability Disaster Recovery (HADR) feature. Over the past year, he has worked on the DB2 table partitioning feature.



Srinivasa Santhanam (srinivasa.s.raghavan@gmail.com), Software Developer, Consultant

Srinivasa SanthanamSrinivasa Santhanam is a software developer who worked on DB2 for Linux, UNIX, and Windows for three years in the Beaverton lab, primarily on the table partitioning feature.



27 October 2011

Also available in Chinese

Introduction

DB2 supports efficient data roll-out with the table partitioning feature using ALTER TABLE DETACH PARTITION, which can separate a large chunk of data from a source table in the unit of partition.

Figure 1 shows roll-out when both global (nonpartitioned) indexes and local (partitioned) indexes exist on the source table. The detached source table partition becomes a separate nonpartitioned table. For local indexes, the index object associated with the detached partition is carried along and the indexes within become indexes of the new target table. Global indexes, on the other hand, are not inherited by the target table. Instead, the index entries for the detached partitions are orphaned by the detach command. DB2 then automatically runs a separate background task called Asynchronous Index Cleanup (AIC) to remove orphaned keys from the global index. Meanwhile, such keys are ignored by queries that access the source table.

Figure 1. Roll-out from a table with global and local indexes
Image shows detach of one partition from a table with a global and a local index

Prior to V9.7.1, roll-out with table partitioning is an offline operation. Detach waits for all existing access to the partitioned table to complete, while blocking new access from starting. However, detach may need to be performed in the same timeframe as long-running reporting queries, potentially creating contention between administrative operations performed by the DBA and query access to the table as part of the application workload.

Starting with DB2 V9.7.1, the DETACH operation is made online for dynamic SQL by allowing a partition to be detached while queries are concurrently accessing the remaining partitions of the table. In fact, a reporting workload using the Uncommitted Read (UR) isolation level could be reading the partition being detached, and even that would not stop the detach from proceeding concurrently.

This is accomplished by making DETACH a two-phase operation. The first phase, performed by the ALTER TABLE DETACH PARTITION command, logically detaches the partition from the table. The second phase, managed by the system in the background without the need for user intervention, completes the detach asynchronously after ensuring that all other existing activities that access the partitioned table have completed. The background process is referred to as the Asynchronous Partition Detach (APD) process. When there are no dependent materialized query tables (MQTs), the system initiates the second phase after the first commits. If dependent tables, such as staging tables or refresh immediate MQTs, exist on the partitioned table, the second phase starts automatically after these dependent tables are refreshed via SET INTEGRITY.

From a DBA perspective, the roll-out operation progresses in the presence of a long-running reporting workload. From an application perspective, the long-running reporting workload is not interrupted by detach and completes concurrently. For any access to the partitioned table that happens after the successful completion of the DETACH command, the detached partition is no longer visible, and the asynchronous completion of detach will not interrupt this new access. Of course, the new online roll-out behavior respects the isolation level in effect. For instance, if a Repeatable Read (RR) scanner has read the partition, but not committed yet, detach will wait for the RR scanner to complete before it can make the partition invisible. After the asynchronous second phase of the detach operation has completed, the target table of detach is available to be archived or dropped or attached to another table.

The asynchronous detach behavior is new and is the only choice in DB2 V9.7.1. It can be monitored using the following methods:

  • Issue the LIST UTILITIES SHOW DETAIL command.
  • Query the STATUS column of the SYSCAT.DATAPARTITIONS catalog view for the partition in question. See the Appendix for a stored procedure that implements this approach.

DETACH phases

The following table illustrates the sequence of activities associated with a detach operation.

Table 1. Sequence of DETACH-related activities
SequenceDescriptionExample
1 User-initiated, Phase 1 of DETACH
DETACH partition. Note that this command needs to commit in order for the second phase to start. In general, it is a good practice to commit immediately after any DDL.
ALTER TABLE purchaseOrders DETACH PARTITION jan2010 INTO obsolete_data;
COMMIT;
2 User-initiated
If the source table has any REFRESH IMMEDIATE MQTs defined, run SET INTEGRITY to refresh them.
SET INTEGRITY FOR q1_sales_mqt IMMEDIATE CHECKED;
COMMIT;
3 System-initiated, Phase 2 of DETACH
Asynchronous partition detach; no user intervention needed.
4 System-initiated
If there are any global indexes, asynchronous index cleanup (AIC) will be started by DB2 automatically to remove index entries for the detached partition.

Figure 2 illustrates the two-phase detach behavior and query activities allowed on the table during the entire period.

In phase 1, the partition is logically detached from the source table. Catalogs and other metadata are updated to reflect this, but the table is not physically removed from the source table. Dynamic queries using UR isolation may continue to access all partitions of the source table, including the detached partition, while dynamic queries at other isolation levels may access all remaining partitions of the source table. After detach commits, new queries will no longer see the detached partition.

Phase 2 begins after all queries that started before the detach have completed and any dependent MQTs have been refreshed by the user. In this phase, the detached partition is physically de-linked from the source table by the APD task and becomes its own separate nonpartitioned table.

Figure 2. The two-phase detach and query activities during the period
Image illustrates phases (synchronous logical and asynchronous physical) of detach

SQL waitForDetach procedure

While V9.7.1 improves the availability of the partitioned table during the roll-out operation, the target table of detach is not available until the APD for the detached partition completes. The sample SQL PL procedure (waitForDetach), found in the Appendix at the end of this article, provides a convenient way to script the post-detach processing for the target table, as described below. The procedure queries the partition state in the STATUS column of SYSCAT.DATAPARTITIONS catalog view and does not return until the APD for the partition has completed.

The interface to waitForDetach looks like the following.

Listing 1. waitForDetach
procedure waitForDetach (OUT returnCode integer,
                         OUT msg varchar(128),
                         IN schemaName varchar(128),
                         IN tableName varchar(128),
                         IN dataPartID int DEFAULT -1)

-- Output:
--   returnCode -  0 indicates detach completed successfully;
--                -1 indicates an error
--   msg        -  Error details

-- Input:
--   schemaName - Table schema
--   tableName  - Table name
--   dataPartID - (optional) dpid of partition to wait for;
--                by default, wait for all detach operations on the table
--                to complete.

The remainder of this section discusses scenarios where waitForDetach may be used.

Access to target table of detach

Before the target table can be accessed after a detach, the APD operation must complete; otherwise, error SQL20285N with reason code 2 will be encountered.

Scenario 1: Target table of detach will be dropped or archived

Since the target table of detach is not available until the detach operation is complete, one may wish to automate waiting for detach to complete before they drop or archive the target table of detach. The ALTER TABLE DETACH PARTITION command initiates the roll-out process. After that command commits, a subsequent call to waitForDetach will return once the detach operation has fully completed. Thus, it would wait for the completion of any existing queries on the range partitioned table at the time of the detach, and for the asynchronous partition detach phase. Then the target table of the detach can be dropped or archived.

Example: In the following example, the DBA for the warehouse detaches a partition every month, and the availability of the partitioned table is the most important to the warehouse. The target table of detach is archived and subsequently dropped.

Listing 2. DBA for warehouse detaches a partition
ALTER TABLE sales DETACH PARTITION jan2009 INTO jan2009_old;
COMMIT;

CALL waitForDetach (?, ?, 'DWSCHEMA', 'SALES');
BACKUP DB DWDB TABLESPACE (jan2009_tbsp) to /archpath;
DROP TABLE jan2009_old;
COMMIT;

Scenario 2: Target table of detach will be reattached

To perform certain operations at a partition level (for example, TRUNCATE), the DBA may detach a partition, perform the operation on the target table of detach, then reattach the partition. In this scenario, the TRUNCATE script will need to call waitForDetach for the detach to complete before it can invoke TRUNCATE on the target table of detach. The second phase of detach will wait for any existing queries as usual, though it is likely that this kind of maintenance occurs in a window where there are no such queries and the APD can complete quickly.

Example: In the following example, the DBA for the warehouse wishes to truncate a partition. Since the TRUNCATE command operates at a table level, the DBA achieves partition-level TRUNCATE by detaching the partition, truncating the target table, and reattaching the partition back to the original partitioned table. In this example, the DBA first queries the catalog to determine partition identifier for the partition to be truncated, then uses that in the call to waitForDetach to wait for just that specific partition. This use of waitForDetach could be advantageous in case other partitions were also recently detached.

Listing 3. DBA queries catalog to determine partition identifier for partition to be truncated
select DATAPARTITIONID from syscat.datapartitions where tabschema 
= 'DWSCHEMA' and tabname = 'CLAIMS' and DATAPARTITIONNAME = 'YEAR1';
-- assume partition id "3" was returned

ALTER TABLE claims DETACH PARTITION year1 INTO year1_table;
COMMIT;

-- wait for the above detach to complete
CALL waitForDetach (?, ?, 'DWSCHEMA', 'CLAIMS', 3);

TRUNCATE TABLE year1_table IMMEDIATE;

ALTER TABLE claims ATTACH PARTITION year1 STARTING FROM 200101 
ENDING AT 200112 FROM year1_table;

SET INTEGRITY FOR claims IMMEDIATE CHECKED;

Access to source table of detach

Scenario: DDL access to the range partitioned table

Any DDL (including subsequent ADD or ATTACH operations) that accesses a range partitioned table in a separate transaction immediately after a detach operation may potentially interfere with the completion of asynchronous detach, which in itself is akin to a DDL operation. In these cases, it may be beneficial to wait for the detach operation to fully complete before attempting the next DDL on the range partitioned table.

Example: In the following example, the DBA for the warehouse detaches an old partition and then adds a new empty partition to the same table.

Listing 4. DBA for warehouse detaches old partition and adds an empty partition
ALTER TABLE claims DETACH PARTITION year1 INTO year1_table;
COMMIT;

-- wait for the above detach to complete
CALL waitForDetach (?, ?, 'DWSCHEMA', 'CLAIMS');

ALTER TABLE claims ADD PARTITION year12 STARTING 201201 ENDING 201212;

Frequently asked questions

This section discusses some frequently asked questions related to using ALTER TABLE DETACH PARTITION to roll out data from the database.

Where did the partition and its indexes go after being detached?
Detach is a no-data-movement operation. The table data and local indexes of the partition being detached will not physically move but will continue to reside in the exact same table spaces. Only the identity information and the logical relationship with the source partitioned table will be modified, resulting in a few changes in the system catalog and other object metadata. Global indexes remain with the source table, and any keys associated with the detached partition will be removed from them by the AIC task.

Which table space will the indexes on the target table use?
The index table space specified for the partition when it was part of the source table will continue being used as the index table space for indexes on the target table. If no index table space was specified for that partition, the index table space will be the same as that for the partition data.

What indexes will be inherited by the target table?
All the partitioned (local) indexes defined on the source table will be inherited by the target table. These include those system generated indexes being used for primary or unique key constraints, for XML support, and for MDC table support. Among nonpartitioned (global) indexes, only MDC block indexes (which are nonpartitioned if the MDC table was created before V97 FP1) and XML column path indexes are inherited by the target table. No other nonpartitioned indexes will be inherited.

Will any constraints on the source table be inherited by the target table?
No. Constraints on the source table will not be inherited by the target table after DETACH. One should be able to add similar constraints back using the ALTER TABLE command when necessary. If any of the constraints were enforced with partitioned indexes before the detach, the ALTER TABLE command will reuse the inherited indexes and avoid spending time to create new ones.

Do I have to run SET INTEGRITY on the source or target after DETACH?
No, you do not need to run SET INTEGRITY on either the source table or the target table after DETACH. It's a common misunderstanding. When there are refresh immediate MQTs defined on the table, these MQTs will be in set integrity pending state after detach. You need to use SET INTEGRITY to refresh these MQTs and bring them to full access state.

How do I know if a partition has been detached successfully?
There are several options:

  1. Query the catalog view SYSCAT.DATAPARTITIONS to check if the partition is still part of the source table.
  2. Use LIST UTILITIES SHOW DETAIL to see if the asynchronous partition detach task activity has completed.
  3. Use the stored procedure code provided in this article, which is essentially based on the first method.

Can I detach multiple partitions from the same table in the same transaction?
Yes, but the best practice is to detach one partition at a time and commit right away, then wait for the APD task to complete before issuing the next DETACH. This is to avoid the potential conflicts caused by multiple APDs, one per DETACH, which may compete for system catalog locks and other resources. The result of such conflicts is that only one winner can proceed, where others will be put back on the waiting queue until next retry cycle, which, ultimately, takes longer than following the best practice.

Are the stats on the partition inherited after DETACH?
Some table and index statistics from the source table will be inherited by the target table during a detach operation, but not all. We recommend that users perform RUNSTATS for the source table after a detach operation. If the target table of a detach will be retained and queried in the future, we recommend running RUNSTATS for that table as well.

Are there any restrictions for DETACH when there are referential constraint relationships existing on the tables involved?
Yes. Detaching a partition from a parent table in the RI relationship is not allowed. One can drop the specific constraints or convert them from enforced to informational constraints to get around this restriction. However, caution must be exercised when performing such a workaround so that data integrity is not compromised.

Will delete triggers be fired during DETACH?
No.

Is there an option to force the old DETACH behavior (i.e., the completely synchronous detach behavior as in versions prior to DB2 V9.7 FP1)?
No. DB2 doesn't support such an option. However, one can use the stored procedure (waitForDetach) provided in this article or come up with one's own script to serialize DETACH and successive operations that involve the target table.

How soon can I reuse the partition name after the partition is detached?
The partition name is immediately available for reuse in the same unit of work following the return of the DETACH command, which will internally rename the partition to a system-generated name based on the current time (SQLyymmddhhmmssxxx). If the partition name needs to be reused by a SQL command from a different unit of work, that command will have to wait for the commit of the unit of work in which DETACH is issued.

I am using partitioned MQTs in my database. Can I detach a partition from the MQT?
Yes. The target table will be a regular table. The DETACH behavior will otherwise be the same as for a non-MQT table.


Summary

This article has described concurrency improvement for the ALTER TABLE DETACH PARTITION statement. As of DB2 V9.7.1, DETACH allows reporting queries to run concurrently with roll-out using table partitioning, thereby eliminating the need for downtime when business data retention policies are being implemented. We also discussed example scenarios in which asynchronous detach may need to be synchronized and provided example code for handling such synchronization.


Appendix: waitForDetach procedure

Below is example code for a stored procedure and a related helper function that can be used to determine when a DETACH operation has fully completed and the target table is accessible. Please note that this code is intended for illustrative purposes only.

In the following sample code, the CLP statement termination character is assumed to be %.

Listing 5. Sample code for waitForDetach procedure
-- Command to run as a script, assuming code is in a file named 'waitForDetach.clp'
-- would be: db2 -td% -vf waitForDetach.clp

-- Function to check if a table exists in the database.
--
-- Input:
--   schemaName - Table schema
--   tableName  - Table name
--
-- Returns:       1 if table is found, 0 if table not found
--
create function tableExists (schemaName varchar(128),
                             tableName varchar(128))
RETURNS INT
RETURN
  case
    when exists (
      select DATAPARTITIONID from syscat.datapartitions
        where tabschema = schemaName and tabname = tableName
        WITH UR
       )
          then 1
          else 0
END%



-- Procedure to wait for detach of a single partition or all
-- partitions to complete
--
-- Input:
--   schemaName - Table schema
--   tableName  - Table name
--   dataPartID - (optional) dpid of partition to wait for;
--                by default, wait for all detach operations on the table
--                to complete.
--
-- Output:
--   returnCode -  0 indicates detach completed successfully;
--                -1 indicates an error
--   msg        -  Error details
--
create procedure waitForDetach (OUT returnCode integer,
                                OUT msg varchar(128),
                                IN schemaName varchar(128),
                                IN tableName varchar(128),
                                IN dataPartID int DEFAULT -1)
  specific waitForDetach
  language SQL
BEGIN

  declare dpid int;
  declare dpstate char;
  declare done boolean default FALSE;
  declare tabNotFound boolean default FALSE;
  declare curClosed boolean default FALSE;
  declare oldLockTimeout INTEGER;

  declare allDetachCheck cursor for
      select DATAPARTITIONID, STATUS from syscat.datapartitions
        where tabschema = schemaName and tabname = tableName and
        (status = 'L' OR status = 'D') WITH CS;

  declare oneDetachCheck cursor for
      select DATAPARTITIONID, STATUS from syscat.datapartitions
        where tabschema = schemaName and tabname = tableName and
        DATAPARTITIONID = dataPartID WITH CS;

  -- Add signal handlers for errors that you consider non-fatal here
  -- example for no row found and lock timeout shown below
  declare continue handler for NOT FOUND
  BEGIN
    -- If no record was found there is nothing to wait for, so we are done
    set done = TRUE;
  END;

  declare continue handler for SQLSTATE '40001'
  BEGIN
    -- on a lock timeout we are not done, we want to loop again
    set done = FALSE;
    -- the cursor will be closed as part of the rollback during the lock timeout
    set curClosed = TRUE;
  END;

  -- initialize return code to 0
  set returnCode = 0;

  -- save current lock timeout value
  values current lock timeout into oldLockTimeout;
  set current lock timeout -1;

  -- if table does not exist in syscat.datapartitions, return error
  IF tableExists (schemaName, tableName) = 0
  THEN
    set msg = 'Table ' || schemaName || '.' || tableName || ' not found';
    set returnCode = -1;
    GOTO exit;
  END IF;

  WHILE done = FALSE DO
    set curClosed = FALSE;

    IF dataPartID <> -1
    THEN
      open oneDetachCheck;
      fetch oneDetachCheck into dpid, dpstate;

      -- two cases here:
      --  (i) detach has already completed hence partition entry not
      --      found in catalogs (indicated by done = TRUE, handled later)
      -- (ii) detach in progress, partition state should not be visible
      IF done <> TRUE AND (dpstate = '' OR dpstate = 'A')
      THEN
        set msg = 'Cannot waitForDetach if DETACH was not issued on ' ||
                  'DATAPARTITIONID ' || dataPartID;
        set returnCode = -1;
        GOTO exit;
      END IF;

      IF curClosed <> TRUE
      THEN
        close oneDetachCheck;
      END IF;
    ELSE
      open allDetachCheck;
      fetch allDetachCheck into dpid, dpstate;
      IF curClosed <> TRUE
      THEN
        close allDetachCheck;
      END IF;
    END IF;

  END WHILE;

  IF dataPartID <> -1
  THEN
    set msg = 'DETACH completed on DATAPARTITIONID ' || dataPartID;
  ELSE
    set msg = 'All detach operations completed on table ' ||
               schemaName || '.' || tableName;
  END IF;

exit:
  -- restore lock timeout value
  set current lock timeout oldLockTimeout;

  RETURN returnCode;

END%

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, Big data and analytics
ArticleID=767819
ArticleTitle=Online roll-out with table partitioning in InfoSphere Warehouse
publish-date=10272011