Distributed DBA: Adding and removing data partitions

An effective way to migrate large blocks of data

Roger E. Sanders explains why, where, when, and how to partition tables with DB2 for Linux, UNIX, and Windows—and shows how enhancements in DB2 9.7 make the process easier and faster.

This article was originally published in IBM Data magazine.

Share:

Roger Sanders, Senior Consultant Corporate Systems Engineer, EMC Corporation

Roger Sanders a consultant corporate systems engineer at EMC Corporation, is the author of 19 books on DB2 for Linux, UNIX, and Windows and a recipient of the 2010 IBM Information Champion award. He is currently working on a new book titled From Idea to Print: How to Write a Technical Article or Book and Get It Published.



10 October 2010

Also available in Chinese

Special thanks to Mike Winer, senior technical staff member–DB2 kernel architect, and Liping Zhang, DB2 data partitioning development lead, for providing information that was used to develop this article.

In my last column, I explained how you can use table partitioning to organize data across multiple storage objects based on values in one or more columns. I also described two ways to use the PARTITION BY clause of the CREATE TABLE statement to create partitioned tables.

One advantage to using partitioned tables is that only relevant data partitions are accessed during query execution; because the DB2 Optimizer is data-partition aware, only relevant data partitions are scanned to resolve a query, resulting in fewer I/Os and higher query performance. Another advantage is that new partitions can easily be added to increase the table range or existing tables containing data can easily be attached (rolled in), while partitions containing old or obsolete data can be removed (rolled out) and archived, for example, to meet company policies or federal laws that mandate record retention.

In this column, I'll walk you through the process of adding and removing data partitions, and I'll show you how enhancements made in DB2 9.7 can make the process easier and faster.

Adding new partitions to a partitioned table

There are two ways to add new partitions to a partitioned table: you can add new ranges and increase the capacity of a table by adding one or more empty partitions, or you can add existing, populated tables by converting them into new partitions. As you might imagine, the process used depends on which approach you choose.

A new, empty partition can be added to a partitioned table by executing the ALTER TABLE statement with the ADD PARTITION option specified; the syntax used looks like this:

    ALTER TABLE [TableName] 
      ADD PARTITION <PartitionName>
      <STARTING <FROM> [Start | MINVALUE | MAXVALUE] |
       STARTING <FROM> ([Start | MINVALUE | MAXVALUE] ,...)
      <INCLUSIVE | EXCLUSIVE>>
      ENDING <AT> [End | MINVALUE | MAXVALUE] |
       ENDING <AT> ([End | MINVALUE | MAXVALUE] ,...) 
      <INCLUSIVE | EXCLUSIVE>
      <IN [TSName]>
      <INDEX IN [IndexTSName]>
      =<LONG IN [LongTSName]>

where:

  • TableName: Identifies the partitioned table, by name, that a new partition is to be added to
  • PartitionName: Identifies the unique name, if any, to be assigned to the partition to be added
  • Start: Specifies the low end of the range for the data partition
  • End: Specifies the high end of the range for the data partition
  • TSName: Identifies the table space in which the new partition is to be stored
  • IndexTSName: Identifies the table space in which any partitioned indexes for the new partition are to be stored
  • LongTSName: Identifies the table space in which the values of any long columns are to be stored

Note: Parameters shown in angle brackets (< >) are optional; parameters or options shown in normal brackets ([ ]) are required and must be provided. A comma followed by ellipses (...) indicates that the preceding parameter can be repeated multiple times.

Thus, if you wanted to add an empty partition to a partitioned table named SALES, you could do so by executing an ALTER TABLE statement similar to this:

    ALTER TABLE sales ADD PARTITION q4_sales
      STARTING '10/1/2010' ENDING '12/31/2010' IN tbsp3;

When such a statement is executed, a new table is created and logically made a part of the partitioned table. The system catalog table SYSCAT.DATAPARTITIONS is then updated to reflect the change.


Attaching a populated table to a partitioned table

Tables that already contain data can be added to a partitioned table as new partitions by executing the ALTER TABLE statement with the ATTACH PARTITION option specified. The syntax used for this type of operation looks like this:

    ALTER TABLE [TableName] 
      ATTACH PARTITION <PartitionName>
      <STARTING <FROM> [Start | MINVALUE | MAXVALUE] |
       STARTING <FROM> ([Start | MINVALUE | MAXVALUE] ,...)
      <INCLUSIVE | EXCLUSIVE>>
      ENDING <AT> [End | MINVALUE | MAXVALUE] |
       ENDING <AT> ([End | MINVALUE | MAXVALUE] ,...) 
      <INCLUSIVE | EXCLUSIVE>
    FROM [SourceTable]
    <BUILD MISSING INDEXES | REQUIRE MATCHING INDEXES>

where TableName, PartitionName, Start, and End are the same as before and SourceTable identifies the base table that is to be used as the source of data for the new partition.

So, if you wanted to attach a base table named Q4_2010_SALES to an existing partitioned table named SALES as a new partition, you could do so by executing an ALTER TABLE statement that looks like this:

    ALTER TABLE sales ATTACH PARTITION q4_sales
      STARTING '10/1/2010' ENDING '12/31/2010'
      FROM q4_2010_sales;

After a base table is successfully attached to a partitioned table, the system catalog is updated to reflect the newly attached partition while information about the original base table is removed from the catalog. No data movement is involved. This means that the process of attaching a new partition takes place very quickly—typically within a few seconds. (It's important to note that to attach an existing base table to a partitioned table, the base table's design must be similar to that of the partitioned table; see sidebar, "Conditions for attaching a table as a new partition.")

Anytime a base table is attached to a partitioned table, the partitioned table is automatically placed in "Set Integrity Pending" state. This means that integrity checking must be performed immediately after a new partition is attached. To check integrity after adding a base table named Q4_2010_SALES to an existing partitioned table named SALES (as a new partition), you would need to execute a SET INTEGRITY statement that looks similar to this:

    SET INTEGRITY FOR sales 
      ALLOW WRITE ACCESS
      IMMEDIATE CHECKED;

Conditions for attaching a table as a new partition

Before an existing base table can be added (attached) to a partitioned table, the following conditions must be met:

  • The source table must be a nonpartitioned table or a partitioned table that has only one partition and no attached or detached partitions.
  • The source table cannot be a typed table or a range-clustered table.
  • The source table must be droppable.
  • The source and target table definitions must match:
    • The number of columns must be the same.
    • The data types of each column must be the same. (For columns with XML or LOB data types, INLINE LENGTH values must be the same.)
    • The nullability characteristics of each column must be the same.
    • Any default constraints defined for columns must be the same.
  • Source and target table compression specifications must match.
  • Characteristics (table space type, page size, extent size) of the table spaces used by the source and target tables must match.
  • If the target table is distributed across database partitions, the source table must also be distributed using the same distribution method.
  • If the target table is a multidimensional clustering (MDC) table, the source table must also be an MDC table.
  • XML data formats used by the source and target tables must be the same.

Removing partitions

Just as you can add or attach new partitions to a partitioned table, you can also remove existing partitions. (Removed partitions become regular, stand-alone base tables.) Partitions can be removed by executing the ALTER TABLE statement with the DETACH PARTITION option specified; the syntax for this statement looks like this:

    ALTER TABLE [TableName] 
      DETACH PARTITION <PartitionName>
      INTO [TargetTable]

where:

  • TableName: Identifies the partitioned table, by name, that a partition is to be removed from
  • PartitionName: Identifies the partition, by name, that is to be removed
  • TargetTable: Identifies the base table where the partition's data is to be stored

Thus, if you wanted to remove a partition named Q1_SALES from a partitioned table named SALES and store its data in a table named Q1_2010_SALES, you could do so by executing an ALTER TABLE statement that looks like this:

    ALTER TABLE sales 
      DETACH PARTITION q1_sales 
      INTO q1_2010_sales;

Once the table Q1_2010_SALES becomes available, you can either drop it (to delete the data) or attach it to another partitioned table (for example, to archive it).


DB2 9.7 table partitioning enhancements

Prior to DB2 9.7, indexes for partitioned tables were nonpartitioned. This meant that whenever partitions were added or removed, corresponding records had to be added to or removed from any indexes that existed for the table.

With DB2 9.7, partitioned indexes are now supported, allowing each data partition to be indexed separately. This provides several benefits, including improved performance when adding or removing partitions and the ability to perform select maintenance operations at the partition level. Starting with FixPack1, you can perform REORG operations on a single data partition rather than on the entire partitioned table, resulting in faster REORG operations while increasing availability to other data partitions.

Other table partitioning enhancements introduced with DB2 9.7 include XML support (available with FixPack1 or greater) and higher availability during ALTER TABLE ... DETACH PARTITION operations—the removal of a partition no longer requires that all current transactions to the table be completed, allowing long-running queries to continue without impeding the partition removal process.


Partitioning for performance

Data partitioning provides many useful benefits, particularly when used in data warehouse and decision support environments. By taking advantage of these features, you can manage data more effectively, while delivering optimum database performance.

Resources

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=550026
ArticleTitle=Distributed DBA: Adding and removing data partitions
publish-date=10102010