Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

Unleash the power of table partitioning in your DB2 warehouse

Naresh Chainani (naresh@us.ibm.com), Software Developer, IBM
Author Photo: Naresh Chainani
Naresh Chainani is a software developer and has worked in DB2 for Linux, UNIX, and, Windows development in IBM Beaverton for the last nine years. His expertise spans several areas of DB2 with a particular emphasis on table partitioning and decimal floating-point. Recently Naresh led the development of several enhancements to the table partitioning feature in DB2. Prior to that, he led the development of the IEEE-compliant DECFLOAT data type. He frequently interacts with customers either to present table partitioning or provide consulting assistance related to table partitioning, as well as to seek valuable stakeholder feedback
Liping Zhang (liping@us.ibm.com), Senior Software Engineer, IBM
Liping Zhang photo
Liping Zhang is a Senior Software Engineer based at the IBM Beaverton Lab. He has been working on the table partitioning in DB2 for Linux, UNIX, and Windows since version 9.1 and has led the feature design since version 9.7. He has also been working with customers to help them understand and adopt the table partitioning feature. Liping has authored IDUG presentations and frequently contributed to DB2 papers, articles and books on table partitioning.

Summary:  IBM® DB2® for Linux®, UNIX®, and Windows® V9.7 introduced several major table partitioning enhancements, including partitioned indexes (also called local indexes), integration with XML, online roll-out using DETACH, and support for data and index REORG at the partition level. This article discusses the applications of these features in a data warehouse environment and demonstrates how data roll-in using ATTACH can be orders-of-magnitude faster, while reducing the active log space consumed by leveraging partitioned indexes. The article also includes a section on some commonly asked questions related to table partitioning.

Date:  17 Jun 2010
Level:  Intermediate PDF:  A4 and Letter (73KB | 12 pages)Get Adobe® Reader®
Also available in:   Chinese  Korean

Activity:  6889 views
Comments:  

Introduction

Table Partitioning (sometimes referred to as data partitioning or range partitioning) was introduced in DB2 for Linux, UNIX, and Windows in Version 9.1. The initial release supported nonpartitioned indexes (also known as global indexes) that worked well for most access when combined with the power of partition elimination. However, when it came to common data administration operations such as data roll-in and data roll-out, the maintenance of global indexes took a long time. This not only delayed the availability of data that was rolled-in, it also consumed large amount of active log space.

DB2 V9.7 introduced local indexes to streamline the data roll-in operation. This article describes the best practices with data roll-in by leveraging local indexes that will help make the roll-in operation more efficient.

While new data is periodically rolled-in, old data need to be frequently rolled out for archival storage or simply deleted to free-up storage. With data warehouses being increasingly used as operational data stores, more business users today expect data in the warehouse to be available 24 x 7 during the roll-in and roll-out operations.

DB2 9.7.1 took a step towards improving the availability of the partitioned table by making roll-out (ALTER TABLE DETACH PARTITION) online for most common accesses. This article explains the new two-phase detach behavior and lists some frequently asked questions around it.

The size of the average data warehouse in businesses today is exploding. It is not uncommon to see 100+ TB warehouses! This data growth is driven by requirements ranging from business needs, to compliance with various legal and accounting standards. However, it is usually only recent data that changes most often, while historical data tends to be static. This leads to a requirement for performing maintenance on a subset of table data instead of the entire table in order to reduce operating time and avoid unnecessary operations on static data. DB2 9.7.1 introduced support for partition-level data and index reorganization by leveraging the partition independence provided by local index support in DB2 9.7.

Local indexes

Local indexes are partitioned in the same way as the table they are based on. Each index partition indexes rows only from the corresponding data partition. Most user-created and system-created indexes are created as local (or partitioned) by default.

A notable exception to this rule is unique indexes where the index key columns do not include the table partitioning key. These must be created as global indexes. In this case, the unique indexes created by the system for enforcing primary key or unique key constraints will be created as global. Without this restriction, maintaining uniqueness across multiple range partitions would be impractical. Note that when table partitioning is combined with database partitioning feature (DPF), unique index keys need to include the database partitioning (or distribution) key as well as the table partitioning key.

The index partition for each data partition can be in a different table space. That can be specified via the INDEX IN clause of the CREATE TABLE statement. The table partition metadata can be obtained by querying the SYSCAT.DATAPARTITIONS view, while the index partition metadata can be obtained by querying the SYSCAT.INDEXPARTITIONS catalog view that was introduced in DB2 9.7. The following example demonstrates the table space specification for local and global indexes on the table.


Example: Create a partitioned table and a partitioned index

CREATE TABLE purchaseOrders ( po_id INT, po_date DATE, po_customer CHAR(200), 
                              po_item VARCHAR(20),po_quantity INT, po_price DECFLOAT )
  INDEX IN global_index_tbsp
  PARTITION BY (po_date)
    (PARTITION jan2010 STARTING ('1/1/2010') ENDING '1/31/2010' IN jan2010_data_tbsp
                                        INDEX IN jan2010_index_tbsp,
     PARTITION feb2010 STARTING ('2/1/2010') ENDING '2/28/2010' IN feb2010_data_tbsp
                                        INDEX IN feb2010_index_tbsp, 
     PARTITION mar2010 STARTING ('3/1/2010') ENDING '3/31/2010' IN mar2010_data_tbsp
                                        INDEX IN mar2010_index_tbsp);

CREATE INDEX purchaseOrdersIndex1 ON purchaseOrders(po_customer) PARTITIONED;

Local indexes streamline data roll-in and roll-out. In case of roll-in, indexes on the table being attached are leveraged to become local indexes for the new partition (more on this in the next section). In case of roll-out, the local indexes on the partition being detached become indexes on the standalone table resulting from detach.

Roll-in with local indexes

Roll-in in DB2 using ATTACH is a two-step operation:

  1. First the ALTER TABLE ATTACH PARTITION command incorporates an existing table into a partitioned table as a new data partition.
  2. Second, the SET INTEGRITY command initiates integrity processing and maintains global indexes.

The newly attached data is visible only after the second step successfully completes. In the presence of global indexes, SET INTEGRITY requires a significant amount of active log space to incrementally maintain those indexes and also can require a long time to run, proportional to the amount of new data. Figure 1 illustrates the roll-in operation prior to DB2 9.7 using global indexes.


Figure 1. Roll-in with global indexes
Global index incurs heavy index maintenance overhead during roll-in

With local indexes, the time for roll-in is reduced by orders of magnitude. Any indexes on the table being attached that match the existing local indexes on the partitioned table can be utilized. The best practice to speed up data roll-in is to ensure that, prior to roll-in, the table being attached has indexes that exactly match all local indexes on the partitioned table. Figure 2 illustrates this best practice.


Figure 2. Roll-in with local indexes
Streamlined roll-in using local indexes

In general, even if this best practice is not followed, the time for roll-in is still considerably reduced when only local indexes exist. More specifically, if the table being attached does not have indexes that match the local indexes of the partitioned target table, the local index object for the new partition will be marked invalid and the local indexes for that partition will be built during SET INTEGRITY. Since this local index build happens only on the newly attached partition and requires very limited logging, SET INTEGRITY is still significantly faster compared to global indexes.

To summarize, Table 1 lists the best practice for roll-in:


Table 1. Best practice for roll-in
StepDescriptionExampleComments
1.Drop indexes from source table that do not match any local indexes on target tableDROP INDEX srcidx_po_priceQuick DDL
2.Create missing indexes on source table to match the local indexes on target tableCREATE INDEX srcidx_po_customer ON Apr2010(po_customer)Relatively inexpensive AND has no impact on target table
3.ATTACH partitionALTER TABLE purchaseOrders ATTACH PARTITION Apr2010 STARTING '4/1/2010' ENDING '4/30/2010' FROM Apr2010Quick DDL
4.SET INTEGRITYSET INTEGRITY FOR purchaseOrders IMMEDIATE CHECKEDSimple scan of new table data to check that all rows are within range and enforce constraints, if any exist.

Online roll-out

Starting in DB2 9.7.1, the DETACH operation is made online by allowing a partition to be detached while queries are concurrently accessing the remaining partitions of the table. In fact, a reporting workload using the Uncommitted Read (UR) isolation level could be reading the partition that is being detached, and even that would not stop the detach from proceeding concurrently. This is accomplished by making detach a two-phase operation, where the first phase logically detaches the partition and the second phase, managed by the system in the background without need for user intervention, completes the detach asynchronously after ensuring all old access to the partitioned table has completed. When there are no dependent materialized query tables (MQT), the system initiates the second phase after the first phase (ALTER TABLE DETACH PARTITION) commits. If dependents such as staging tables or refresh immediate MQTs exist on the partitioned table, the second phase starts automatically after these dependents are refreshed via SET INTEGRITY.

From a DBA perspective, the roll-out operation progresses in the presence of a long-running reporting workload. From an application perspective, the long-running reporting workload is not interrupted by detach and completes concurrently. For any access to the partitioned table that happens after the successful completion of the DETACH command, the detached partition is no longer visible and the asynchronous completion of detach will not interrupt this new access. Of course, the new online detach behavior respects the isolation level in effect. For instance, if a Repeatable Read (RR) scanner has read the partition but not committed yet, detach will wait for the RR scanner to complete before it can make the partition invisible. After the asynchronous detach operation has completed the detach, the target table of detach is available to be archived or dropped or attached to some (other) table.

The asynchronous detach behavior is new and is the only choice in DB2 9.7.1. It can be monitored using the following methods:

  • By issuing the LIST UTILITIES SHOW DETAIL command
  • By querying the STATUS column of the SYSCAT.DATAPARTITIONS catalog view for the partition in question. As long as the partition entry is still returned and the STATUS is 'D' or 'L', the asynchronous detach is not complete. The target table of detach becomes available only after the asynchronous detach completes.

The latter method can be particularly useful for automating the wait for completion of detach to help serialize the subsequent action (for instance, drop or archive) on the target table. In cases where the availability of the target table is more critical than the availability of the partitioned table, ensure that there is no concurrent access to the partitioned table to allow asynchronous detach to complete sooner.

Note that while DETACH is online starting DB2 9.7.1, the syntax remains unchanged as demonstrated in Table 2:


Table 2. Detach phases
PhaseDescriptionExample
1.DETACH partition. Note that this command needs to commit in order for the second phase to start. In general, it is a good practice to commit immediately after any DDL. ALTER TABLE purchaseOrders DETACH PARTITION jan2010 INTO obsolete_data;
COMMIT;
2.System-initiated asynchronous partition detach; no user intervention needed.

Partition-level REORG

DB2 9.7.1 introduced the capability to REORG either a single data partition or index partition using the ON DATA PARTITION clause, while allowing read/write access to remaining partitions of the table. The access mode specified in this case only applies to the partition being reorganized. By issuing several REORG commands for different partitions of the same table, multiple data partitions can be reorganized in parallel as long as:

  • The table does not have any global indexes besides XML column path indexes, and
  • ALLOW NO ACCESS mode is specified on the REORG command.

When global indexes are present and multiple partitions of the same table need to be reorganized, dropping any global indexes before the REORG operations and recreating them after all the REORG operations complete can be more efficient in some cases. Otherwise, not only does the data have to be reorganized one partition at a time serially, but also all global indexes must be rebuilt multiple times, once after each data partition is reorganized.

Here are a few things to keep in mind when using partition-level REORG:

  • Reorganizing multiple partitions concurrently requires sufficient system resources
  • Reorganizing a partition will put the entire table offline if the partitioned table has any global indexes besides XML column path indexes
  • The ON DATA PARTITION clause is not supported with the REORG INDEX command, which only applies to global indexes

The following examples illustrate partition-level REORG on a table that has no global indexes:


Table 3. Partition-level REORG with no global indexes
ExampleComments
REORG TABLE purchaseOrders ALLOW READ ACCESS ON DATA PARTITION Apr2010Reorganize a single partition (Apr2010) while allowing read access to it; all remaining partitions available for read/write.
REORG TABLE purchaseOrders ALLOW NO ACCESS ON DATA PARTITION Mar2010;
REORG TABLE purchaseOrders ALLOW NO ACCESS ON DATA PARTITION Apr2010;
Reorganize two partitions concurrently; no access is allowed to either partition; all remaining partitions available for read/write.
REORG INDEXES ALL FOR TABLE purchaseOrders ALLOW WRITE ACCESS ON DATA PARTITION Apr2010;Reorganize all local indexes for the Apr2010 data partition.

Integration with XML

The benefits of table partitioning extend to tables with XML data starting DB2 9.7. One XDA object is created per data partition to store XML data for that partition. The table space placement for XML data is determined by the LONG IN clause, which can be specified at partition level, table level, or any combination thereof. By default, the XML data goes in the same table space as the corresponding data partition.

Either global or local indexes can be created over XML data. However, since XML columns cannot be part of the table partitioning key, any unique XML values index will have to be created as a global index. The system-created XML regions index is always created as local, while the XML column path index is always created as global. The global path index results in improved storage efficiency and minimizes the path look-up time for XQuery execution.

I wonder why?

This section discusses some observations about table partitioning that might not be very obvious and some things that we frequently get questions on.

Q: Why can't I specify a table space for my local index using the CREATE INDEX statement?
A: All local indexes for a data partition reside in the same index object, just like indexes on a nonpartitioned table. Hence, while you can specify the table space where the index object containing all the indexes for a data partition go, it is not possible to specify a table space for each individual index. This is also why maintenance operations like REORG cannot be performed on an individual local index. This differs from global indexes where each global index has its own object, allowing greater flexibility in table space placement and support for REORG for an individual global index.

Q: I observed more lock conflicts (slightly worse concurrency) using local indexes compared to global indexes. Why?
A: With global index scans, the index look-up determines whether a reference to a partition exists. A partition is locked only if a reference is found. However, with local index scans, a partition needs to be locked first to do the index look-up and find a match. Therefore, some partitions that do not contribute any rows for query processing are still locked when accessed via local index.

Q: Can statistics be collected for an individual partition using RUNSTATS?
A: No, RUNSTATS can only be issued on a table.

Q: How can I tell if an index on a partitioned table is partitioned?
A: Use DESCRIBE INDEXES FOR TABLE command. The possible values for Index Partitioning column are: 'P' for partitioned index, 'N' for nonpartitioned index, and '-' (null) for index on a nonpartitioned table. The table function SYSPROC.ADMIN_GET_INDEX_INFO() also returns this information.

Q: Why is the size of a global index significantly larger than if the same index was local?
A:Local indexes do not store the 2-byte partition identifier in the rowid of each key record. This leads to savings of 25% per rowid for large table spaces that have 6 byte rowids and 33% for regular table spaces that have 4 byte rowid.

Q: What are some common situations that favor local indexes?
A: Local indexes are generally smaller than global indexes. The storage savings typically leads to better performance due to reduced I/O. This factor favors local indexes. Local indexes also play a major role in streamlining the data roll-in and data roll-out operations and make partition-level operations like REORG feasible.

Q: What are some common situations that favor global global indexes?
A: Global indexes allow the flexibility of specifying a table space for each index. In some cases this may be useful where only the table spaces containing certain important indexes need to backed up. Global indexes preserve the order property from index scans, and this gives them an edge over local indexes. In most cases, local indexes do not preserve the order property across partitions and that requires an extra sort if the plan is using local indexes.

For instance, the purchaseOrders table is partitioned by po_date and has a local index on po_customer, the query below will need a sort after the index scan:

SELECT po_id, po_customer FROM purchaseOrders 
    WHERE po_date > '2/15/2010' AND po_date < '4/15/2010'
    ORDER BY po_customer;

However, the following query does not need an extra sort since the query access is limited to a single partition:

SELECT po_id, po_customer FROM purchaseOrders 
    WHERE po_date > '2/15/2010' AND po_date < '2/25/2010'
    ORDER BY po_customer;


Q: How do I know if a data partition or an index partition needs to be reorganized?
A: Run REORGCHK command. The output will show both the data and index statistics for each partition and recommend which partitions should be reorganized.

Q: How to tell the data partition in which a row resides?
A: The DATAPARTITIONNUM function returns the sequence number (SYSDATAPARTITIONS.SEQNO) of the data partition in which the row resides.

Q: How do I calculate the table and index size for a partitioned table (or any table for that matter)?
A: The table functions ADMIN_GET_TAB_INFO_V97 and ADMIN_GET_INDEX_INFO return the table and index size respectively.

Q: Can the local index partition and data partition be in the same table space for each partition?
A: Yes, in fact this is the default case and many customers choose to do so. The local index will be in a different table space from the associated data partition when

  • The partition level INDEX IN clause is specified in the CREATE TABLE statement, or
  • The INDEX IN clause is specified in the ALTER TABLE ADD PARTITION statement for the new partition, or
  • The data partition being attached has indexes in a separate table space from the data.

Conclusion

Local index support in DB2 9.7 significantly improves common data warehousing processes such as data roll-in and data roll-out. By leveraging local indexes, roll-in not only makes the newly attached data visible more quickly, but consumes virtually no active log space. Support for local indexes is a big step towards achieving better partition independence with table partitioning. Partition independence improves availability by allowing access to rest of the table while a subset of its partitions are undergoing maintenance operations like REORG. In typical warehouse environments, recent data changes frequently while historical data is more or less static and this makes partition level REORG invaluable for reorganizing only the active data. Online Detach allows long running reporting queries to run concurrently with the roll-out operation and improves availability of the warehouse.


Resources

Learn

Get products and technologies

Discuss

About the authors

Author Photo: Naresh Chainani

Naresh Chainani is a software developer and has worked in DB2 for Linux, UNIX, and, Windows development in IBM Beaverton for the last nine years. His expertise spans several areas of DB2 with a particular emphasis on table partitioning and decimal floating-point. Recently Naresh led the development of several enhancements to the table partitioning feature in DB2. Prior to that, he led the development of the IEEE-compliant DECFLOAT data type. He frequently interacts with customers either to present table partitioning or provide consulting assistance related to table partitioning, as well as to seek valuable stakeholder feedback

Liping Zhang photo

Liping Zhang is a Senior Software Engineer based at the IBM Beaverton Lab. He has been working on the table partitioning in DB2 for Linux, UNIX, and Windows since version 9.1 and has led the feature design since version 9.7. He has also been working with customers to help them understand and adopt the table partitioning feature. Liping has authored IDUG presentations and frequently contributed to DB2 papers, articles and books on table partitioning.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=496530
ArticleTitle=Unleash the power of table partitioning in your DB2 warehouse
publish-date=06172010
author1-email=naresh@us.ibm.com
author1-email-cc=
author2-email=liping@us.ibm.com
author2-email-cc=

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Try IBM PureSystems. No charge.

Special offers