Introducing DB2 9, Part 2: Table partitioning in DB2 9

Improve performance and achieve greater scalability and easier management by using table partitioning, a feature of DB2® 9 (formerly codenamed "DB2 Viper." As tables grow in size, it may be easier to manage your data in chunks or by limited ranges. This article takes a quick look at the basics of table partitions and gives you insight into the benefits of this capability. For more information on the new features of DB2 9, read other articles in the series. [13 Dec 2010: Updated Listing 2 and Listing 4 code examples. --Ed.]

Rav Ahuja, DB2 Program Manager, IBM

Rav Ahuja is a worldwide DB2 program manager based at the IBM Toronto Lab. He has been working with DB2 for Linux, UNIX, and Windows since version 1 and has held various roles in DB2 development, technical support, marketing, and product strategy. He works with customers and partners around the globe helping them build and benefit from DB2 and services-based solutions. Rav is a frequent contributor to DB2 papers, articles and books. He holds a Computer Engineering degree from McGill University and MBA from University of Western Ontario.



13 December 2010 (First published 24 May 2006)

Also available in Russian

Introduction

IBM DB2 e-kit for Database Professionals

Learn how easy it is to get trained and certified for DB2 for Linux, UNIX, and Windows with the IBM DB2 e-kit for Database Professionals. Register now, and expand your skills portfolio, or extend your DBMS vendor support to include DB2.

Management of large databases and tables has its own challenges. As tables grow in size, it sometimes becomes easier to manage their data in chunks or by limited ranges, rather than operate on all of the data in those large tables in their entirety. This methodology of managing data is also especially useful when chunks or ranges of new data are added or old data is removed from tables at frequent intervals (for example, roll-in and roll-out operations typical in data warehousing environments).

DB2 9 includes table (range) partitioning capabilities for easier management, improved performance, and greater scalability of large databases.

What is table partitioning?

Table partitioning (sometimes referred to as range partitioning) is a data organization scheme in which table data is divided across multiple storage objects, called data partitions (not to be confused with database partitions or DPF), according to values in one or more table columns. These storage objects can be in different table spaces, in the same table space, or a combination of both.

DB2 9 supports data partitions or data ranges based on a variety of attributes. A commonly used partitioning scheme is the date, where you may decide to clump together data in data partitions such as by year or month. You could also have numeric attributes for partitioning, for instance, records with IDs from one up to 1 million are be stored in one data partition, IDs from 1 million to 2 million in another data partition, and so on. Or for example, you could have records for customers with names starting with A-C in one data partition, D-M in the second data partition, N-Q in a third data partition, and R-Z in the last data partition.

Although you have the option of referring to data partitions by names or numbers (useful for data partition operations), they can be completely transparent to applications. That is, applications can continue to access data by specifying column and table names, and do not need to worry about which data partition(s) the data resides in.

The table partitioning functionality in DB2 for Linux®, UNIX®, and Windows® is similar to equivalent capabilities in DB2 for z/OS®, Informix® Dynamic Server, and Informix Extended Parallel Server. The DB2 for z/OS, DB2 for iSeries™, and DB2 for Linux, UNIX, and Windows use a common base syntax, but implement a different subset. DB2 for Linux, UNIX, and Windows uses an abbreviated syntax that is actually much more concise than the syntax of either of the other two products.

Table partitioning benefits

The benefits of table partitioning in DB2 9 include:

Improved manageability: DB2 9 allows the various data partitions to be administered independently. For example, you can choose to back up and restore individual data partitions instead of entire tables. This lets you break down time-consuming maintenance operations into a series of smaller operations.

Increased query performance: The DB2 optimizer is data partition aware. Therefore, during query execution, only the relevant data partitions are scanned. Eliminating the need to scan data partitions that are not impacted by the query can result in improved performance.

Figure 1. Scanning only relevant partitions
Scanning only relevant partitions

Fast roll-in / roll-out: DB2 9 allows data partitions to be easily added or removed from the table without having to take the database offline. This ability can be particularly useful in a data warehouse environment where you often need to load or delete data to run decision-support queries. For example, a typical insurance data warehouse may have three years of claims history. As each month is loaded and rolled-in into the warehouse, the oldest month can be archived and removed (rolled-out) from the active table. This method of rolling out data partitions is also more efficient as it does not need to log delete operations, which would be the case when deleting specific data ranges.

Better optimization of storage costs: Table partitioning in DB2 9 lets you integrate better with hierarchical storage models. By only using your fastest and most expensive storage hardware for only the most active data partitions, DB2 9 allows you to optimize your overall storage costs and improve performance. If most of your queries only run against the last three months of data, you have to option to assign slower and less expensive storage hardware to older data.

Larger table capacity: Without partitioning, there are limits on the maximum amount of data a storage object, and hence a table, can hold. However, by dividing the contents of the table into multiple storage objects or data partitions, each capable of supporting as much data as in a non-partitioned table, you can effectively create databases that are virtually unlimited in size.

Greater index placement flexibility: DB2 9 allows indexes for partitioned tables to be stored in their own storage objects (table spaces), as opposed to being in the same storage object as the non-partitioned table. This index placement flexibility is particularly useful for performing faster index operations (such as drop index, online index create, and index reorganization), managing table growth, and reduced I/O contention, providing more efficient concurrent access to the index data for the table.

"We are very pleased with some of the new features we’ve tested in DB2 9 …The enhancements to ALTER TABLE work well, improve productivity, and save time. The ability to place indexes anywhere for partitioned tables is also particularly useful when we run out of space in the designated table space."
- Ellen Reys-Klebaner, Chief Database Architect, Visa

Create and use table partitions

DB2 provides a lot of flexibility for creating partitioned tables. Say you have one year’s worth of data and you want to partition it by date, so that each quarter resides in a separate data partition. The following create table syntax illustrates how to do this easily. You could also use the graphical DB2 Control Center for creating and managing data partitions.

Listing 1. Create table code to partition by date
CREATE TABLE orders(id INT, shipdate DATE, …)
  PARTITION BY RANGE(shipdate)
    (
    STARTING '1/1/2006' ENDING '12/31/2006' 
      EVERY 3 MONTHS
    )

This results in a table being created with four data partitions, each with three months of data.

Figure 2. Table with 4 data partitions
Table with 4 data partitions

As the following example illustrates, it is also possible to specify the data partition ranges explicitly, have open ended ranges, and give names to each data partition. Naming data partitions is useful for performing data partition operations such as DETACH (more on this to follow).

Listing 2. Create table code to specify data partition ranges explicitly
CREATE TABLE orders(id INT, shipdate DATE, …)
  PARTITION BY RANGE(shipdate)
    (
  PARTITION q4_05 STARTING MINVALUE,
    PARTITION q1_06 STARTING '1/1/2006',
    PARTITION q2_06 STARTING '4/1/2006',
    PARTITION q3_06 STARTING '7/1/2006',
    PARTITION q4_06 STARTING '10/1/2006' 
                  ENDING ‘12/31/2006'
    )

With table partitioning, you can easily roll-in and roll-out data instantaneously. The ATTACH and DETACH options in the ALTER TABLE command facilitate these operations. Using DETACH, an existing data partition or range of values is split off into a standalone table. The rolled-out (detached) table can be dropped, archived, moved to slower storage, and so on.

Now say you want to keep only one year’s (2006) data in the orders table created in the last example. You can DETACH the partition containing the old (pre-2006) data.

Listing 3. Coding detach in the partition
ALTER TABLE orders 
	DETACH PARTITION qold INTO oldorders

Similarly, attaching (rolling-in) data as a new data partition in a table is just as easy. The data that is to be rolled-in is first loaded into a separate (staging) table, transformed or cleansed if needed, and then attached to an existing table as shown in the following example.

Listing 4. Attaching data as a new data partition
// CREATE TABLE neworders
// load / insert desired data into neworders
// transform or cleanse new data if neeeded
  ALTER TABLE orders
  	ATTACH PARTITION q1_07
  	STARTING '01/01/2007'
  	ENDING   '03/31/2007'
  	FROM TABLE neworders 
// COMMIT 
// SET INTEGRITY …
// COMMIT

Before the rolled-in data is visible to applications, the SET INTEGRITY statement needs issued (to validate the new data and perform maintenance for global indexes) and the work commited.

Figure 3 shows the rolled-in and rolled-out data partitions in the orders table used in the previous examples.

Figure 3. Rolled-in and rolled-out data partitions
Rolled-in and rolled-out data partitions

Table partitioning and other organization schemes

Table partitioning in DB2 9 can be used in isolation or in combination with other data organization schemes. Each clause of the CREATE TABLE statement includes an algorithm to indicate how the data should be organized. The following three clauses demonstrate the levels of data organization that can be used together in any combination:

  • DISTRIBUTE BY -- Spreads data evenly across database partitions. Use this clause to enable intraquery parallelism and distribute the load across each database partition. This concept is known as database partitioning and is enabled using the Database Partitioning Feature (DPF) in DB2.
  • PARTITION BY -- Groups rows with similar values of a single dimension in the same data partition. This concept is known as table partitioning.
  • ORGANIZE BY -- Groups rows with similar values on multiple dimensions in the same table extent. This concept is known as multidimensional clustering (MDC).

This syntax allows for consistency between the clauses as well as allowing for future algorithms of data organization. Combining the DISTRIBUTE BY and PARTITION BY clauses of the CREATE TABLE statement allows data to be spread across database partitions spanning multiple table spaces.

DB2 9 is the first data server to support all three methods of grouping data at the same time. This is a major innovation in improving data management and information availability.

Figure 4 illustrates all three DB2 data organization schemes being used in conjunction with each other:

Figure 4. Three DB2 data organization schemes
Three DB2 data organization schemes

Conclusion

Table partitioning in DB2 9 provides powerful capabilities for managing large volumes easily and quickly. You are encouraged to download DB2 9 and try out this feature for yourself to see how you can benefit from this and other enhanced features in this new version of DB2.

Resources

Learn

Get products and technologies

  • Download DB2 9 and try it out today.
  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edtion and provides a solid base to build and deploy applications.
  • Build your next development project with IBM trial software, available for download directly from developerWorks.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your profile (name, country/region, and company) is displayed to the public and will accompany any content you post. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=124039
ArticleTitle=Introducing DB2 9, Part 2: Table partitioning in DB2 9
publish-date=12132010