Management of large databases and tables has its own challenges. As tables grow in size, it sometimes becomes easier to manage their data in chunks or by limited ranges, rather than operate on all of the data in those large tables in their entirety. This methodology of managing data is also especially useful when chunks or ranges of new data are added or old data is removed from tables at frequent intervals (for example, roll-in and roll-out operations typical in data warehousing environments).
DB2 9 includes table (range) partitioning capabilities for easier management, improved performance, and greater scalability of large databases.
Table partitioning (sometimes referred to as range partitioning) is a data organization scheme in which table data is divided across multiple storage objects, called data partitions (not to be confused with database partitions or DPF), according to values in one or more table columns. These storage objects can be in different table spaces, in the same table space, or a combination of both.
DB2 9 supports data partitions or data ranges based on a variety of attributes. A commonly used partitioning scheme is the date, where you may decide to group data into data partitions by year or by month. You could also partition on numeric attributes: for instance, records with IDs from 1 to 1 million are stored in one data partition, IDs from 1 million to 2 million in another data partition, and so on. Or you could have records for customers with names starting with A-C in one data partition, D-M in the second data partition, N-Q in a third data partition, and R-Z in the last data partition.
Although you have the option of referring to data partitions by names or numbers (useful for data partition operations), they can be completely transparent to applications. That is, applications can continue to access data by specifying column and table names, and do not need to worry about which data partition(s) the data resides in.
The table partitioning functionality in DB2 for Linux®, UNIX®, and Windows® is similar to equivalent capabilities in DB2 for z/OS®, Informix® Dynamic Server, and Informix Extended Parallel Server. DB2 for z/OS, DB2 for iSeries™, and DB2 for Linux, UNIX, and Windows share a common base syntax, but each implements a different subset of it. DB2 for Linux, UNIX, and Windows uses an abbreviated syntax that is much more concise than the syntax of either of the other two products.
The benefits of table partitioning in DB2 9 include:
- Improved manageability for large tables
- Increased query performance through data partition elimination
- Fast online data roll-in / roll-out
- Better optimization of storage costs
- Larger table capacity
- Greater index placement flexibility
Improved manageability: DB2 9 allows the various data partitions to be administered independently. For example, you can choose to back up and restore individual data partitions instead of entire tables. This lets you break down time-consuming maintenance operations into a series of smaller operations.
Increased query performance: The DB2 optimizer is data partition aware. Therefore, during query execution, only the relevant data partitions are scanned. Eliminating the need to scan data partitions that are not impacted by the query can result in improved performance.
Figure 1. Scanning only relevant partitions
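As a rough sketch of data partition elimination, assume a hypothetical two-column orders table partitioned by quarter on shipdate (the column list and the date literals are illustrative assumptions, not taken from a real schema). A query that restricts shipdate to the second quarter gives the optimizer enough information to skip the other three data partitions:

```sql
-- Hypothetical table: one data partition per quarter of 2006.
CREATE TABLE orders (id INT, shipdate DATE)
  PARTITION BY RANGE (shipdate)
  (STARTING '2006-01-01' ENDING '2006-12-31' EVERY 3 MONTHS);

-- The predicate covers only April through June, so only the data
-- partition holding that range should need to be scanned.
SELECT id, shipdate
  FROM orders
 WHERE shipdate BETWEEN '2006-04-01' AND '2006-06-30';
```

Note that the query names only the table and its columns; it never refers to a data partition.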

Fast roll-in / roll-out: DB2 9 allows data partitions to be easily added to or removed from a table without having to take the database offline. This ability can be particularly useful in a data warehouse environment where you often need to load or delete data to run decision-support queries. For example, a typical insurance data warehouse may have three years of claims history. As each month is loaded and rolled in to the warehouse, the oldest month can be archived and removed (rolled out) from the active table. Rolling out a data partition is also more efficient than deleting a specific data range, because it does not need to log the individual delete operations.
Better optimization of storage costs: Table partitioning in DB2 9 lets you integrate better with hierarchical storage models. By using your fastest and most expensive storage hardware only for the most active data partitions, you can optimize your overall storage costs and improve performance. If most of your queries run only against the last three months of data, you have the option to assign slower and less expensive storage hardware to older data.
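One way to express such a storage tiering is the IN clause on individual data partitions. The sketch below is illustrative only: ts_fast and ts_slow are hypothetical table spaces that are assumed to already exist (on fast and slow storage respectively), and the two-column table is a simplification. DB2 places some restrictions on which table spaces can be combined in one partitioned table, so check the product documentation before adapting it.

```sql
-- ts_slow and ts_fast are hypothetical, pre-existing table spaces.
-- Older quarters go to cheaper storage; the most recent quarter
-- stays on the fastest storage.
CREATE TABLE orders (id INT, shipdate DATE)
  PARTITION BY RANGE (shipdate)
  (
    PARTITION q1_2006 STARTING '2006-01-01' ENDING '2006-03-31' IN ts_slow,
    PARTITION q2_2006 STARTING '2006-04-01' ENDING '2006-06-30' IN ts_slow,
    PARTITION q3_2006 STARTING '2006-07-01' ENDING '2006-09-30' IN ts_slow,
    PARTITION q4_2006 STARTING '2006-10-01' ENDING '2006-12-31' IN ts_fast
  );
```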
Larger table capacity: Without partitioning, there are limits on the maximum amount of data a storage object, and hence a table, can hold. However, by dividing the contents of the table into multiple storage objects or data partitions, each capable of supporting as much data as in a non-partitioned table, you can effectively create databases that are virtually unlimited in size.
Greater index placement flexibility: DB2 9 allows indexes for partitioned tables to be stored in their own storage objects (table spaces), as opposed to being in the same storage object as the non-partitioned table. This index placement flexibility is particularly useful for performing faster index operations (such as drop index, online index create, and index reorganization), managing table growth, and reducing I/O contention, which provides more efficient concurrent access to the index data for the table.
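As a minimal sketch of index placement, the IN clause of CREATE INDEX can direct an index on a partitioned table to a table space of its own. The table space name ts_index and the index name are hypothetical, and the table space is assumed to exist already.

```sql
-- Hypothetical partitioned table, one data partition per quarter.
CREATE TABLE orders (id INT, shipdate DATE)
  PARTITION BY RANGE (shipdate)
  (STARTING '2006-01-01' ENDING '2006-12-31' EVERY 3 MONTHS);

-- ts_index is a hypothetical, pre-existing table space; the index
-- is stored there rather than alongside the table data.
CREATE INDEX idx_orders_shipdate ON orders (shipdate) IN ts_index;
```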
Create and use table partitions
DB2 provides a lot of flexibility for creating partitioned tables. Say you have one year’s worth of data and you want to partition it by date, so that each quarter resides in a separate data partition. The following create table syntax illustrates how to do this easily. You could also use the graphical DB2 Control Center for creating and managing data partitions.
CREATE TABLE orders(id INT, shipdate DATE, …)
PARTITION BY RANGE(shipdate)
(
STARTING '1/1/2006' ENDING '12/31/2006'
EVERY 3 MONTHS
)
This results in a table being created with four data partitions, each with three months of data.
Figure 2. Table with 4 data partitions
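The EVERY shorthand is not limited to dates. As a sketch of the numeric ID ranges mentioned earlier, the following statement assumes a hypothetical accounts table (its name and columns are illustrative) and splits IDs 1 through 3 million into three data partitions of 1 million IDs each:

```sql
-- Hypothetical table: three data partitions covering IDs
-- 1 to 1000000, 1000001 to 2000000, and 2000001 to 3000000.
CREATE TABLE accounts (id INT NOT NULL, name VARCHAR(100))
  PARTITION BY RANGE (id)
  (STARTING 1 ENDING 3000000 EVERY 1000000);
```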

As the following example illustrates, it is also possible to specify the data partition ranges explicitly, have open ended ranges, and give names to each data partition. Naming data partitions is useful for performing data partition operations such as DETACH (more on this to follow).
CREATE TABLE orders(id INT, shipdate DATE, …)
PARTITION BY RANGE(shipdate)
(
PARTITION 4q05 STARTING MINVALUE,
PARTITION 1q06 STARTING '1/1/2006',
PARTITION 2q06 STARTING '4/1/2006',
PARTITION 3q06 STARTING '7/1/2006',
PARTITION 4q06 STARTING '10/1/2006'
ENDING '12/31/2006'
)
With table partitioning, you can easily roll in and roll out data instantaneously. The ATTACH and DETACH options of the ALTER TABLE statement facilitate these operations. Using DETACH, an existing data partition or range of values is split off into a standalone table. The rolled-out (detached) table can be dropped, archived, moved to slower storage, and so on.
Now say you want to keep only one year’s (2006) data in the orders table created in the last example. You can DETACH the partition containing the old (pre-2006) data.
ALTER TABLE orders DETACH PARTITION 4q05 INTO oldorders
Similarly, attaching (rolling-in) data as a new data partition in a table is just as easy. The data that is to be rolled-in is first loaded into a separate (staging) table, transformed or cleansed if needed, and then attached to an existing table as shown in the following example.
-- CREATE TABLE neworders
-- load / insert desired data into neworders
-- transform or cleanse new data if needed
ALTER TABLE orders ATTACH PARTITION 1q07
STARTING '01/01/2007' ENDING '03/31/2007'
FROM TABLE neworders
-- COMMIT
-- SET INTEGRITY …
-- COMMIT
Before the rolled-in data is visible to applications, the SET INTEGRITY statement needs to be issued (to validate the new data and perform maintenance for global indexes) and the work committed.
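The SET INTEGRITY step could look like the following sketch. The options shown are one plausible choice and are an assumption, not necessarily the form the original example intended; consult the SET INTEGRITY documentation for the alternatives, such as routing violating rows to an exception table.

```sql
-- Run after the ALTER TABLE ... ATTACH PARTITION statement has been
-- committed. Checks the newly attached rows while still allowing
-- applications to read and write the rest of the orders table.
SET INTEGRITY FOR orders ALLOW WRITE ACCESS IMMEDIATE CHECKED;
COMMIT;
```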
Figure 3 shows the rolled-in and rolled-out data partitions in the orders table used in the previous examples.
Figure 3. Rolled-in and rolled-out data partitions

Table partitioning and other organization schemes
Table partitioning in DB2 9 can be used in isolation or in combination with other data organization schemes. Each clause of the CREATE TABLE statement includes an algorithm to indicate how the data should be organized. The following three clauses demonstrate the levels of data organization that can be used together in any combination:
- DISTRIBUTE BY: Spreads data evenly across database partitions. Use this clause to enable intraquery parallelism and distribute the load across each database partition. This concept is known as database partitioning and is enabled using the Database Partitioning Feature (DPF) in DB2.
- PARTITION BY: Groups rows with similar values of a single dimension in the same data partition. This concept is known as table partitioning.
- ORGANIZE BY: Groups rows with similar values on multiple dimensions in the same table extent. This concept is known as multidimensional clustering (MDC).
This syntax allows for consistency between the clauses as well as allowing for future algorithms of data organization. Combining the DISTRIBUTE BY and PARTITION BY clauses of the CREATE TABLE statement allows data to be spread across database partitions spanning multiple table spaces.
DB2 9 is the first data server to support all three methods of grouping data at the same time. This is a major innovation in improving data management and information availability.
Figure 4 illustrates all three DB2 data organization schemes being used in conjunction with each other:
Figure 4. Three DB2 data organization schemes
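A single CREATE TABLE statement that combines the three clauses might look like the sketch below. The column list, the hash key, and the clustering dimension are illustrative assumptions; DISTRIBUTE BY only spreads rows across multiple database partitions in an environment where DPF is in use.

```sql
-- Hypothetical table using all three data organization schemes.
CREATE TABLE orders (
    id       INT     NOT NULL,
    shipdate DATE    NOT NULL,
    region   CHAR(2) NOT NULL
)
  DISTRIBUTE BY HASH (id)            -- database partitioning (DPF)
  PARTITION BY RANGE (shipdate)      -- table partitioning
    (STARTING '2006-01-01' ENDING '2006-12-31' EVERY 3 MONTHS)
  ORGANIZE BY DIMENSIONS (region);   -- multidimensional clustering (MDC)
```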

Table partitioning in DB2 9 provides powerful capabilities for managing large volumes of data easily and quickly. You are encouraged to download DB2 9 and try out this feature for yourself to see how you can benefit from this and other enhanced features in this new version of DB2.
Rav Ahuja is a worldwide DB2 program manager based at the IBM Toronto Lab. He has been working with DB2 for Linux, UNIX, and Windows since version 1 and has held various roles in DB2 development, technical support, marketing, and product strategy. He works with customers and partners around the globe, helping them build and benefit from DB2 and services-based solutions. Rav is a frequent contributor to DB2 papers, articles, and books. He holds a Computer Engineering degree from McGill University and an MBA from the University of Western Ontario.