Data organization and partitioning
Several technologies in DB2 allow you to break up your data into smaller "chunks" for greater parallelism for queries, allow for partition elimination in queries, and in general help improve performance. One of these technologies is the Table (or Range) Partitioning feature, this feature allows you to take a single table and divide it into partitions which may be spread across multiple table spaces.
This partitioning capability has a number of benefits, including:
- Simplified syntax for creating table or range partitions.
- Large tables can be spread across several table spaces. Note that all of the table spaces specified must have the same page size, extent size, storage mechanism (DMS or SMS), and type (REGULAR or LARGE), and all of the table spaces must be in the same database partition group.
- Simplified process for rolling in and rolling out of table data by seamlessly attaching and detaching table partitions.
- Improved query performance by automatically eliminating data partitions based on predicates in the query.
Below is the simplified syntax for creating a partitioned table to store 24 month's worth of data in 24 partitions spread across four table spaces.
Listing 5. Syntax for creating a partitioned table
CREATE TABLE fact (txn_id char(7), purchase_date date, ...) IN tbsp1, tbsp2, tbsp3, tbsp4 PARTITION BY RANGE (purchase_date) ( STARTING FROM ('2005-01-01') ENDING ('2006-12-31') EVERY 1 MONTH )
The table creation syntax has a short and a long form. Before diving into the syntax, let's first talk about the partitioning column. Table partitioning allows for the specification of ranges of data, where each range goes into a separate partition. The partitioning column, or columns, defines the ranges. A partitioning column can be any of the DB2 base data types except for LOBs and LONG VARCHAR columns. You can also specify multiple columns (watch for an example later), and you can specify generated columns to simulate partitioning on a function. Here is the short form.
Listing 6. Short form for table partitioning syntax
CREATE TABLE t1(c1 INT) IN tbsp1, tbsp2, tbsp3 PARTITION BY RANGE(c1) (STARTING FROM (0) ENDING (80) EVERY (10))
Here is the same result using the long form.
Listing 7. Long form for table partitioning syntax
CREATE TABLE t1(c1 INT) PARTITION BY RANGE(c1) (STARTING FROM (0) ENDING (10) IN tbsp1, ENDING (20) IN tbsp2, ENDING (30) IN tbsp3, ENDING (40) IN tbsp1, ENDING (50) IN tbsp2, ENDING (60) IN tbsp3, ENDING (70) IN tbsp1, ENDING (80) IN tbsp2)
Another benefit is that when you detach a partition you get a separate table that contains the contents of that partition. You can detach a partition from a table and then do something with that newly detached partition, which is now a physical table. For example, you could archive off that table, move it to tertiary storage, copy it to another location, or whatever you want to do. DB2 10.1 will asynchronously clean up any index keys on that partitioned table without impacting running applications.
In fact, detaching a table partition from a partitioned table is invoked using
ALTER TABLE..DETACH PARTITION..INTO
clause and consists of two phases:
- The partition is logically detached from the partitioned table.
- Asynchronously the logically detached partitioned is converted into a stand-alone table.
This implies that detached data becomes invisible instantly and no actual data
movement occurs during the detach process. A typical example of
detaching a table partition:
part_table DETACH PARTITION p0 INTO TABLE p0_tab;.
In the above statement, the partition p0 is detached from the part_table table, and asynchronously a new physical table p0_tab is created containing the content of the partition p0.
Note that the following conditions must be met before you can perform a
- The table to be detached from (source table) must exist and be a partitioned table.
- The data partition to be detached must exist in the source table.
- The source table must have more than one data partition. A partitioned table
must have at least one data partition. Only visible and attached data
partitions pertain in this context. An attached data partition is a data
partition that is attached but not yet validated by the
- The name of the table to be created by the
DETACH PARTITIONoperation (target table) must not exist.
DETACH PARTITIONis not allowed on a table that is the parent of an enforced referential integrity (RI) relationship. So you must change the RI constraints into informational by setting them to
NOT ENFORCEDbefore you can detach a partition. Then you can enforce the constraint again after the detaching process ends.
- If there are any detached dependent tables (tables need to be incrementally
maintained with respect to the detached data partition), the
SET INTEGRITYstatement is required to be run on these tables to incrementally maintain the tables. With DB2 9.7 Fix Pack 1 or later, after the
SET INTEGRITYstatement is run on all detached dependent tables, the asynchronous partition detach task makes the data partition into a stand-alone target table. Until the asynchronous partition detach operation completes, the target table is unavailable.
Similar to adding a new partition, you simply create a table with the same definition as the partitioned table, load it with data, and attach that partition to the main partitioned table, as follows:
Listing 8. Attach partition to main partitioned table
ALTER TABLE FACT_TABLE ATTACH PARTITION STARTING '06-01-2006' ENDING '06-30-2006' FROM TABLE FACT_NEW_MONTH
You need to use
SET INTEGRITY to validate
data and maintain indices after the
operation. Starting with DB2 10.1, if data integrity can be checked before
attachment, newly attached data can be made available sooner with
IMMEDIATE UNCHECKED. Otherwise, if
INTEGRITY IMMEDIATE CHECKED is used,
commit is required before data is visible.
There are two types of indexing on partitioned tables: partitioned and non-partitioned indices. In general, indices on partitioned tables behave the same as those on non-partitioned tables, but they follow a different storage model:
- While all indices of a non-partitioned table reside in one shared index object within the same table space, non-partitioned indices of a partitioned table are created each in its own index object in a single table space, even if the table partition spans several table spaces.
- A partitioned index uses an index organization scheme in which index data is
divided across multiple index partitions, according to the partitioning
scheme of the table. Each index partition refers only to table rows in the
corresponding data partition. All index partitions for a specific data
partition reside in the same index object. By default, index partitions are
placed in the same tablespace as the data partitions that they reference.
To override this default behavior, you must use the
INDEX INclause for each data partition you define by using the
CREATE TABLEstatement. In other words, if you plan to use partitioned indices for a partitioned table, you must anticipate where you want the index partitions to be stored when you create the table. If you try to use
INDEX INwhen creating a partitioned index, you receive an error message.
The figure below illustrates the placement of indices on non-partitioned table
table1. This is done using the
clause of the
CREATE TABLE statement.
Listing 9. Use INDEX IN clause of CREATE TABLE statement
CREATE TABLE table1 (a int, b int) IN tbs1 INDEX IN tbs2 CREATE INDEX i1 ON table1 (a) CREATE INDEX i2 ON table1 (b)
Figure 5. Indices on a non-partitioned table
The figure below illustrates the placement of partitioned and non-partitioned tables on a partitioned table table1 whose definition is as follows:
create table table1 (a int, b int) partition by range (a) (starting from (1)ending at (100), partition by range (a) (starting from (101)ending at (200), partition by range (a) (starting from (201)ending at (300) index in tbs6)
Figure 6. Indices on a partitioned table
We have the following indices defined:
- Non-partitioned index i1 resides in table space tbs4:
CREATE INDEX i1 ON table1 (b) NOT PARTITIONED IN tbs4.
- Non-partitioned index i2 resides in table space tbs5:
CREATE INDEX i2 ON table1 (a,b) NOT PARTITIONED IN tbs5.
- Partitioned index i3 resides in several table spaces as follows:
CREATE INDEX i2 ON table1 (a) PARTITIONED. Indices on partitions 1 and 2 reside in the same table spaces as the data partitions they refer to following the default behavior. Data partition 3 overrides the default behavior using the
INDEX INclause to specify tbs6 as the tablespace for its partitioned indices.
Note that if the table is partitioned and neither PARTITIONED nor NOT PARTITIONED is specified, the index is created as partitioned.
Note that the
IN table space name clause in
CREATE INDEX statement can be
specified only for a non-partitioned index on a partitioned table. The
specification of a table space specifically for the index overrides a
specification made using the
INDEX IN clause when the table was created.
Range clustered tables (RCTs) are tables where data are clustered according to a specific key value. Every key value is associated with its corresponding row location. Hence, this approach provides exceptionally fast access to specific table rows. Key values must have the following characteristics:
- Not null
- An integer (SMALLINT, INTEGER, or BIGINT)
- Monotonically increasing
- Within a predetermined set of ranges based on each column in the key
Note that you cannot create a regular index on the same key values used to define the range-clustered table.
DB2 pre-allocates the disk space required for the RCT at creation time. This is
done by calculating the number of distinct key values and multiplying it by the
table row size. This mandates that the space required for holding the entire
table should be available at creation time. Additionally, you cannot issue an
ALTER TABLE statement to alter the physical characteristics of an RCT after its
RCTs are created using the
ORGANIZE BY KEY
SEQUENCE clause of the
CREATE TABLE statement as follows:
Listing 10. Creating RCTs
create table table1 (id int not null, emp_id int not null, emp_name varchar(40) ) organize by key sequence (id starting from 100 ending 1000, emp_id ending 100) disallow overflow
This example creates an RCT with the following key characteristics:
- ID values range from 100 to 1,000.
- emp_id values range from 1 to 100 (if the
STARTING FROMclause is not stated, the default starting value is 1).
DISALLOW OVERFLOWclause indicates that rows with key values outside the specified ranges will not be accepted.
A benefit of overflow-disallowed RCT tables is that table reorganization operations are not required since such tables are always clustered.
In cases you need to allow rows with key values outside the specified range to be
stored in the table, you should use the
OVERFLOWS clause while defining
the table. In this case, rows with overflow key values are placed in an overflow
area, which is dynamically allocated. As more records are added to this overflow
area, operations against the table that involve the overflow area require more
processing time. The order of rows in the overflow area is not guaranteed. The
larger the overflow area, the more time is required to access it. If this
becomes a problem, consider reducing the size of the overflow area by exporting
the data to a new RCT with wider ranges.
Listing 11. Example of an RCT allowing overflows
create table table1 (id int not null, emp_id int not null, emp_name varchar(40) ) organize by key sequence (id starting from 100 ending 1000, emp_id ending 100) allow overflow
Multi-dimensional clustering (MDC) enables a table to be physically clustered on more than one key, or dimension, simultaneously. Prior to Version 8, DB2 supported only single-dimensional clustering of data using clustering indices. When a clustering index is defined on a table, DB2 attempts to maintain the physical order of the data on pages, based on the key order of the clustering index, as records are inserted into and updated in the table. This can significantly improve the performance of queries that have predicates containing the keys of the clustering index because, with good clustering, only a portion of the physical table needs to be accessed. When the pages are stored sequentially on disk, more efficient prefetching can be performed.
With MDC, these same benefits are extended to more than one dimension, or clustering key. In the case of query performance, range queries involving any one or combination of the specified dimensions of the table will benefit from the underlying clustering. These queries will need to access only those pages having records with the specified dimension values, and the qualifying pages will be grouped together in extents.
A table with a clustering index can become unclustered over time, as available space is filled in the table. However, an MDC table is able to maintain its clustering over the specified dimensions automatically and continuously, eliminating the need to reorganize the table to restore the physical order of the data.
When an MDC table is created, the dimensional keys along which to cluster the table's data are specified. Each specified dimension can be defined with one or more columns, the same as an index key. A dimension block index will be automatically created for each of the dimensions specified and will be used to access data quickly and efficiently along each of the specified dimensions. A block index will also be automatically created, containing all dimension key columns. The block index will be used to maintain the clustering of the data during insert and update activity, as well as for quick and efficient access to the data.
Every unique combination of the table's dimension values forms a logical cell, which is physically composed of blocks of pages, where a block is a set of consecutive pages on disk. The set of blocks that contain pages with data having the same key value of one of the dimension block indices is called a slice. Every page of the table will be stored in only one block, and all blocks of the table will consist of the same number of pages, known as the blocking factor. The blocking factor is equal to the table space's extent size, so the block boundaries line up with extent boundaries.
To create an MDC table, you need to specify the dimensions of the table using the
organize by parameter, as follows:
Listing 12. Syntax for creating a MDC table
CREATE TABLE MDCTABLE( Year INT, Nation CHAR(25), Colour VARCHAR(10), ... ) ORGANIZE BY(Year, Nation, Color)
In this example, the table will be organized on the year, nation, and color dimensions, and will logically look like the figure below.
Figure 7. Data organization within an MDC
You cannot alter a table and make it into an MDC table, so use the design adviser, if possible, before you create the database to see if your tables should be MDC tables or normal tables.
The following list summarizes the design considerations for MDC tables:
- When identifying candidate dimensions, search for attributes that are not too granular, thereby enabling more rows to be stored in each cell. This approach will make better use of block-level indices.
- Higher data volumes may improve population density.
- It might be useful to load the data first as non-MDC tables for analysis only.
- The table space extent size is a critical parameter for efficient space usage.
- Although an MDC table may require a greater initial understanding of the data, the payback is that query times will likely improve.
- Some data may be unsuitable for MDC tables and would be better implemented using a standard clustering index.
- Although a smaller extent size will provide the most efficient use of disk space, the I/O for the queries should also be considered.
- A larger extent will normally reduce I/O cost, because more data will be read at a time. This, in turn, makes for smaller dimension block indices because each dimension value will need fewer blocks. And, inserts will be quicker because new blocks will be needed less often.
Insert time clustering (ITC) is a new partitioning feature coming with DB2 10.1. ITC tables cluster data based on their insertion time so, data inserted within the same time interval are physically grouped together. ITC tables eases the management of space utilization in cases when your system runs batch data deletion and you want to reclaim the trapped free space after the batch deletion is done.
ITC tables have similar characteristics to MDC tables. For example, both table types use block based allocation and block indices. ITC and MDC tables differ in how data is clustered. ITC tables cluster data by using a implicitly created virtual column to cluster rows inserted at a similar time together. ITC virtual dimension cannot be manipulated while clustering dimensions on MDC tables are specified by the creator.
ITC tables are created with the
CREATE TABLE command by specifying the
BY INSERT TIME clause:
CREATE TABLE ...
ORGANIZE BY INSERT TIME. You can convert existing tables to ITC tables using either of two ways:
- Using the
ADMIN_MOVE_TABLEprocedure to convert the table online.
- Using the export/import or a load from table utilities.
Note that existing tables cannot be altered to become ITC tables.
A materialized query table (MQT) is a table whose definition is based upon the result of a query. The data contained in an MQT is derived from one or more tables accessed by the query on which the MQT definition is based. You can think of an MQT as a kind of materialized view. Views and MQTs are defined on the basis of a query. However, an MQT actually stores the query results as data, and you can work with the data that is in the MQT instead of the data in the underlying tables. Thus, MQTs can significantly improve the performance of queries, especially complex queries.
An MQT can be defined at table creation time as maintained by the system or maintained by the user. When you create this type of MQT, you can specify whether the table data is a REFRESH IMMEDIATE or REFRESH DEFERRED. The REFRESH keyword lets you specify how the data is to be maintained. DEFERRED means that the data in the table can be refreshed at any time using the REFRESH TABLE statement. Neither REFRESH DEFERRED nor REFRESH IMMEDIATE system-maintained MQTs allow insert, update, or delete operations to be executed against them. However, REFRESH IMMEDIATE system-maintained MQTs are updated with changes made to the underlying tables as a result of insert, update, or delete operations.
The following example creates a system maintained MQT with the
REFRESH IMMEDIATE clause. The
DEFERRED clause means that data is not inserted into
the table as part of the
CREATE TABLE statement.
Listing 13. Syntax for creating a system maintained MQT
CREATE TABLE EMP_DEP AS (SELECT E.EMPNO, E.FIRSTNME, E.LASTNAME, E.PHONENO, D.DEPTNO, SUBSTR(D.DEPTNAME, 1, 12) AS DEPARTMENT, D.MGRNO FROM EMPLOYEE E, DEPARTMENT D WHERE E.WORKDEPT = D.DEPTNO) DATA INITIALLY DEFERRED REFRESH IMMEDIATE
In the above example:
- An MQT called EMP_DEP is created joining the data from the EMPLOYEE and DEPARTMENT table.
DATA INITIALLY DEFERREDclause means that EMP_DEP MQT is not populated with data as part of the
REFRESH IMMEDIATEclause means that refreshing the data in the MQT is maintained by the system once an update is made to the underlying EMPLOYEE or DEPARTMENT tables.
After being created, the MQT is in check-pending state and cannot be queried
until the following
SET INTEGRITY statement
SET INTEGRITY FOR EMP_DEP IMMEDIATE
CHECKED NOT INCREMENTAL.
IMMEDIATE CHECKED clause in the above statement specifies that the data is to
be checked against the EMP_DEP MQT's defining query and refreshed. The
INCREMENTAL clause specifies that integrity checking is to be done on the whole
MQT and, hence, the MQT definition is recomputed.