Data organization schemes in Db2 and Informix databases
Table partitioning is a data organization scheme in which table data is divided across multiple storage objects called data partitions according to values in one or more table columns. Each data partition is stored separately. These storage objects can be in different table spaces, in the same table space, or a combination of both.
Table data is partitioned as specified in the PARTITION BY clause of the CREATE TABLE statement. The columns used in this definition are referred to as the table partitioning key columns. Db2 table partitioning maps to the data fragmentation approach to data organization offered by Informix® Dynamic Server and Informix Extended Parallel Server.
The Informix approach
Informix supports several data organization schemes, which are called fragmentation in the Informix products. One of the more commonly used types of fragmentation is FRAGMENT BY EXPRESSION. This type of fragmentation works much like a CASE statement, where there is an expression associated with each fragment of the table. These expressions are checked in order to determine where to place a row.
An Informix and Db2 database system comparison
Db2 database provides a rich set of complementary features that map directly to the Informix data organization schemes, making it relatively easy for customers to convert from the Informix syntax to the Db2 syntax. The Db2 database manager handles complicated Informix schemes using a combination of generated columns and the PARTITION BY RANGE clause of the CREATE TABLE statement. Table 1 compares data organizations schemes used in Informix and Db2 database products.
Data organization scheme | Informix syntax | Db2 Version 9.1 syntax |
---|---|---|
|
FRAGMENT BY EXPRESSION | PARTITION BY RANGE |
|
FRAGMENT BY ROUND ROBIN | No syntax: Db2 database manager automatically spreads data among containers |
|
FRAGMENT BY RANGE | PARTITION BY RANGE |
|
FRAGMENT BY HASH | DISTRIBUTE BY HASH |
|
FRAGMENT BY HYBRID | DISTRIBUTE BY HASH, PARTITION BY RANGE |
|
n/a | ORGANIZE BY DIMENSIONS |
Examples
The following examples provide details on how to accomplish Db2 database equivalent outcomes for any Informix fragment by expression scheme.
Example 1: The following basic create table statement shows Informix fragmentation and the equivalent table partitioning syntax for a Db2 database system:
CREATE TABLE demo(a INT) FRAGMENT BY EXPRESSION
a = 1 IN db1,
a = 2 IN db2,
a = 3 IN db3;
CREATE TABLE demo(a INT) PARTITION BY RANGE(a)
(STARTING(1) IN db1,
STARTING(2) IN db2,
STARTING(3) ENDING(3) IN db3);
Informix XPS supports a two-level fragmentation scheme known as hybrid where data is spread across co-servers with one expression and within the co-server with a second expression. This allows all co-servers to be active on a query (that is, there is data on all co-servers) as well as allowing the query to take advantage of data partition elimination.
The Db2 database system achieves the equivalent organization scheme to the Informix hybrid using a combination of the DISTRIBUTE BY and PARTITION BY clauses of the CREATE TABLE statement.
Example 2:The following example shows the syntax for the combined clauses:
CREATE TABLE demo(a INT, b INT) FRAGMENT BY HYBRID HASH(a)
EXPRESSION b = 1 IN dbsl1,
b = 2 IN dbsl2;
CREATE TABLE demo(a INT, b INT) IN dbsl1, dbsl2
DISTRIBUTE BY HASH(a),
PARTITION BY RANGE(b) (STARTING 1 ENDING 2 EVERY 1);
CREATE TABLE demo(a INT, b INT, c INT) IN dbsl1, dbsl2
DISTRIBUTE BY HASH(a),
PARTITION BY RANGE(b) (STARTING 1 ENDING 2 EVERY 1)
ORGANIZE BY DIMENSIONS(c);
Thus, all rows with the same value of column a are in the same database partition. All rows with the same value of column b are in the same table space. For a given value of a and b, all rows with the same value c are clustered together on disk. This approach is ideal for OLAP-type drill-down operations, because only one or several extents (blocks)in a single table space on a single database partition must be scanned to satisfy this type of query.
Table partitioning applied to common application problems
The following sections discuss how to apply the various features of Db2 table partitioning to common application problems. In each section, particular attention is given to best practices for mapping various Informix fragmentation schemes into equivalent Db2 table partitioning schemes.
Considerations for creating simple data partition ranges
One of the most common applications of table partitioning is to partition a large fact table based on a date key. If you need to create uniformly sized ranges of dates, consider using the automatically generated form of the CREATE TABLE syntax.
Examples
CREATE TABLE orders
(
l_orderkey DECIMAL(10,0) NOT NULL,
l_partkey INTEGER,
l_suppkey INTEGER,
l_linenumber INTEGER,
l_quantity DECIMAL(12,2),
l_extendedprice DECIMAL(12,2),
l_discount DECIMAL(12,2),
l_tax DECIMAL(12,2),
l_returnflag CHAR(1),
l_linestatus CHAR(1),
l_shipdate DATE,
l_commitdate DATE,
l_receiptdate DATE,
l_shipinstruct CHAR(25),
l_shipmode CHAR(10),
l_comment VARCHAR(44))
PARTITION BY RANGE(l_shipdate)
(STARTING '1/1/1992' ENDING '12/31/1993' EVERY 1 MONTH);
This creates 24 ranges, one for each month in 1992-1993. Attempting to insert a row with l_shipdate outside of that range results in an error.
create table orders
(
l_orderkey decimal(10,0) not null,
l_partkey integer,
l_suppkey integer,
l_linenumber integer,
l_quantity decimal(12,2),
l_extendedprice decimal(12,2),
l_discount decimal(12,2),
l_tax decimal(12,2),
l_returnflag char(1),
l_linestatus char(1),
l_shipdate date,
l_commitdate date,
l_receiptdate date,
l_shipinstruct char(25),
l_shipmode char(10),
l_comment varchar(44)
) fragment by expression
l_shipdate < '1992-02-01' in ldbs1,
l_shipdate >= '1992-02-01' and l_shipdate < '1992-03-01' in ldbs2,
l_shipdate >= '1992-03-01' and l_shipdate < '1992-04-01' in ldbs3,
l_shipdate >= '1992-04-01' and l_shipdate < '1992-05-01' in ldbs4,
l_shipdate >= '1992-05-01' and l_shipdate < '1992-06-01' in ldbs5,
l_shipdate >= '1992-06-01' and l_shipdate < '1992-07-01' in ldbs6,
l_shipdate >= '1992-07-01' and l_shipdate < '1992-08-01' in ldbs7,
l_shipdate >= '1992-08-01' and l_shipdate < '1992-09-01' in ldbs8,
l_shipdate >= '1992-09-01' and l_shipdate < '1992-10-01' in ldbs9,
l_shipdate >= '1992-10-01' and l_shipdate < '1992-11-01' in ldbs10,
l_shipdate >= '1992-11-01' and l_shipdate < '1992-12-01' in ldbs11,
l_shipdate >= '1992-12-01' and l_shipdate < '1993-01-01' in ldbs12,
l_shipdate >= '1993-01-01' and l_shipdate < '1993-02-01' in ldbs13,
l_shipdate >= '1993-02-01' and l_shipdate < '1993-03-01' in ldbs14,
l_shipdate >= '1993-03-01' and l_shipdate < '1993-04-01' in ldbs15,
l_shipdate >= '1993-04-01' and l_shipdate < '1993-05-01' in ldbs16,
l_shipdate >= '1993-05-01' and l_shipdate < '1993-06-01' in ldbs17,
l_shipdate >= '1993-06-01' and l_shipdate < '1993-07-01' in ldbs18,
l_shipdate >= '1993-07-01' and l_shipdate < '1993-08-01' in ldbs19,
l_shipdate >= '1993-08-01' and l_shipdate < '1993-09-01' in ldbs20,
l_shipdate >= '1993-09-01' and l_shipdate < '1993-10-01' in ldbs21,
l_shipdate >= '1993-10-01' and l_shipdate < '1993-11-01' in ldbs22,
l_shipdate >= '1993-11-01' and l_shipdate < '1993-12-01' in ldbs23,
l_shipdate >= '1993-12-01' and l_shipdate < '1994-01-01' in ldbs24,
l_shipdate >= '1994-01-01' in ldbs25;
Notice that the Informix syntax provides an open ended range at the top and bottom to catch dates that are not in the expected range. The Db2 syntax can be modified to match the Informix syntax by adding ranges that make use of MINVALUE and MAXVALUE.
CREATE TABLE orders
(
l_orderkey DECIMAL(10,0) NOT NULL,
l_partkey INTEGER,
l_suppkey INTEGER,
l_linenumber INTEGER,
l_quantity DECIMAL(12,2),
l_extendedprice DECIMAL(12,2),
l_discount DECIMAL(12,2),
l_tax DECIMAL(12,2),
l_returnflag CHAR(1),
l_linestatus CHAR(1),
l_shipdate DATE,
l_commitdate DATE,
l_receiptdate DATE,
l_shipinstruct CHAR(25),
l_shipmode CHAR(10),
l_comment VARCHAR(44)
) PARTITION BY RANGE(l_shipdate)
(STARTING MINVALUE,
STARTING '1/1/1992' ENDING '12/31/1993' EVERY 1 MONTH,
ENDING MAXVALUE);
This technique allows any date to be inserted into the table.
Partition by expression using generated columns
Although Db2 database does not directly support partitioning by expression, partitioning on a generated column is supported, making it possible to achieve the same result.
- The generated column is a real column that occupies physical disk space. Tables that make use of a generated column can be slightly larger.
- Altering the generated column expression for the column on which a partitioned table is partitioned is not supported. Attempting to do so will result in the message SQL0190. Adding a new data partition to a table that uses generated columns in the manner described in the next section generally requires you to alter the expression that defines the generated column. Altering the expression that defines a generated column is not currently supported.
- There are limitations on when you can apply data partition elimination when a table uses generated columns.
Examples
CREATE TABLE customer (
cust_id INT,
cust_prov CHAR(2))
FRAGMENT BY EXPRESSION
cust_prov = "AB" IN dbspace_ab
cust_prov = "BC" IN dbspace_bc
cust_prov = "MB" IN dbspace_mb
...
cust_prov = "YT" IN dbspace_yt
REMAINDER IN dbspace_remainder;
CREATE TABLE customer (
cust_id INT,
cust_prov CHAR(2),
cust_prov_gen GENERATED ALWAYS AS (CASE
WHEN cust_prov = 'AB' THEN 1
WHEN cust_prov = 'BC' THEN 2
WHEN cust_prov = 'MB' THEN 3
...
WHEN cust_prov = 'YT' THEN 13
ELSE 14 END))
IN tbspace_ab, tbspace_bc, tbspace_mb, .... tbspace_remainder
PARTITION BY RANGE (cust_prov_gen)
(STARTING 1 ENDING 14 EVERY 1);
Here the expressions within the case statement match the corresponding expressions in the FRAGMENT BY EXPRESSION clause. The case statement maps each original expression to a number, which is stored in the generated column (cust_prov_gen in this example). This column is a real column stored on disk, so the table could occupy slightly more space than would be necessary if Db2 supported partition by expression directly. This example uses the short form of the syntax. Therefore, the table spaces in which to place the data partitions must be listed in the IN clause of the CREATE TABLE statement. Using the long form of the syntax requires a separate IN clause for each data partition.