Before you start
About this series
If you are preparing to take the DB2 10.1 for Linux, UNIX, and Windows DBA certification exam (exam 611), you've come to the right place — a study hall of sorts. This series of eight DB2 certification preparation tutorials covers the major concepts you'll need to know for the test.
About this tutorial
This tutorial discusses the creation of DB2 databases, as well as the various methods used for placing and storing objects within a database. The focus is on partitioning, compression, and XML, which are all important performance and application development concepts you need to know to store and access data quickly and efficiently. This is the second in a series of eight tutorials that you can use to help you prepare for the DB2 10.1 for Linux, UNIX, and Windows DBA certification exam (exam 611). The material here primarily covers the objectives in Section 2 of the exam ("Physical design").
After completing this tutorial, you should be able to:
- Demonstrate the ability to create a database and manipulate various DB2
- Know how to convert existing database to automatic storage database.
- Use the Admin_move_table feature.
- Demonstrate knowledge of partitioning capabilities.
- Demonstrate knowledge of XML data objects.
- Demonstrate knowledge of compression.
- Demonstrate knowledge of new table features.
- Knowledge of Multi-temperature data feature.
To take the DB2 10.1 DBA exam 611, you must have already passed the DB2 10.1 Fundamentals exam 610. We recommend that you study the DB2 10.1 fundamentals certification exam 610 series before starting this series.
To help you understand some of the material presented here, you should be familiar with the following terms:
- Object— Anything in a database that can be created or manipulated with SQL (e.g., tables, views, indices, packages).
- Table— A logical structure that is used to present data as a collection of unordered rows with a fixed number of columns. Each column contains a set of values, each value of the same data type (or a subtype of the column's data type); the definitions of the columns make up the table structure, and the rows contain the actual table data.
- Record— The storage representation of a row in a table.
- Field— The storage representation of a column in a table.
- Value— A specific data item that can be found at each intersection of a row and column in a database table.
- Structured Query Language (SQL)— A standardized language used to define objects and manipulate data in a relational database. (For more about SQL, see the fourth tutorial in this series.)
- DB2 optimizer— A component of the SQL precompiler that chooses an access plan for a Data Manipulation Language (DML) SQL statement by modeling the execution cost of several alternative access plans and choosing the one with the minimal estimated cost.
You do not need a copy of DB2 to complete this tutorial. However, you will get more out of it if you download the free trial version of IBM DB2 10.1 to work along with this tutorial.
Creating a database
A system database directory file exists for each instance of the
database manager and contains one entry for each database
cataloged for this instance. Databases are implicitly cataloged when the
create database command is issued, and can also be
explicitly cataloged with
A local database directory file exists in each drive or path in which a database has been defined. This directory contains one entry for each database accessible from that location.
Creating a database
When you create a database, each of the following tasks is done for you:
- Setting up of all the system catalog tables needed by the database.
- Allocation of the database recovery log.
- Creation of the database configuration file and the default values set.
- Binding of the database utilities to the database.
The create database command
To create a database, use the command
database. You can optionally specify the following:
- Whether the database should be automatically configured
- One or more storage paths
- Drive or path on which to create the database
- An alias for the database in the system database directory
- Code set and territory
- Collating sequence
- Default page size
- default extent size of table spaces in the database
- Table space definitions for the CATALOG, TEMPORARY, and USERSPACE1 table spaces
The default database
Each new database has:
- A default buffer pool defined (IBMDEFAULTBP).
- A default storage group named IBMSTOGROUP, which is automatically created while creating AUTOMATIC STORAGE database and the database storage paths are automatically assigned to that storage group.
- Three default table spaces:
- SYSCATSPACE for the system catalog tables. SYSCATSPACE cannot be dropped.
- TEMPSPACE1 for system-created temporary tables. The TEMPSPACE1 table space can be dropped once another temporary table space has been created.
- USERSPACE1 is the default table space for user-created objects. The USERSPACE1 table space can be dropped once another user-created table space has been created.
The system catalogs
A set of system catalog tables is created and maintained for each database. These tables contain information about the definitions of the database objects (tables, views, indices, and packages, for example) and security information about the type of access users have to these objects. These tables are stored in the SYSCATSPACE table space.
The directory structure
create database command lets you specify the drive
or directory on which to create the database, depending on the operating
system. If no drive or directory is specified, the database is created on the path
specified by the DFTDBPATH instance (database manager) configuration
parameter. If no drive or directory is specified, and the DFTDBPATH
instance-level configuration parameter is not set, the database is created on the drive or path
create database command was executed.
create database command creates a series of
subdirectories. The first subdirectory is named after the instance owner for the
instance in which the database was created. Under this subdirectory, DB2 creates
a directory that indicates which database partition the database was created
For a non-partitioned database, the directory will be NODE0000. For a partitioned database, the directory will be named NODExxxx, where xxxx will be the four-digit partition number for the database instance as designated in the db2nodes.cfg file. For example, for partition number 43, the directory would be NODE0043.
In Windows®, instances do not really have an instance owner, so the name of the instance (DB2, for example) will be used in place of the instance owner's ID.
Because more than one database can be created on the same drive or directory, each database must have its own unique subdirectory. Under the NODExxxx directory, there will be an SQLxxxxx directory for every database created on the drive or directory. For example, imagine we have three databases, DB2QS, TEST and SAMPLE, that were all created on the C drive on Windows. There will be three directories: SQL00001, SQL00002, and SQL00003.
To determine the directory under which the database was created, enter the
list database directory on C:. This will
produce output similar to the following.
Figure 1. Directory under which database is created
In the example above, the database DB2QS was created in the SQL00001 directory, the database TEST was created in the SQL00002 and the database SAMPLE was created in the SQL00003 directory under the NODExxxx directory.
For AUTOMATIC STORAGE databases:
- The system catalog table space (SYSCATSPACE) will use the directory T0000000.
- The system temporary table space (TEMPSPACE1) will use the directory T0000001.
- The default user table space (USERSPACE1) will use the directory T0000002.
EXAMPLE: The following command on windows:
db2 create db AUTODB2 on D:\DB2\Data dbpath on D:\DB2 will create a
folder structure as specified in the next figure.
Figure 2. Folder structure created after executing create db command
Create database commands for Linux/UNIX
To create a database on the directory (file system)/database, use the following
create database sample on
/database. If this command were executed in the instance named dbinst, on the server where
database partition 0 is defined, the following directory structures would be
Create database commands for Windows
To create a database on the D: drive, use the following command:
create database sample on D:. If this command were executed in the instance named dbinst, on the server where
database partition 0 is defined, the following directory structures would be
Creating the USERSPACE1 table space as DMS
To create a database and define the USERSPACE1 table space to be database managed space (DMS), using two file containers, use the following command on Linux or UNIX:
create database sample2 user table space managed by database using(file '/dbfiles/cont0' 5000, file '/dbfiles/cont1' 5000)
create database sample2 user table space managed by database using(file 'c:\dbfiles\cont0' 5000, file 'c:\dbfiles\cont1' 5000)
Creating the TEMPSPACE1 table space with user-defined containers
To create a database and define the TEMPSPACE1 table space to use two SMS containers, use the following commands on Linux or UNIX:
create database sample3 temporary tablespace managed by system using('/dbfiles/cont0', '/dbfiles/cont1')
create database sample3 temporary tablespace managed by system using('c:\dbfiles\cont0', 'c:\dbfiles\cont1')
Changing the collating sequence for the database
The Linux/UNIX command
SAMPLE on /mydbs collate using identity or Windows command
database SAMPLE on D: collate using identity creates a database and compares strings byte for byte, since the collating
sequence has been set to identity.
Creating and manipulating various DB2 objects
This section discusses the purpose and use of the following DB2 objects:
- Data placing objects
- Buffer pools
- Table spaces
- Application objects
- Identity columns
The database buffer pool area is a piece of memory used to cache a table's index and data pages as they are being read from disk to be scanned or modified. The buffer pool area helps improve database system performance by allowing data to be accessed from memory instead of from disk. Because memory access is much faster than disk access, the less often that DB2 needs to read from or write to a disk, the better the system will perform.
When a database is created, one default buffer pool is created for the database.
This buffer pool, IBMDEFAULTBP, will have a page size equal to the database
page size as specified at the database creation time. If no default page size
was specified at database creation, the default buffer pool will have a page
size of 4 KB. The default buffer pool size depends on the operating system. For
Windows, the default buffer pool is 250 pages or 1 MB; for UNIX, it's
1,000 pages or 4 MB. The default buffer pool cannot be dropped,
but its size can be changed using the
Creating a buffer pool
create bufferpool command has options to specify the following:
Buffer pool namespecifies the name of the buffer pool. The name cannot be used for any other buffer pools and cannot begin with the characters SYS or IBM.
immediatespecifies that the buffer pool will be created immediately if there is enough memory available on the system. If there is not enough reserved space in the database shared memory to allocate the new buffer pool, a warning is returned, and buffer pool creation will be DEFERRED, as described below.
deferredspecifies that the buffer pool will be created the next time that the database is stopped and restarted.
all dbpartitionnumsspecifies that the buffer pool will be created on all partitions in the database. This is the default if no database partition group is specified.
database partition groupspecifies the database partition groups on which the buffer pool will be created. The buffer pool will be created on all database partitions that are part of the specified database partition groups.
sizespecifies the size of the buffer pool and is defined in number of pages. In a partitioned database, this will be the default size for all database partitions where the buffer pool exists.
numblockpagesspecifies the number of pages to be created in the block-based area of the buffer pool. The actual value of numblockpages may differ from what was specified because the size must be a multiple of the block size. The block-based area of the buffer pool cannot be more than 98 percent of the total buffer pool size. Specifying a value of 0 will disable block I/O for the buffer pool.
blocksizespecifies the number of pages within a given block in the block-based area of the buffer pool. The block size must be between 2 and 256 pages; the default value is 32 pages.
pagesizespecifies the page size used for the buffer pool. The default page size is 4 KB or 4,096 bytes. The page size can be specified in bytes or kilobytes.
Once a page size and name for a buffer pool have been defined, they cannot be altered.
create bufferpool statement
The following statement creates a buffer pool named BP1 with a size of 100 MB
(25,000 4-KB pages). Because the page size is not specified, the buffer pool
uses the default page size of 4 KB. Since the IMMEDIATE option is the default,
the buffer pool is allocated immediately and available for use as long as there
is enough memory available to fulfill the request:
create bufferpool BP1 size 25000.
The following statement creates a buffer pool named BP2 with a size of 200 MB
(25,000 8-KB pages). The buffer pool uses an 8-KB page size. Since the
immediate option is the default, the buffer pool is
allocated immediately and available for use as long as there is enough memory
available to fulfill the request:
bufferpool BP2 size 25000 pagesize 8 K.
The following statement creates a buffer pool named BP3 with a size of 4 GB
(1 million 4-KB pages). Because the page size is not specified, the buffer pool
uses the default page size of 4 KB; 128 pages of the buffer pool will be used in
the block-based area and blocks of the buffer pool will have 64 pages each.
deferred option is specified, the buffer
pool is not allocated until the database is stopped and restarted:
create bufferpool BP3 deferred size 1000000
NUMBLOCKPAGES 128 BLOCKSIZE 64.
A table space is a storage structure containing tables, indices, large objects, and long data. They are used to organize data in a database into logical storage groupings that relate to where data is stored on a system. Table spaces are stored in database partition groups.
What is a schema?
A schema is a high-level qualifier for database objects created within a database. It is a collection of database objects such as tables, views, indices, or triggers. It provides a logical classification of database objects.
While you are organizing your data into tables, it may also be beneficial to
group tables and other related objects. This is done by defining a
schema using the
create schema command. Information
about the schema is kept in the system catalog tables of the database to which
you are connected. As other objects are created, they can be placed within this
A set of system schemas is created with every database and placed into the SYSCATSPACE table space:
- SYSIBM— The base system catalogs; direct access is not recommended.
- SYSCAT— SELECT authority granted to PUBLIC on this schema; catalog read-only views; recommended way to obtain catalog information.
- SYSSTAT— Updatable catalog views; influences the optimizer.
- SYSFUN— User-defined functions.
How is a schema used in DB2?
Use a schema to fully qualify a table or other object name:
You can have multiple tables with the same name, but different schema names. Thus, the table user1.staff is not the same as user2.staff. As a result, you can use schemas to create logical databases within a DB2 database.
To create a schema, use the
create schema command.
Who can use a schema?
When you can create a schema, you can specify the owner of the schema using
authorization. If you do not, the
authorization ID that executed the
statement will be the owner of the schema. Privileges on the schema can also be
granted to users or groups at the same time.
Once a schema exists, the owner of the schema can grant CREATE_IN privilege on the schema to other users or groups.
Specifying the schema when creating an object
The schema name for an object can be explicitly specified as follows:
create table DWAINE.table1 (c1 int, c2 int).
If the user DWAINE connects to the database SAMPLE and issues the
create table t2 (c1
int), the schema DWAINE is created (as long as IMPLICT_SCHEMA has not been revoked from
the user DWAINE), as well as the table in the database.
The ID used to connect to the database is known as the authorization ID.
Specifying the schema when using DML commands
When using DML commands (for example,
delete) on database
- The object schema can be explicitly specified on the object name, such as schema1.table1.
- The object schema can be specified using the
set current schemaor
set current sqlidcommands.
- If no object schema is explicitly specified, the schema will be set to the current authorization ID.
For example, if user DWAINE connects to the database SAMPLE and issues
select * from t2, DWAINE.T2
is selected from the table, if the table exists. Otherwise, an error is
To create a table in a database, you must first be connected to the database. You must also have SYSADM authority in the instance, or DBADM authority or createtab privilege in the database.
When creating a table, you can specify the following:
- Table name
- Column definitions
- Primary/foreign keys
- Table space for the data, index, and long objects
The listing below shows an example:
Listing 1. Example of table creation
CREATE TABLE EMPLOYEE_SALARY ( DEPTNO CHAR(3) NOT NULL, DEPTNAME VARCHAR(36) NOT NULL, EMPNO CHAR(6) NOT NULL, SALARY DECIMAL(9,2) NOT NULL WITH DEFAULT) INDEX IN indtbsp IN datatbsp
Where are tables created?
If a table is created without the
in clause, the table
data (and its indices and LOB data) is placed in the following order:
- In the IBMDEFAULTGROUP table space, if it exists, and if the page size is sufficient.
- In a user-created table space, which is of the smallest page size sufficient for the table.
- In USERSPACE1, if it exists and has a sufficient page size.
LONG IN clauses specify the table spaces in which the
regular table data, index, and long objects are to be stored. Note that this
only applies to DMS table spaces.
Obtaining table information
You can obtain table information with the commands in the table below.
Table 1. Obtaining table information
|List tables for the current user|
|List all tables defined in the database|
|List tables for the specified schema|
|Show the structure of the specified table|
For example, the command
department produces the following output.
Figure 3. Table information output
An index can:
- Be ascending or descending (the default, if not specified, is ascending).
- Be unique or non-unique (the default, if not specified, is non-unique).
- Be compound.
- Be used to enforce clustering.
- Be bi-directional; this is controlled by allow or disallow reverse scans.
- Include additional columns; this is only applicable for unique indices.
Following are a number of create unique statements that illustrate these options:
create unique index itemno on albums (itemno) desc create index clx1 on stock (shipdate) cluster allow reverse scans create unique index incidx on stock (itemno) include (itemname) create index item on stock (itemno) disallow reverse scans collect detailed statistics
An identity column is a numeric column in a table that causes DB2 to automatically generate a unique numeric value for each row inserted into the table. A table can have a maximum of one identity column. The values for the column can be generated by DB2 always or by default:
- If values are always generated, the DB2 database always generates them, and applications are not allowed to provide an explicit value.
- If values are generated by default, the values can be explicitly provided by an application; DB2 generates a value only if the application does not provide it. Thus, DB2 cannot guarantee the uniqueness of the values. This option is intended for data propagation, or loading and unloading of a table.
Let's look at an example. Given the table created using the following command.
create table inventory (partno INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 100 INCREMENT BY 1), description CHAR(20) )
HINT: If you plan to load or import data into your table, use
GENERATED BY DEFAULT IDENTITY, which will make use
of the supplied values if they are provided; if the data is missing or
explicitly NULL, a value is generated.
Table 2. Identity column statement and values
|insert into inventory VALUES (DEFAULT,'door')||inserts 100,door|
|insert into inventory (description) VALUES ('hinge')||inserts 101,hinge|
|insert into inventory VALUES (200,'windor')||error|
|insert into inventory (description) VALUES ('frame')||inserts 102,frame|
Then the statement
SELECT * FROM inventory
100 door 101 hinge 102 frame.
Views are derived from one or more base tables, nicknames, or views, and can be used interchangeably with base tables when retrieving data. All views can be used just like tables for data retrieval. Whether a view can be used in an insert, update, or delete operation depends on its definition. Views are classified by the operations they allow:
A view can be created to limit access to sensitive data, while allowing more general access to other data. The data for a view is not stored separately from the table. In other words, a view uses no space in the database, other than its definition in the system catalogs. The creator of a view needs to have at least SELECT privilege on the base tables referenced in the view definition.
The information about all existing views is stored in:
create view statements show how views
create view DEPTSALARY AS SELECT DEPTNO, DEPTNAME, SUM(SALARY) AS TOTALS FROM PAYROLL GROUP BY DEPTNO,DEPTNAME create view EMPSALARY AS SELECT EMPNO, EMPNAME, SALARY FROM PAYROLL, PERSONNEL WHERE EMPNO=EMPNUMB
The with check option
with check option specifies the constraint that
every row inserted or updated through a view must conform to the
definition of the view. A row that does not conform to the definition of the
view is a row that does not satisfy the search conditions of the view.
For example, consider this command:
create view emp_view2 (empno, empname, deptno) AS (SELECT id, name, dept FROM employee WHERE dept = 10)with check option
When this view is used to insert or update with new values, the
with check option restricts the input values for the
There are a number of types of constraints in DB2:
- Referential integrity constraints
- Unique constraints
- Check constraints
- Informational constraints
You cannot directly modify a constraint; you must instead drop it and create a new constraint with the characteristics you want. Each constraint is discussed below.
Referential integrity constraints
Referential integrity constraints are defined when a table is created or
subsequently using the
statement. The clauses that establish referential integrity are:
- The primary key clause
- The unique constraint clause
- The foreign key clause
- The references clause
create table artists (artno INT, ... primary key (artno) foreign key dept (workdept) references department on delete no action)
Let's take a look at the various referential integrity rules:
- There is an implicit rule to back out of an insert if a parent is not found.
- Restrict: A parent row is not deleted if dependent rows are found.
- Cascade: Deleting a row in a parent table automatically deletes any related rows in a dependent table.
- No Action (the default): Enforces presence of parent row for every child after all other referential constraints are applied.
- Set Null: Foreign key fields set to null; other columns left unchanged.
- Restrict: An update for a parent key will be rejected if a row in a dependent table matches the original values of key.
- No Action (the default): An update will be rejected for parent key if there is no matching row in the dependent table.
A unique constraint can be used as the primary key for a foreign key constraint, just like an explicitly declared primary key. This allows referential integrity constraints to be placed on different columns within the same table. A unique constraint forces the values in the column to be unique; the column cannot contain null values.
A check constraint is used to enforce data integrity at the table level. It forces values in the table to conform to the constraint. All subsequent inserts and updates must conform to the defined constraints on the table or the statement will fail. If existing rows in the table do not meet the constraint, the constraint cannot be defined. Constraint checking can be turned off to speed up the addition of a large amount of data, but the table will be placed in CHECK PENDING state.
Informational constraints are rules that can be used by the optimizer, but are not enforced during runtime. Because other constraints may result in the overhead for insert, update, or delete operations, informational constraints may be a better alternative if the application already verifies the data. Informational constraints can be:
- ENFORCED — The constraint is enforced by the database manager during normal operations such as insert, update, or delete.
- NOT ENFORCED: When used, DB2 may return wrong results when any data in the table violates the constraint.
- ENABLE QUERY OPTIMIZATION: The constraint can be used for query optimization under appropriate circumstances.
- DISABLE QUERY OPTIMIZATION: The constraint cannot be used for query optimization.
A trigger defines a set of actions that are activated, or triggered, by an action on a specified base table. The actions triggered may cause other changes to the database or raise an exception. A trigger can be fired before or after inserts, updates, or deletes. Triggers are used for:
- Validation, similar to constraints but more flexible.
- Conditioning, allowing new data to be modified or conditioned to a predefined value.
- Integrity, similar to referential integrity but more flexible.
A table space is a storage structure containing tables, indices, large objects, and long data. They are used to organize data in a database into logical storage groupings that relate to where data is stored on a system. Table spaces are stored in database partition groups.
Table spaces consist of one or more containers. A container can be a directory name, a device name, or a file name. A single table space can have several containers. It is possible for multiple containers (from one or more table spaces) to be created on the same physical storage device (although you will get the best performance if each container you create uses a different storage device).
Figure 4. Table spaces and tables in a database
If you are using automatic storage table spaces, the creation and management of
containers is handled automatically by the database manager. If you are not
using automatic storage table spaces, you must define and manage containers
yourself. To get details about the table spaces in a database, use the following
get snapshot for tablespaces or
SMS table spaces
System Managed Space (SMS) table spaces use the file system manager provided by the operating system to allocate and manage the space where the tables are stored. Within an SMS table space, each container is an operating system directory, and table objects are created as files within that directory. When creating an SMS table space, the user must specify the name of the directory for each of the containers. DB2 will create the tables within the directories used in the table space by using unique file names for each object.
If a table space is created with more than one container, DB2 will balance the amount of data written to the containers. Since containers cannot be dynamically added to an SMS table space once it has been created, it is important to know the size requirements of the table space and create all required containers when the table space is created.
Characteristics of SMS table spaces
With SMS table spaces:
- All table data and indices share the same table space.
- Each table in a table space is given its own file name used by all containers. The file extension denotes the type of the data stored in the file.
- There is the possibility for dynamic file growth, with an upper boundary on size governed by the number of containers, OS limits on the size of the file system, and OS limits on size of individual files.
- When all space in a single container is allocated, the table space is considered full even if space remains in other containers.
- New containers can only be added to SMS on a partition that does not yet have any containers.
- On Linux or UNIX, the file system size may be increased.
SMS table spaces are very easy to administer, and are recommended for the TEMP table space.
Creating SMS table spaces
To create an SMS table space, use the following command:
create table space TS1 managed by system using ('path1', 'path2',
'path3'). When the path is specified for an SMS container, it can be an absolute
path or a relative path to the directory. If the directory does not exist, DB2
will create it. If the directory does exist, it cannot contain any files or
subdirectories. For example,
create table space
ts1 managed by system using ('D:\DIR1') specifies the absolute path to the directory. DB2 would create the DIR1 directory
on the D: drive on the database server if it does not already exist.
create table space ts2 managed by system
using ('DIR1') specifies the relative path DIR1. DB2 would create the DIR1 directory under the
database home directory.
The following SQL statements create an SMS table space with three containers on three separate drives or file systems. Note that the table space name is the same, as the examples are showing the differences between the Linux/UNIX and Windows table space definitions:
create tablespace smstbspc managed by system using ('d:\tbspc1', 'e:\tbspc2', 'f:\ tbspc3') create tablespace smstbspc managed by system using ('/dbase/container1', '/dbase/container2', '/dbase/container3')
Altering SMS table spaces
SMS table spaces can only be altered to change the prefetch size. Containers
cannot be added to an SMS table space using the
alter command. However, containers can be redefined, added, or removed during a redirected
DMS table spaces
In a database managed space (DMS) table space, the database manager controls the storage space. Unlike SMS table spaces, storage space is pre-allocated on the file system based on container definitions you specify when you create the DMS table space.
The DMS storage model consists of a limited number of files or devices where space is managed by the database manager. You decide which files and devices to use when creating containers, and you manage the space for those files and devices.
A DMS table space containing user defined tables and data can be defined as a large (the default) or regular table space that stores any table data or index data.
Characteristics of DMS table spaces
With DMS table spaces:
- The database manager uses striping to ensure an even distribution of data across all containers.
- Containers that make up a DMS table space are not required to be the same size. If any container is full, DMS table spaces use available free space from other containers.
- Because space is pre-allocated, it must be available before the table space can be created.
- Enabling DMS table spaces that use file containers for automatic resizing allows the database manager to handle the full table space condition automatically by extending existing containers for you.
- The table space is considered to be full when all of the space within the containers has been used.
- Unlike for SMS table spaces, you can add or extend containers manually,
ALTER TABLESPACEstatement, allowing more storage space to be given to the table space.
Creating DMS table spaces
The following statement creates a DMS table space without enabling auto-resize (the default):
CREATE TABLESPACE DMS1 MANAGED BY DATABASE USING FILE '/db2files/DMS1' 10 M)
To enable the auto-resize feature, specify the
YES clause for the
CREATE TABLESPACE DMS1 MANAGED BY DATABASE USING (FILE '/db2files/DMS1' 10 M) AUTORESIZE YES
To create a table space that can grow to 100 MB (per database partition
if the database has multiple database partitions), use the
MAXSIZE clause of the
CREATE TABLESPACE DMS1 MANAGED BY DATABASE USING (FILE '/db2files/DMS1' 10 M) AUTORESIZE YES MAXSIZE 100 M
If you do not specify the
MAXSIZE clause, there is no
maximum limit when the auto-resize feature is enabled. The table space will grow
until a file system limit is reached.
INCREASESIZE clause of the
CREATE TABLESPACE statement defines the amount of space used to
increase the table space when there are no free extents within the
table space. You can specify the value as an explicit size or as a percentage, as shown in
the following examples:
Listing 2. Specify value as explicit size or percentage
CREATE TABLESPACE DMS1 MANAGED BY DATABASE USING (FILE '/db2files/DMS1' 10 M) AUTORESIZE YES INCREASESIZE 5 M CREATE TABLESPACE DMS1 MANAGED BY DATABASE USING (FILE '/db2files/DMS1' 10 M) AUTORESIZE YES INCREASESIZE 50 PERCENT
Altering DMS table spaces
You can enable or disable the auto-resize feature after creating a DMS
table space by using
ALTER TABLESPACE statement with
ALTER TABLESPACE DMS1 AUTORESIZE YES ALTER TABLESPACE DMS1 AUTORESIZE NO
ALTER TABLESPACE statement to change the value
MAXSIZE for a table space that has auto-resize
already enabled, as shown in the following examples:
ALTER TABLESPACE DMS1 MAXSIZE 1 G ALTER TABLESPACE DMS1 MAXSIZE NONE
You can also define the amount of space used to increase the table space when there are no free extents as shown in the following examples:
ALTER TABLESPACE DMS1 INCREASESIZE 5 M ALTER TABLESPACE DMS1 INCREASESIZE 50 PERCENT
What is a automatic storage?
Automatic storage allows you to specify one or more storage paths for a database. Then when you create table space, a container file will be created automatically on each storage path by DB2. You can enable or configure automatic storage for a database when it is created, as follows:
db2 create database db_name automatic storage yes db2 create database db_name on db_path1, db_path2
You can add additional storage paths to a database set up for automatic storage
add storage parameter:
db2 alter database db_name add storage on db_path3. You can also drop
a storage path from a database set up for automatic storage
drop storage parameter:
db2 alter database db_name drop storage on db_path3.
Using automatic storage
Once your database has been set up for automatic storage, you can create table
spaces using this mechanism. You have several ways to take advantage of
automatic storage once the database has been set up that way. You can simply
create a table space in the database (once you are connected to the
db2 create tablespace ts_name.
Or you can create a table space and specify its initial size and growth
db2 create tablespace ts_name initialsize 10M increasesize 10M maxsize 100M
In this example, the table space will start out at 10 MB, and as it gets close to being full, DB2 will automatically extend it by 10 MB at a time, up to its maximum size of 100 MB.
If the database was not set up for automatic storage, you can still use automatic storage for a table space if you create it and specify its storage:
db2 create tablespace ts_name managed by automatic storage
A storage group is a named set of storage paths where data can be stored. Storage groups are configured to represent different classes of storage available to your database system. You can assign table spaces to the storage group that best suits the data. Only automatic storage table spaces use storage groups.
A table space can be associated with only one storage group, but a storage group
can have multiple table space associations. To manage storage group
objects, you can use the
With the table partitioning feature, you can place table data in multiple table spaces. Using this feature, storage groups can store a subset of table data on fast storage while the remainder of the data is on one or more layers of slower storage. Use storage groups to support multi-temperature storage, which prioritizes data based on classes of storage. For example, you can create storage groups that map to the different tiers of storage in your database system. Then the defined table spaces are associated with these storage groups. More information about multi-temperature storage is available in the "Multi-temperature data feature" section.
The default storage groups
When you create a database, a default storage group named
IBMSTOGROUP is automatically created. However, a database
created with the
AUTOMATIC STORAGE NO clause does not have a default storage
group. The first storage group created with the
STOGROUP statement becomes the designated default storage group.
There can only be one storage group designated as the default storage group.
If a database has storage groups, the default storage group is used when an automatic storage managed table space is created without explicitly specifying the storage group.
You can designate a default storage group by using either the
CREATE STOGROUP or
statement. When you designate a different storage group as the default storage
group, there is no impact to the existing table spaces using the old default
storage group. To alter the storage group associated with a table space, use the
ALTER TABLESPACE statement. You can
determine which storage group is the default storage group by using the
SYSCAT.STOGROUPS catalog view.
You cannot drop the current default storage group. You can drop the IBMSTOGROUP storage group if it is not designated as the default storage group at that time. If you drop the IBMSTOGROUP storage group, you can create another storage group with that name.
Creating storage groups
To create a storage group by using the command line, enter
CREATE STOGROUP operational_sg ON '/filesystem1', '/filesystem2',
'/filesystem3'... where operational_sg is the name of the storage group
and /filesystem1, /filesystem2, e/filesystem3 , ... are the storage paths to be added.
Altering storage groups
You can use the
ALTER STOGROUP statement to alter the
definition of a storage group, including setting media attributes, setting a
data tag, or setting a default storage group. You can also add and remove
storage paths from a storage group.
To add storage paths '/hdd/path1' and '/hdd/path2' to storage group
sg, issue the following
ALTER STOGROUP sg
ADD '/hdd/path1', '/hdd/path2'. To drop storage paths '/db2/filesystem1'
and e'/db2/filesystem2' from storage group
sg, issue the following
ALTER STOGROUP sg DROP '/db2/filesystem1',
How to convert existing database to automatic storage
You can convert an existing non-automatic storage database to use automatic
storage by using the
CREATE STOGROUP statement to
define the default storage group within a database.
When you define a storage group for a database, existing table spaces are not
automatically converted to use automatic storage. By default, only future table
spaces you create are automatic storage table spaces. You must use the
ALTER TABLESPACE statement to convert existing
table spaces to use automatic storage.
Example 1: Converting a database on Linux or UNIX
Assume that a database is a non-automatic storage database and that
/data1/as and /data2/as are the paths you want to use for
automatic storage table spaces. To convert the database to an automatic storage
database, create a storage group with /data1/as and
/data2/as as paths:
CREATE STOGROUP sg ON
Example 2: Converting a database on Windows operating systems
Assume that a database is a nonautomatic storage database, and that
F:\DB2DATA and G: are the paths you want to use for automatic
storage table spaces. To convert the database to an automatic storage database,
create a storage group with F:\DB2DATA and G: as paths:
CREATE STOGROUP sg ON 'F:\DB2DATA', 'G:'. After creating the default
storage group, to convert an existing DMS table space tbspc1 to use
automatic storage, issue the following statements:
ALTER TABLESPACE tbspc1 MANAGED BY AUTOMATIC STORAGE USING STOGROUP sg ALTER TABLESPACE tbspc1 REBALANCE
The rebalance operation moves data from the non-automatic storage containers to the new automatic storage containers.
How to check if an existing database is an automatic storage
There are several ways to check if an existing database is an automatic storage database. You can use any of the following options.
ADMIN_GET_STORAGE_PATHStable function — It returns a list of automatic storage paths for each database storage group, including file system information for each storage path.
SYSIBMADM.SNAPSTORAGE_PATHSadministrative view — It returns a list of automatic storage paths for the database including file system information for each storage path.
db2lookutility — Generates the DDL statements for a database by object type.
Using Admin_move_table feature
You can move tables online and offline using the
In fact, this can be used to move the data in a table to a new table
object of the same name (but with possibly different storage characteristics
such as a different table space) while the data remains online and available for
access. You can also generate a new optimal compression dictionary when a table
ADMIN_MOVE_TABLE stored procedure creates a protocol table comprising rows
with status information and configuration options related to the table to
be moved. The return set from this procedure is the rows from that protocol
table related to the table to be moved.
You can invoke the
ADMIN_MOVE_TABLE in one of two ways:
Method 1— Modify only certain parts of the table definition for the target table. Fill out the data_tbsp, index_tbsp, and lob_tbsp parameters while calling the procedure, leaving the other optional parameters blank.
This example calls the stored procedure using the first method, where the target table is defined within the procedure, to move a table named T1, located in the schema SCHEMA1. Additionally, the column definitions of the target table are passed to the procedure.
Listing 3. Example for the stored procedure using the first method
CALL SYSPROC.ADMIN_MOVE_TABLE( 'SCHEMA1', 'T1', 'ACCOUNTING', 'ACCOUNT_IDX', 'ACCOUNT_LONG', '', '', '', 'CUSTOMER VARCHAR(80), REGION CHAR(5), YEAR INTEGER, CONTENTS CLOB', '', 'MOVE')
The above example moves the table SCHEMA1.T1 to a new table with the same name that has the column definitions ('CUSTOMER VARCHAR(80), REGION CHAR(5), YEAR INTEGER, CONTENTS CLOB') and resides in table space 'ACCOUNTING' with its indices table space 'ACCOUNT_IDX' and its LOBs table space 'ACCOUNT_LONG'.
Method 2— create the target table and provide its name to the procedure. This provides you with more control and flexibility by allowing you to create the target table beforehand, rather than having the stored procedure create it.
This example is equivalent to the previous one, but it calls the stored procedure using the second method, where the target table is created outside the procedure and is named within the target_tabname parameter, to move the same table as in the previous example.
Listing 4. Example for the stored procedure using the second method
CREATE TABLE SCHEMA1.T1_TGT ( CUSTOMER VARCHAR(80), REGION CHAR(5), YEAR INTEGER, CONTENTS CLOB) IN ACCOUNTING INDEX IN ACCOUNT_IDX LONG IN ACCOUNT_LONG'
CALL SYSPROC.ADMIN_MOVE_TABLE( 'SCHEMA', 'T1', 'T1_TGT', '', 'MOVE')
For online data movement:
- The procedure creates a shadow table to which the data are copied.
- Any changes to the source table during the copy phase are captured using triggers and placed in a staging table.
- After the copy phase is completed, the changes captured in the staging table are replayed to the shadow copy.
- Following that, the stored procedure briefly takes the source table offline and assigns the source table name and index names to the shadow copy and its indices.
- The shadow table is then brought online, replacing the source table. By
default, the source table is dropped, but you can use the
KEEPoption to retain it under a different name.
Obviously, the online operation costs more server resources (disk space and processing power), so make sure you only use it if you value availability more than cost, space, move performance, and transaction overhead. Additionally, avoid performing online moves for tables without indices, particularly unique ones as it might result in deadlocks and complex or expensive replay.
Handling online move failure
If the online move fails, rerun it:
- Fix the problem that caused the table move to fail.
- Determine the stage that was in progress when the table move failed by querying the SYSTOOLS.ADMIN_MOVE_TABLE protocol table for the status.
- Call the stored procedure again, specifying the applicable option:
- If the status of the procedure is INIT, use the
- If the status of the procedure is COPY, use the
- If the status of the procedure is REPLAY, use the
- If the status of the procedure is CLEANUP, use the
- If the status of the procedure is INIT, use the
You can cancel the move by specifying the CANCEL option for the stored procedure if the status of an online table move is not COMPLETED or CLEANUP.
Table space states
Determining a table space's state
To find the state for the table spaces in a database:
list tablespaces show detail.
A table space can have a number of different states, as shown below.
Table 3. Table space states
|Roll-forward in progress||0x40|
|Reorg in Progress||0x400|
|Backup in Progress||0x800|
|Storage Must be Defined||0x1000|
|Restore in Progress||0x2000|
|Offline and Not Accessible||0x4000|
|Load in Progress||0x20000|
|Storage May be Defined||0x2000000|
|DMS Rebalance in Progress||0x10000000|
|Table Space Deletion in Progress||0x20000000|
|Table Space Creation in Progress||0x40000000|
More details about each of the above states are available at Information Center.
Data organization and partitioning
Table (range) partitioning
Several technologies in DB2 allow you to break up your data into smaller "chunks" for greater parallelism for queries, allow for partition elimination in queries, and in general help improve performance. One of these technologies is the Table (or Range) Partitioning feature, this feature allows you to take a single table and divide it into partitions which may be spread across multiple table spaces.
This partitioning capability has a number of benefits, including:
- Simplified syntax for creating table or range partitions.
- Large tables can be spread across several table spaces. Note that all of the table spaces specified must have the same page size, extent size, storage mechanism (DMS or SMS), and type (REGULAR or LARGE), and all of the table spaces must be in the same database partition group.
- Simplified process for rolling in and rolling out of table data by seamlessly attaching and detaching table partitions.
- Improved query performance by automatically eliminating data partitions based on predicates in the query.
Below is the simplified syntax for creating a partitioned table to store 24 month's worth of data in 24 partitions spread across four table spaces.
Listing 5. Syntax for creating a partitioned table
CREATE TABLE fact (txn_id char(7), purchase_date date, ...) IN tbsp1, tbsp2, tbsp3, tbsp4 PARTITION BY RANGE (purchase_date) ( STARTING FROM ('2005-01-01') ENDING ('2006-12-31') EVERY 1 MONTH )
Table partitioning syntax
The table creation syntax has a short and a long form. Before diving into the syntax, let's first talk about the partitioning column. Table partitioning allows for the specification of ranges of data, where each range goes into a separate partition. The partitioning column, or columns, defines the ranges. A partitioning column can be any of the DB2 base data types except for LOBs and LONG VARCHAR columns. You can also specify multiple columns (watch for an example later), and you can specify generated columns to simulate partitioning on a function. Here is the short form.
Listing 6. Short form for table partitioning syntax
CREATE TABLE t1(c1 INT) IN tbsp1, tbsp2, tbsp3 PARTITION BY RANGE(c1) (STARTING FROM (0) ENDING (80) EVERY (10))
Here is the same result using the long form.
Listing 7. Long form for table partitioning syntax
CREATE TABLE t1(c1 INT) PARTITION BY RANGE(c1) (STARTING FROM (0) ENDING (10) IN tbsp1, ENDING (20) IN tbsp2, ENDING (30) IN tbsp3, ENDING (40) IN tbsp1, ENDING (50) IN tbsp2, ENDING (60) IN tbsp3, ENDING (70) IN tbsp1, ENDING (80) IN tbsp2)
Quickly adding or removing data ranges
Another benefit is that when you detach a partition you get a separate table that contains the contents of that partition. You can detach a partition from a table and then do something with that newly detached partition, which is now a physical table. For example, you could archive off that table, move it to tertiary storage, copy it to another location, or whatever you want to do. DB2 10.1 will asynchronously clean up any index keys on that partitioned table without impacting running applications.
In fact, detaching a table partition from a partitioned table is invoked using
ALTER TABLE..DETACH PARTITION..INTO
clause and consists of two phases:
- The partition is logically detached from the partitioned table.
- Asynchronously the logically detached partitioned is converted into a stand-alone table.
This implies that detached data becomes invisible instantly and no actual data
movement occurs during the detach process. A typical example of
detaching a table partition:
part_table DETACH PARTITION p0 INTO TABLE p0_tab;.
In the above statement, the partition p0 is detached from the part_table table, and asynchronously a new physical table p0_tab is created containing the content of the partition p0.
Note that the following conditions must be met before you can perform a
- The table to be detached from (source table) must exist and be a partitioned table.
- The data partition to be detached must exist in the source table.
- The source table must have more than one data partition. A partitioned table
must have at least one data partition. Only visible and attached data
partitions pertain in this context. An attached data partition is a data
partition that is attached but not yet validated by the
- The name of the table to be created by the
DETACH PARTITIONoperation (target table) must not exist.
DETACH PARTITIONis not allowed on a table that is the parent of an enforced referential integrity (RI) relationship. So you must change the RI constraints into informational by setting them to
NOT ENFORCEDbefore you can detach a partition. Then you can enforce the constraint again after the detaching process ends.
- If there are any detached dependent tables (tables need to be incrementally
maintained with respect to the detached data partition), the
SET INTEGRITYstatement is required to be run on these tables to incrementally maintain the tables. With DB2 9.7 Fix Pack 1 or later, after the
SET INTEGRITYstatement is run on all detached dependent tables, the asynchronous partition detach task makes the data partition into a stand-alone target table. Until the asynchronous partition detach operation completes, the target table is unavailable.
Similar to adding a new partition, you simply create a table with the same definition as the partitioned table, load it with data, and attach that partition to the main partitioned table, as follows:
Listing 8. Attach partition to main partitioned table
ALTER TABLE FACT_TABLE ATTACH PARTITION STARTING '06-01-2006' ENDING '06-30-2006' FROM TABLE FACT_NEW_MONTH
You need to use
SET INTEGRITY to validate
data and maintain indices after the
operation. Starting with DB2 10.1, if data integrity can be checked before
attachment, newly attached data can be made available sooner with
IMMEDIATE UNCHECKED. Otherwise, if
INTEGRITY IMMEDIATE CHECKED is used,
commit is required before data is visible.
There are two types of indexing on partitioned tables: partitioned and non-partitioned indices. In general, indices on partitioned tables behave the same as those on non-partitioned tables, but they follow a different storage model:
- While all indices of a non-partitioned table reside in one shared index object within the same table space, non-partitioned indices of a partitioned table are created each in its own index object in a single table space, even if the table partition spans several table spaces.
- A partitioned index uses an index organization scheme in which index data is
divided across multiple index partitions, according to the partitioning
scheme of the table. Each index partition refers only to table rows in the
corresponding data partition. All index partitions for a specific data
partition reside in the same index object. By default, index partitions are
placed in the same tablespace as the data partitions that they reference.
To override this default behavior, you must use the
INDEX INclause for each data partition you define by using the
CREATE TABLEstatement. In other words, if you plan to use partitioned indices for a partitioned table, you must anticipate where you want the index partitions to be stored when you create the table. If you try to use
INDEX INwhen creating a partitioned index, you receive an error message.
The figure below illustrates the placement of indices on non-partitioned table
table1. This is done using the
clause of the
CREATE TABLE statement.
Listing 9. Use INDEX IN clause of CREATE TABLE statement
CREATE TABLE table1 (a int, b int) IN tbs1 INDEX IN tbs2 CREATE INDEX i1 ON table1 (a) CREATE INDEX i2 ON table1 (b)
Figure 5. Indices on a non-partitioned table
The figure below illustrates the placement of partitioned and non-partitioned tables on a partitioned table table1 whose definition is as follows:
create table table1 (a int, b int) partition by range (a) (starting from (1)ending at (100), partition by range (a) (starting from (101)ending at (200), partition by range (a) (starting from (201)ending at (300) index in tbs6)
Figure 6. Indices on a partitioned table
We have the following indices defined:
- Non-partitioned index i1 resides in table space tbs4:
CREATE INDEX i1 ON table1 (b) NOT PARTITIONED IN tbs4.
- Non-partitioned index i2 resides in table space tbs5:
CREATE INDEX i2 ON table1 (a,b) NOT PARTITIONED IN tbs5.
- Partitioned index i3 resides in several table spaces as follows:
CREATE INDEX i2 ON table1 (a) PARTITIONED. Indices on partitions 1 and 2 reside in the same table spaces as the data partitions they refer to following the default behavior. Data partition 3 overrides the default behavior using the
INDEX INclause to specify tbs6 as the tablespace for its partitioned indices.
Note that if the table is partitioned and neither PARTITIONED nor NOT PARTITIONED is specified, the index is created as partitioned.
Note that the
IN table space name clause in
CREATE INDEX statement can be
specified only for a non-partitioned index on a partitioned table. The
specification of a table space specifically for the index overrides a
specification made using the
INDEX IN clause when the table was created.
Range clustered tables
Range clustered tables (RCTs) are tables where data are clustered according to a specific key value. Every key value is associated with its corresponding row location. Hence, this approach provides exceptionally fast access to specific table rows. Key values must have the following characteristics:
- Not null
- An integer (SMALLINT, INTEGER, or BIGINT)
- Monotonically increasing
- Within a predetermined set of ranges based on each column in the key
Note that you cannot create a regular index on the same key values used to define the range-clustered table.
DB2 pre-allocates the disk space required for the RCT at creation time. This is
done by calculating the number of distinct key values and multiplying it by the
table row size. This mandates that the space required for holding the entire
table should be available at creation time. Additionally, you cannot issue an
ALTER TABLE statement to alter the physical characteristics of an RCT after its
RCTs are created using the
ORGANIZE BY KEY
SEQUENCE clause of the
CREATE TABLE statement as follows:
Listing 10. Creating RCTs
create table table1 (id int not null, emp_id int not null, emp_name varchar(40) ) organize by key sequence (id starting from 100 ending 1000, emp_id ending 100) disallow overflow
This example creates an RCT with the following key characteristics:
- ID values range from 100 to 1,000.
- emp_id values range from 1 to 100 (if the
STARTING FROMclause is not stated, the default starting value is 1).
DISALLOW OVERFLOWclause indicates that rows with key values outside the specified ranges will not be accepted.
A benefit of overflow-disallowed RCT tables is that table reorganization operations are not required since such tables are always clustered.
In cases you need to allow rows with key values outside the specified range to be
stored in the table, you should use the
OVERFLOWS clause while defining
the table. In this case, rows with overflow key values are placed in an overflow
area, which is dynamically allocated. As more records are added to this overflow
area, operations against the table that involve the overflow area require more
processing time. The order of rows in the overflow area is not guaranteed. The
larger the overflow area, the more time is required to access it. If this
becomes a problem, consider reducing the size of the overflow area by exporting
the data to a new RCT with wider ranges.
Listing 11. Example of an RCT allowing overflows
create table table1 (id int not null, emp_id int not null, emp_name varchar(40) ) organize by key sequence (id starting from 100 ending 1000, emp_id ending 100) allow overflow
Multi-dimensional clustering (MDC) enables a table to be physically clustered on more than one key, or dimension, simultaneously. Prior to Version 8, DB2 supported only single-dimensional clustering of data using clustering indices. When a clustering index is defined on a table, DB2 attempts to maintain the physical order of the data on pages, based on the key order of the clustering index, as records are inserted into and updated in the table. This can significantly improve the performance of queries that have predicates containing the keys of the clustering index because, with good clustering, only a portion of the physical table needs to be accessed. When the pages are stored sequentially on disk, more efficient prefetching can be performed.
With MDC, these same benefits are extended to more than one dimension, or clustering key. In the case of query performance, range queries involving any one or combination of the specified dimensions of the table will benefit from the underlying clustering. These queries will need to access only those pages having records with the specified dimension values, and the qualifying pages will be grouped together in extents.
A table with a clustering index can become unclustered over time, as available space is filled in the table. However, an MDC table is able to maintain its clustering over the specified dimensions automatically and continuously, eliminating the need to reorganize the table to restore the physical order of the data.
When an MDC table is created, the dimensional keys along which to cluster the table's data are specified. Each specified dimension can be defined with one or more columns, the same as an index key. A dimension block index will be automatically created for each of the dimensions specified and will be used to access data quickly and efficiently along each of the specified dimensions. A block index will also be automatically created, containing all dimension key columns. The block index will be used to maintain the clustering of the data during insert and update activity, as well as for quick and efficient access to the data.
Every unique combination of the table's dimension values forms a logical cell, which is physically composed of blocks of pages, where a block is a set of consecutive pages on disk. The set of blocks that contain pages with data having the same key value of one of the dimension block indices is called a slice. Every page of the table will be stored in only one block, and all blocks of the table will consist of the same number of pages, known as the blocking factor. The blocking factor is equal to the table space's extent size, so the block boundaries line up with extent boundaries.
Creating an MDC table
To create an MDC table, you need to specify the dimensions of the table using the
organize by parameter, as follows:
Listing 12. Syntax for creating a MDC table
CREATE TABLE MDCTABLE( Year INT, Nation CHAR(25), Colour VARCHAR(10), ... ) ORGANIZE BY(Year, Nation, Color)
In this example, the table will be organized on the year, nation, and color dimensions, and will logically look like the figure below.
Figure 7. Data organization within an MDC
You cannot alter a table and make it into an MDC table, so use the design adviser, if possible, before you create the database to see if your tables should be MDC tables or normal tables.
The following list summarizes the design considerations for MDC tables:
- When identifying candidate dimensions, search for attributes that are not too granular, thereby enabling more rows to be stored in each cell. This approach will make better use of block-level indices.
- Higher data volumes may improve population density.
- It might be useful to load the data first as non-MDC tables for analysis only.
- The table space extent size is a critical parameter for efficient space usage.
- Although an MDC table may require a greater initial understanding of the data, the payback is that query times will likely improve.
- Some data may be unsuitable for MDC tables and would be better implemented using a standard clustering index.
- Although a smaller extent size will provide the most efficient use of disk space, the I/O for the queries should also be considered.
- A larger extent will normally reduce I/O cost, because more data will be read at a time. This, in turn, makes for smaller dimension block indices because each dimension value will need fewer blocks. And, inserts will be quicker because new blocks will be needed less often.
Insert time clustering
Insert time clustering (ITC) is a new partitioning feature coming with DB2 10.1. ITC tables cluster data based on their insertion time so, data inserted within the same time interval are physically grouped together. ITC tables eases the management of space utilization in cases when your system runs batch data deletion and you want to reclaim the trapped free space after the batch deletion is done.
ITC tables have similar characteristics to MDC tables. For example, both table types use block based allocation and block indices. ITC and MDC tables differ in how data is clustered. ITC tables cluster data by using a implicitly created virtual column to cluster rows inserted at a similar time together. ITC virtual dimension cannot be manipulated while clustering dimensions on MDC tables are specified by the creator.
ITC tables are created with the
CREATE TABLE command by specifying the
BY INSERT TIME clause:
CREATE TABLE ...
ORGANIZE BY INSERT TIME. You can convert existing tables to ITC tables using either of two ways:
- Using the
ADMIN_MOVE_TABLEprocedure to convert the table online.
- Using the export/import or a load from table utilities.
Note that existing tables cannot be altered to become ITC tables.
Materialized query tables
A materialized query table (MQT) is a table whose definition is based upon the result of a query. The data contained in an MQT is derived from one or more tables accessed by the query on which the MQT definition is based. You can think of an MQT as a kind of materialized view. Views and MQTs are defined on the basis of a query. However, an MQT actually stores the query results as data, and you can work with the data that is in the MQT instead of the data in the underlying tables. Thus, MQTs can significantly improve the performance of queries, especially complex queries.
An MQT can be defined at table creation time as maintained by the system or maintained by the user. When you create this type of MQT, you can specify whether the table data is a REFRESH IMMEDIATE or REFRESH DEFERRED. The REFRESH keyword lets you specify how the data is to be maintained. DEFERRED means that the data in the table can be refreshed at any time using the REFRESH TABLE statement. Neither REFRESH DEFERRED nor REFRESH IMMEDIATE system-maintained MQTs allow insert, update, or delete operations to be executed against them. However, REFRESH IMMEDIATE system-maintained MQTs are updated with changes made to the underlying tables as a result of insert, update, or delete operations.
The following example creates a system maintained MQT with the
REFRESH IMMEDIATE clause. The
DEFERRED clause means that data is not inserted into
the table as part of the
CREATE TABLE statement.
Listing 13. Syntax for creating a system maintained MQT
CREATE TABLE EMP_DEP AS (SELECT E.EMPNO, E.FIRSTNME, E.LASTNAME, E.PHONENO, D.DEPTNO, SUBSTR(D.DEPTNAME, 1, 12) AS DEPARTMENT, D.MGRNO FROM EMPLOYEE E, DEPARTMENT D WHERE E.WORKDEPT = D.DEPTNO) DATA INITIALLY DEFERRED REFRESH IMMEDIATE
In the above example:
- An MQT called EMP_DEP is created joining the data from the EMPLOYEE and DEPARTMENT table.
DATA INITIALLY DEFERREDclause means that EMP_DEP MQT is not populated with data as part of the
REFRESH IMMEDIATEclause means that refreshing the data in the MQT is maintained by the system once an update is made to the underlying EMPLOYEE or DEPARTMENT tables.
After being created, the MQT is in check-pending state and cannot be queried
until the following
SET INTEGRITY statement
SET INTEGRITY FOR EMP_DEP IMMEDIATE
CHECKED NOT INCREMENTAL.
IMMEDIATE CHECKED clause in the above statement specifies that the data is to
be checked against the EMP_DEP MQT's defining query and refreshed. The
INCREMENTAL clause specifies that integrity checking is to be done on the whole
MQT and, hence, the MQT definition is recomputed.
Introduction to XML in DB2
You've been able to store XML data in DB2 for quite some time. Of course, you can store the object as a CLOB and, with the XML extender; you can shred the document into relational tables that let you efficiently access subcomponents of an XML document with a query. But each of these methods has a disadvantage. Shredded documents can result in loss of document fidelity and make it difficult to change the XML schema. The biggest benefit of XML is that the schema is completely flexible, making it rigid by shredding it into relations is counterproductive. With CLOB you can keep the flexibility but every time you want to read components of the XML you need to parse the CLOB at runtime, so performance is poor.
DB2 9 introduces a completely new XML storage engine where XML data is stored hierarchically. XML is hierarchical in nature, so storing the XML hierarchically in the engine preserves fidelity, allows for flexible schema, and also delivers high-performance sub-document access. This new hierarchical storage engine sits inside the same DB2 data server as the relational engine, so now you can store customer information alongside their XML purchase orders and search all of the information efficiently.
DB2 10.1 continues adding more cool features to pureXML. New XML indices now more closely match your data, functional indices can speed up searches and queries, binary XML format enables faster data transmission, handling of casting errors is now aligned with SQL, performance improved for certain XML queries, XML type is now allowed in global variables and also in compiled SQL functions.
XML columns in DB2 tables
XML is stored inside DB2 in a hierarchical format. XML itself is hierarchical starting from the root tag (or node) and then traversing through the XML string or document. In DB2 the XML is stored inside of data pages in this hierarchical structure. If the XML data is larger than a single data page, the XML tree is broken up into subtrees with each subtree stored on a data page and the pages linked together.
To create a table with XML data simply run the command
create table table_name (col1 data_type, ..., xml_col_name XML).
This allows you to create the table with your relational columns as you would today, and for your XML information, you just assign the column a data type of XML. Now you can store the XML data in that column.
Creating an index is similar to creating a typical index on relational data with the exception that you are not indexing a column but rather a component of the XML schema defined in the above xml_column_name column. The syntax would look something like the code below:
create index index_name on table_name (xml_column_name) generate key using xmlpattern '/po/purchaser/@pname' as sql varchar(50)
Starting with DB2 10.1, you can create functional XML indices using the
functions. Indices created using
speed up case-insensitive searches of XML data. Indices created using
fn:exists can speed up queries that search for specific elements or for the lack of
specific elements. For instance, in our previous example, if you need to speed
up case-insensitive searches over the
attribute, you'd use
fn:upper-case in your index as
create index index_name on table_name (xml_column_name) generate key using xmlpattern '/po/purchaser/@pname/fn:upper-case(.)' as sql varchar(50)
New XML index data types
You can now create indices of type DECIMAL and INTEGER over XML data. In situations where your numeric data is of either INTEGER or DECIMAL type, indices created as DECIMAL and INTEGER values can potentially provide faster query response times. In previous releases, DOUBLE was the only supported numeric type for XML indexes. Consider the following example using an index of type integer.
CREATE INDEX intidx on favorite_cds(cdinfo) GENERATE KEYS USING XMLPATTERN '/favoritecds/cd/year' AS SQL INTEGER
The following SQL data types are supported: VARCHAR, DATE, TIMESTAMP, INTEGER, DECIMAL, and DOUBLE.
New binary XML format
The new binary XML format provides a faster way to transmit and receive XML data between certain Java™ pureXML applications and a DB2 server Version 10.1. For these Java applications, unnecessary XML parsing costs are eliminated, therefore improving performance.
Binary XML data refers to data that is in the Extensible Dynamic Binary XML DB2
Binary XML Format (XDBX). Only Version 4.9 or later releases
of the IBM Data Server Driver for JDBC and SQLJ can send or retrieve XML data
to/from the data server in XDBX format. You have the full control over
the XML format you use for transmission using the
property either on the data source level or the connection properties.
Binary XML format is most efficient for cases in which the input or output data is in a non-textual representation, such as SAX, StAX, or DOM. For example, these methods retrieve XML data in non-textual representations.
You can use less disk space for your tables by taking advantage of the DB2 table compression capabilities. Compression saves disk storage space by using fewer database pages to store data. You can use compression with new and existing tables. Temporary tables are also compressed automatically, if the database manager deems it to be advantageous to do so. Row compression uses a dictionary-based compression algorithm to replace recurring strings with shorter symbols within data rows.
Table row compression types
There are two types of row compression you can choose from:
- Classic row compression
- Adaptive compression
Data stored within data rows, including inlined LOB or XML values, can be compressed with adaptive and classic row compression. XML storage objects can be compressed using static compression. However, storage objects for long data objects stored outside table rows is not compressed.
Classic row compression
Classic row compression, sometimes referred to as static compression, uses a table-level compression dictionary to compress data by row. The dictionary is used to map repeated byte patterns from table rows to much smaller symbols; these symbols then replace the longer byte patterns in the table rows. The compression dictionary is stored with the table data rows in the data object portions of the table.
Using row compression
To use row compression, you must first set the table to be compression eligible,
then you must generate the dictionary that contains the common strings from
within the table. To set the table to be eligible for compression, use either of
the following commands:
create table table_name
... compress yes static or
alter table table name compress yes
Creating the row compression dictionary
Creating the compression dictionary allows the table to be compressed. DB2 then
needs to scan the data in the table to find the common strings it can
compress out of the table and put in the dictionary. To do this, you use the
reorg command. The first time you compress a table (or to rebuild
the compression dictionary) you must run the command
reorg table table_name resetdictionary.
This will scan the table, create the dictionary, and perform the actual
table reorganization, compressing the data as it goes. From this point onward,
any insert into this table or subsequent load of data will honor the compression
dictionary and compress all new data. If in the future you want to run a normal
table reorg and not rebuild the dictionary, you can run
reorg table table_name keepdictionary.
Each table has its own dictionary, meaning that a partitioned table will have a separate dictionary for each partition. This is good because it allows DB2 to adapt to changes in the data as you roll in a new partition.
To disable compression for a table, use the
TABLE statement with the
COMPRESS NO option; rows you later add are not compressed. Existing rows
remain compressed. To extract the entire table after you turn off compression,
you must perform table reorganization with the
REORG TABLE command:
alter table tablename compress no reorg table table_name
Adaptive compression actually uses two compression approaches. The first employs the same table-level compression dictionary used in classic row compression to compress data based on repetition within a sampling of data from the table as a whole. The second approach uses a page-level dictionary-based compression algorithm to compress data based on data repetition within each page of data. The dictionaries map repeated byte patterns to much smaller symbols; these symbols then replace the longer byte patterns in the table. The table-level compression dictionary is stored within the table object for which it is created and is used to compress data throughout the table. The page-level compression dictionary is stored with the data in the data page and is used to compression only the data within that page.
Using adaptive compression
You compress table data using adaptive compression by setting the
COMPRESS attribute of the table to
ADAPTIVE or to
YES, which by default
enables adaptive compression. You can set this attribute when you create the
table by running the command
tablename .. compress yes adaptive or
create table tablename ..
You can also alter an existing table to use compression by using the same options
ALTER TABLE statement
alter table tablename compress yes adaptive or
alter table tablename compress yes.
HINT: If you have scripts or applications that issue
ALTER TABLE or
statements with the
COMPRESS YES clause, make sure
you add the
ADAPTIVE keyword to explicitly indicate the table compression method
After you enable compression, operations that add data to the table, such as an
INSERT, LOAD INSERT, or IMPORT INSERT command
operation, can use adaptive compression. In addition, index
compression is enabled for the table. Indices are created as
compressed indices unless you specify otherwise and if they are the types that can be compressed.
To disable compression for a table, use the
TABLE statement with the
option; rows that you later add are not compressed. Existing rows remain
compressed. To extract the entire table after you turn off compression, you must
perform a table reorganization with the
Compress temp tables
Compression for temporary tables is enabled automatically with the DB2 Storage Optimization Feature. Only classic row compression is used for temporary tables. When executing queries, the DB2 optimizer considers the storage savings and the impact on query performance that compression of temporary tables offers to determine whether it is worthwhile to use compression. If it is worthwhile, compression is used automatically.
Estimating space savings
ADMIN_GET_TAB_COMPRESS_INFO table function
estimates the compression savings that can be gained for the table, assuming a
RESETDICTIONARY option will be performed. The following example uses
ADMIN_GET_TAB_COMPRESS_INFO table function to estimate percentage
saved for classic and adaptive compression if enabled on the
SELECT TABNAME, OBJECT_TYPE, ROWCOMPMODE, PCTPAGESSAVED_CURRENT current, PCTPAGESSAVED_STATIC with_static, PCTPAGESSAVED_ADAPTIVE with_adaptive FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('DB2INST1','CUSTOMERS')) AS T
Figure 8. Result for the example using ADMIN_GET_TAB_COMPRESS_INFO table function
Introduction to temporal tables
A new feature included in DB2 10.1 is the time-travel or temporal data tables. Temporal tables allow you track the changes in your data so that you can back in time and query the data at a specific point in time. You can also control the business validity of your data. Temporal data can be used mainly in two flavors: system-period and application-period. You can combine the two to create a third: the so-called bi-temporal tables. Using new and standardized SQL syntax, you can easily insert, update, delete, and query data in the past, present, or future.
The temporal data management capabilities in DB2 are based on three types of temporal tables. The type of temporal table you are going to use is controlled by the nature of the time fields it's going to manage. Is it a system time? Business time? Or both? System time is maintained by DB2 system and is provided by the system clock on DB2 server when a transaction occurs. Business time is date or time stamp provided by the application to control the business validity of records. It's completely unrelated to the system time on DB2 server.
System-period temporal tables (STT)
System-period temporal tables (STT) are based on the system time. DB2 keeps a history of rows that have been updated or deleted over time according to system time. Each row and its corresponding history are automatically assigned a pair of system time (transaction time).
Application-period temporal tables
Unlike STT, application-period temporal tables (ATT) require that the application is aware of the ATT to provide a valid business time. Applications control the business validity of their data by supplying a date or timestamp. The provided dates or timestamps describe when the data in a given row was or will be valid in the real world. Again, new SQL constructs allow users to insert, query, update, and delete data in the past, present, or future. DB2 automatically applies temporal constraints and row-splits to correctly maintain the application-supplied business time.
Bi-temporal tables manage system and business time, and combine all the capabilities of both. This enables applications to manage the business validity while DB2 keeps a full history of data changes. Every bi-temporal table is both an STT and an ATT.
One key difference between STT and ATT is that an STT has an associated separate history table to maintain previous versions of the data. An ATT has no separate history table; all rows are maintained within the same table. When rows from ATT are affected by delete operations, they are permanently deleted from the table.
Table 4 gives a quick overview of the characteristics of system time and business time. If your requirements match the characteristics of system time, you can use STT. If they match the characteristics of business time, go for ATT. If your application needs both characteristics, you can leverage it to the bi-temporal table.
Table 4. Characteristics of system time and business time
|System time||Business time|
|Captures the time when changes happen to data inside a DB2 database||Captures the time when changes happen to business objects in the real world|
|Maintains a history of updated and deleted rows, generated by DB2||Maintains application-driven changes to the time dimension of business objects|
|History based on DB2 system timestamps||Dates or timestamps are provided by the application|
|DB2's physical view of time||Your application's logical view of time|
|Spans from the past to the present time||Spans past, present, and future time|
|System validity (transaction time)||Business validity (valid time)|
|Supports queries such as "Which policies were stored in the database on June 30?"||Supports queries such as "Which policies were valid on June 30?"|
Consider the following guidelines when you choose temporal tables:
- Use system time and system-period temporal tables to track and maintain a history of when information was physically inserted, updated, or deleted inside your DB2 database.
- Use business time and application-period temporal tables, if you need to describe when information is valid in the real world, outside of DB2.
- Use bi-temporal tables, if you need to track both dimensions of time. With a bi-temporal table you can manage business time with full traceability of data changes and corrections.
System-period temporal tables
Creating a system-period temporal table
To create a system-period temporal table that leverages system time and versioning, you need to create a base table with three specific columns, create an identical history table for historical data, and finally associate the history table with the base table. Let's look at these three steps in more details:
- Create a base table with three specific columns
The three specific columns are row-begin column, row-end column, and transaction start-ID column. The names of three columns can be any valid DB2 column name (e.g., start_time, end_time, and trans_id). Each column must be defined as a TIMESTAMP. The row-begin and row-end column names must be specified in the PERIOD SYSTEM_TIME (start_time, end_time) clause.
Listing 14. Sample DDL table with three specific columns
CREATE TABLE travel( trip_name CHAR(30) NOT NULL PRIMARY KEY, destination CHAR(12) NOT NULL, departure_date DATE NOT NULL, price DECIMAL (8,2) NOT NULL, sys_start TIMESTAMP(12) NOT NULL generated always as row begin implicitly hidden, sys_end TIMESTAMP(12) NOT NULL generated always as row end implicitly hidden, tx_start TIMESTAMP(12) generated always as transaction start id implicitly hidden, PERIOD SYSTEM_TIME (sys_start, sys_end) );
- Create an identical history table for historical data
The history table is logically identical to the base table. However, it can be physically different from the base table. You can put it in a different table space, or even you can have different partitioning schemes for each table. You may use the like clause in the create table statement to create a history table with the same names and descriptions as the columns of the base table as follows:
CREATE TABLE travel_history LIKE travel IN hist_space WITH RESTRICT ON DROP;.
- Associate the history table with the base table
The last step in setting up your system-period temporal table is to link the history table with the base table. This can be easily done with the
ALTER TABLE travel ADD VERSIONING USE HISTORY TABLE travel_history;.
You can hide the special columns by specifying the
HIDDEN clause in the column definition. In that case, if you don't
use the like clause when creating the history table, you must mark all hidden
columns in the base as hidden in the history table as well.
You can also change your existing table to leverage the temporal table using the
same steps. In that case, the base table already exists without the three
columns. Alter the table adding these columns and adding the
PERIOD SYSTEM_TIME(sys_start, sys_end).
DML operations against System-period temporal table
Application-period temporal tables
ATT allows you to store business time. By "business time," we mean the application logical notion of time. That is, it reflects when information was, is, or will be effective (valid) in the real world. Business time by itself does not involve a history table, and the effective dates must be provided by the application. We are talking about the validity of the data not the history of the data.
Let's take an example to make it clear by an example. Consider the following two very different questions:
- What was the effective price for Manu Wilderness trip on 06/15/2012? (Business time)
- What was the stored price for Manu Wilderness trip on 06/15/2012? (System time)
For example, if a trip will be offered at a discounted price during June, the price change might get entered into the database today or at anytime (system time) with an effective date from 1 Jun 2012 to 1 Jul 2012 (business time). Similarly, a new insurance policy might get created and inserted into the database in May, but backdated with an effective start date of 15 Apr. In short, the business validity for a record is independent of when that record was or was not physically present in the database.
Creating application-period temporal tables
Similar to STT, you will need a pair of DATE or TIMESTAMP columns that describe
the start and end points of the business validity of each record. Although these
columns can have arbitrary names, they must be declared using the
period clause in the
CREATE TABLE statement. This period declaration enables DB2
to enforce temporal constraints and to perform temporal
DELETE operations for business
time. DB2 will transparently add, split, or delete rows as needed.
Let's create a sample, run the following DDL.
Listing 15. Sample DDL to create application-period temporal table
CREATE TABLE travel_att ( trip_name CHAR(25) NOT NULL, destination CHAR(8) NOT NULL, price DECIMAL(8,2) NOT NULL, bus_start DATE NOT NULL, bus_end DATE NOT NULL, PERIOD BUSINESS_TIME (bus_start, bus_end), PRIMARY KEY (trip_name, BUSINESS_TIME WITHOUT OVERLAPS) );
The optional clause
OVERLAPS in the primary key constraint is new in DB2 10.1 and enforces temporal uniqueness. Temporal
uniqueness means, rows with duplicate trip_name values are allowed if the
business time periods of these rows do not overlap. In other words,
BUSINESS_TIME WITHOUT OVERLAPS means that
for any given point in time, there is at most one row that is effective (valid) for a given trip_name.
Insert data into application-period temporal tables
Inserting new record into an ATT is no different from the regular SQL insert statement. You simply need to supply appropriate values for all required columns, including the columns representing business time start and end values. For example, to insert a two rows into our sample travel table with business time, we could issue the following statement:
INSERT INTO travel_att VALUES ('Manu Wilderness', 'Peru', 1500.00,'05/01/2012', '01/01/2013'), ('The Great Wall Tour', 'China', 2200.00, '07/01/2012', '12/30/9999');
At this point, the data in the travel_att table should look like below:
TRIP_NAME DESTINATION PRICE BUS_START BUS_END -------------------- ----------- -------- ----------- ----------- Manu Wilderness Peru 1500.00 2012-05-01 2015-10-01 The Great Wall Tour China 2200.00 2012-07-01 9999-12-30
As you can see, the TRAVEL_ATT table doesn't contain any information about when the rows were inserted, updated, or deleted. It only tracks validity (business time) of the trips.
Update data in an application-period temporal table
The company discovered that the demand on Manu Wilderness trip is very low in July. They made an offer for July reducing its price to $1,000 instead of $1,500. This offer is only valid for July. The price remains the same for the remaining period until the trip becomes invalid:
UPDATE travel_att FOR PORTION OF BUSINESS_TIME FROM '06/01/2012' TO '07/01/2012' SET price = 1000.00 WHERE trip_name = 'Manu Wilderness';
The command completed successfully and the period from 5 May 2012 to 1 Oct 2015 is split into three periods. DB2 inserted two new rows and updated the existing row as shown below:
Listing 16. DB2 inserts two new rows and updates existing row
TRIP_NAME DESTINATION PRICE BUS_START BUS_END -------------------- ----------- ------- ---------- ----------- Manu Wilderness Peru 1000.00 2012-06-01 2012-07-01 The Great Wall Tour China 2200.00 2012-07-01 9999-12-30 Manu Wilderness Peru 1500.00 2012-05-01 2012-06-01 Manu Wilderness Peru 1500.00 2012-07-01 2015-10-01
Now if we try to insert a record with a period overlapping with the existing period, DB2 will restrict you and the command will fail. For example, if you try to insert a new record with trip name Manu Wilderness and an overlapping period, DB2 will fail with SQL error -0803. In that case, you may change the name or update the existing record based on your business needs.
Delete data from an application-period temporal table
Deleting records from an application-period temporal table removes rows from the
table and can potentially result in new rows inserted into the
application-period temporal table itself. A row is a candidate for deletion if
its period-begin column, period-end column, or both fall within the range
specified in the
FOR PORTION OF BUSINESS_TIME clause. If a specified period
falls in the middle of an existing period, DB2 will transparently split the
exiting period into three periods and delete the correct one. Let's clarify it
more with the following example:
DELETE FROM travel_att FOR PORTION OF BUSINESS_TIME FROM '09/15/2012' TO '09/20/2012' WHERE trip_name LIKE 'Manu Wilderness';
When you delete for a portion of business time, you are not necessarily removing records from the database. You are marking this period as invalid for the application. You are making it logically unavailable from the business perspective. However, this may not be implemented by a physical delete. In our example, the Manu Wilderness is available in the period from 1 Jul 2012 to 1 Oct 2015. Marking this trip is not available (delete for portion of business time) in the period from 15 Sep 2012 to 20 Sep 2012 means that there will be a gap. To keep the records before and after this gap, the current record will be split into two rows. In other words, the existing record will be updated and a new record will be inserted:
Listing 17. Existing record updated and new record inserted
TRIP_NAME DESTINATION PRICE BUS_START BUS_END ------------------- ----------- ---------- ----------- ----------- Manu Wilderness Peru 1000.00 2012-06-01 2012-07-01 The Great Wall Tour China 2200.00 2012-07-01 9999-12-30 Manu Wilderness Peru 1500.00 2012-05-01 2012-06-01 Manu Wilderness Peru 1500.00 2012-07-01 2012-09-15 Manu Wilderness Peru 1500.00 2012-09-20 2015-10-01
DB2 manages all system time and business time periods as inclusive-exclusive periods, or closed-open periods. Therefore, it is recommended to also use inclusive-exclusive periods at the application level and to avoid mapping between inclusive-inclusive and inclusive-exclusive periods.
Bi-temporal tables combine all the capabilities and restrictions that apply on system-period and application-period temporal tables. You need two columns for the systems time period and another two for the business time period, as well as the transaction start-ID column.
Creating bi-temporal tables
Administrators can easily create or alter a table to include system and
business time. For example, the following
TABLE statement defines a
bi-temporal table with a BUSINESS_TIME period on the BUS_START and BUS_END
columns, as well as a SYSTEM_TIME period on the SYS_START and SYS_END columns.
Listing 18. Sample DDL to create bi-temporal tables
CREATE TABLE policy ( id INT NOT NULL, vin VARCHAR(10), annual_mileage INT, rental_car CHAR(1), coverage_amt INT, bus_start DATE NOT NULL, bus_end DATE NOT NULL, sys_start TIMESTAMP(12) GENERATED ALWAYS AS ROW BEGIN NOT NULL, sys_end TIMESTAMP(12) GENERATED ALWAYS AS ROW END NOT NULL, trans_start TIMESTAMP(12) GENERATED ALWAYS AS TRANSACTION START ID IMPLICITLY HIDDEN, PERIOD SYSTEM_TIME (sys_start, sys_end), PERIOD BUSINESS_TIME(bus_start, bus_end), PRIMARY KEY(id, BUSINESS_TIME WITHOUT OVERLAPS) );
Querying temporal tables
The new SQL constructs allow you to query the system-period temporal table or views on an STT to return results for a specified period or point in time. Querying temporal tables can be summarized in five basic examples:
- Query without time specification — This leverages the temporal tables special registers if they are set to a not null value. Otherwise, it retrieves the current data.
- Query with
AS OFvalue clause — Includes all the rows where the begin value for the period is less than or equal to value1 and the end value for the period is greater than value1. This enables you to query your data as of a certain point in time.
- Query with
FROM value1 TO value2clause — Includes all the rows where the begin value for the period is equal to or greater than value1 and the end value for the period is less than value2. This means that the begin time is included in the period, but the end time is not.
- Query with
BETWEEN value1 AND value2clause — Includes all the rows where any time period overlaps any point in time between value1 and value2. A row is returned if the begin value for the period is less than or equal to value2 and the end value for the period is greater than value1.
- Query views defined on a temporal table — You can either set the time specification at the query or in the view DDL. However, you can't have time specified on both.
Partitioning temporal tables
Any temporal table can also be a range-partitioned table. A history table that belongs to a system period temporal table or a bi-temporal table can also be range-partitioned. The partitioning scheme of a history table may differ from the partitioning of its base table. For example, you can partition the current data by month and the history data by year. However, ensure that the ranges you defined for the history table partitions can absorb any rows moved from the base table to the history table. Consider the following example.
Listing 19. Example for table partitioning
CREATE TABLE policy ( id INTEGER PRIMARY KEY NOT NULL, annual_mileage INTEGER, rental_car CHAR(1), coverage_amt INTEGER, sys_start TIMESTAMP(12) GENERATED AS ROW BEGIN NOT NULL, sys_end TIMESTAMP(12) GENERATED AS ROW END NOT NULL, trans_start TIMESTAMP(12) GENERATED AS TRANSACTION START ID, PERIOD SYSTEM_TIME (sys_start, sys_end) ) PARTITION BY RANGE(sys_start) (STARTING('2012-01-01') ENDING ('2014-12-31') EVERY 1 MONTH ); CREATE TABLE policy_history ( id INTEGER NOT NULL, annual_mileage INTEGER, rental_car CHAR(1), coverage_amt INTEGER, sys_start TIMESTAMP(12) NOT NULL, sys_end TIMESTAMP(12) NOT NULL, trans_start TIMESTAMP(12)) PARTITION BY RANGE(sys_start) (STARTING('2012-01-01') ENDING ('2013-12-31') EVERY 3 MONTHS ); ALTER TABLE policy ADD VERSIONING USE HISTORY TABLE policy_history;
The example shows different partitioning schemes for the base and history table. The policy table is partitioned by month; the policy_history table is partitioned by quarter. This difference in granularity is not a problem. However, the policy table has partitions for sys_start values up to 2014-12-31 while the policy_history table has partitions only up to 2013-12-31. If a row in the policy table with a sys_start value of 2014-01-01 or greater is deleted, the insert into the history table fails because it has no partition to accept this sys_start value. Hence, the entire delete transaction fails. To avoid this, define the history table such that the total range of all it's partitions is always equal to or greater than the total range of partitions in the base table.
After versioning has been enabled with the
TABLE...ADD VERSIONING statement, you can still detach partitions from the history table for pruning
and archiving purposes. However, to detach a partition from the base
table, you must first stop versioning with the
ALTER TABLE...DROP VERSIONING statement.
When you stop versioning and detach a partition from the base table, this
partition becomes an independent table. It retains all three timestamp columns
(row begin, row end, transaction start ID) but not the
PERIOD SYSTEM_TIME declaration. The rows in the detached partition are not automatically moved into
the history table. If you choose to move these rows into the history table
yourself, change the sys_end value of every row from
9999-12-30-00.00.00.000000000000 to the current timestamp. This change is
necessary to reflect the point in time when the rows changed from being current
to being history. If you do not make this change, temporal queries might return
You can attach a table to a partitioned base or history table while versioning is
enabled. The table you attach is not required to have a
PERIOD SYSTEM_TIME declaration, but it must have all three timestamp columns defined as in the base
table. While versioning is enabled, you cannot use the
SET INTEGRITY statement with the
EXCEPTION clause. The reason is that moving any exception rows into
an exception table cannot be recorded in the history table, which jeopardizes
the ability to audit of the base table and its history. However, you can
temporarily disable versioning, perform
INTEGRITY with the
FOR EXCEPTION clause, then enable versioning again.
ADMIN_MOVE_TABLE procedure and temporal tables
There are some limitations when using the
ADMIN_MOVE_TABLE stored procedure to
move data in an active system-period temporal table into a new table with the
same name. The following actions are blocked:
- Alter table operations that change the definition of the system-period temporal table or the associated history table are blocked during online move operations.
- The KEEP option of
ADMIN_MOVE_TABLEis unavailable for system-period temporal tables.
Additionally, the online-table-move operation is not supported for history tables.
Multi-temperature data feature
You can configure your databases so that frequently accessed data (hot data) is stored on fast storage, infrequently accessed data (warm data) is stored on slightly slower storage, and rarely accessed data (cold data) is stored on slow, less-expensive storage. As hot data cools down and is accessed less frequently, you can dynamically move it to the slower storage.
Multi-temperature storage provides the ability to assign priority to data (hot, warm, cool, cold) and dynamically assign it to different classes of storage. This is carried out by the introduction of storage groups, which is a new layer of abstraction between logical table spaces and physical storage (containers).
Using storage groups to prioritize your data
Storage groups allow the flexibility to implement multi-temperature data management in automatic storage table spaces. After you create storage groups that map to the different classes of storage in your database management system, you can assign automatic storage table spaces to those storage groups, based on which table spaces have hot, warm, or cold data.
The following example illustrates the use of storage groups with multi-temperature data. As shown in the figure, solid-state drives (SSD) are used to hold data for the current quarter, and enough Fibre Channel-based (FC) and Serial Attached SCSI (SAS) drives are used to hold data for the previous three quarter. The data that is older than one year is stored on a large Serial ATA (SATA) RAID array that does not perform quickly enough to withstand a heavy query workload.
Figure 9. Usage of storage groups with multi-temperature data
Changing the temperature of your data
You can dynamically reassign a table space to a different storage group as the
data changes or your business direction changes. The
TABLESPACE statement can be used to do this:
ALTER TABLESPACE tbSpc USING STOGROUP sg_target.
ALTER TABLESPACE statement is committed,
containers are allocated on the new storage group's storage paths, the existing
containers residing in the old storage groups are marked as drop pending, and an
REBALANCE operation is initiated. This
operation allocates containers on the new storage path and rebalances the data
from the existing containers into the new containers. The number and size of the
containers to create depend on both the number of storage paths in the target
storage group and on the amount of free space on the new storage paths. The old
containers are dropped, after all the data is moved.
Multi-temperature storage integrates with DB2 Workload Manager
DB2 Workload Manager (WLM) gives users the ability to prioritize incoming work based on what data is accessed (a data-centric approach). Users can assign a data tag attribute (a value from 0 to 9) to a storage group or tablespace. The data tag can be used by WLM to determine how to treat the work.
WLM can use data tags predictively. The Optimizer gathers a list of data tags for all table spaces used by an SQL statement at compile time. The data tag can influence the initial placement of the statement into a service class. WLM can also use data tags reactively. At runtime, based on the tag of a table space being accessed, a statement can be remapped into a lower-priority subclass.
A sample scenario
The following steps provide more details on how to set up multi-temperature data storage for the example shown above:
- Create three storage groups to reflect the three classes of storage, a
storage group to store hot data, a storage group to store warm data, and a
storage group to store cold data.
CREATE STOGROUP sg_hot ON '/hot/fs1' DATA TAG 1 CREATE STOGROUP sg_warm ON '/warm/fs1', '/warm/fs2' DATA TAG 5 CREATE STOGROUP sg_cold ON '/cold/fs1', '/cold/fs2', '/cold/fs3' DATA TAG 9
Data tags values assigned to storage groups sg_hot and sq_warm to indicate the business priority of the data stored in these storage groups to the WLM.
- Create four table spaces, one per quarter of data in the year, and assign
the table spaces to the storage groups. Data tags represent business
priority to be used by the optimizer.
CREATE TABLESPACE tbsp_2012q1 USING STOGROUP sg_warm CREATE TABLESPACE tbsp_2012q2 USING STOGROUP sg_warm CREATE TABLESPACE tbsp_2012q3 USING STOGROUP sg_warm DATA TAG 3 CREATE TABLESPACE tbsp_2012q4 USING STOGROUP sg_hot
This association results in table spaces inheriting the storage group properties. Table spaces tbsp_2012q1 and tbsp_2012q2 will inherit the data tag value of 5 from the sg_warm storage group, but table space tbsp_2012q3 will not inherit the data tag value from the storage group because another data tag value has been assigned to the table space. Table space for data tbsp_2012q4 will inherit the data tag value of 1 from the sh_hot storage group.
- Set up your range partitions in your table:
Listing 20. Set up range partitions
CREATE TABLE ... PARTITION BY RANGE (date_column) (PART "2012Q1" STARTING ('2012-01-01') ENDING ('2012-03-31') in "tbsp_2012q1", PART "2012Q2" STARTING ('2012-04-01') ENDING ('2012-06-30') in "tbsp_2012q2", PART "2012Q3" STARTING ('2012-07-01') ENDING ('2012-09-30') in "tbsp_2012q3", PART "2012Q4" STARTING ('2012-10-01') ENDING ('2012-12-31') in "tbsp_2012q4"
The 2012Q4 data represents the current quarter and is using the sg_hot storage group.
- After the current quarter passes, create a table space for the new quarter
and assign the table space to the sg_hot storage group:
CREATE TABLESPACE tbsp_2013q1 USING STOGROUP sg_hot.
- Move the table space for the quarter that just passed to the sg_warm
storage group. To change the storage group association for the
tbsp_2012q4 table space, issue the
ALTER TABLESPACEstatement with the
USING STOGROUPoption. To fix the business priority for the altered tablespace, the
DATA TAGoption will be used. Additionally, table space tbsp_2012q3 will be altered to change the data tag value from 3 to 5:
ALTER TABLESPACE tbsp_2012q4 USING STOGROUP sg_warm DATA TAG 3 ALTER TABLESPACE tbsp_2012q3 DATA TAG 5
- Finally, move the table space for the older quarter assigned to
sg_warm storage group to sg_cold storage group. To change
the storage group association for the tbsp_2012q1 table space, issue
ALTER TABLESPACEstatement with the
ALTER TABLESPACE tbsp_2012q1 USING STOGROUP sg_cold.
This tutorial focuses on the physical database design concepts and elements. You should now be able to do the following:
- Demonstrate the ability to create a database and manipulate various DB2
- Know how to convert existing database to automatic storage database
- Use the Admin_move_table feature
- Demonstrate knowledge of partitioning capabilities
- Demonstrate knowledge of XML data objects
- Demonstrate knowledge of compression
- Demonstrate knowledge of new table features
- Knowledge of Multi-temperature data feature
- See the other tutorials in this DB2 10.1 certification preparation series to continue your education.
- Use an RSS feed to request notification for the upcoming tutorials in this series. (Find out more about RSS feeds of developerWorks content.)
- Learn more about Information Management at the developerWorks Information Management zone. Find technical documentation, how-to articles, education, downloads, product information, and more.
- Stay current with developerWorks technical events and webcasts.
- Follow developerWorks on Twitter.
Get products and technologies
- Build your next development project with IBM trial software, available for download directly from developerWorks.
- Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.
- Participate in the discussion forum.
- Check out the developerWorks blogs and get involved in the developerWorks community.