Compression dictionaries for tables enabled
for row compression can be built automatically or manually.
Automatic dictionary creation
Starting
with DB2® Version 9.5, a compression
dictionary is created automatically if each of the following conditions
is met:
- You set the COMPRESS attribute for the table to YES. You can set
this attribute to YES when you create the table, using the COMPRESS
YES option of the CREATE TABLE statement; you can also
alter an existing table to use compression using the same option on
the ALTER TABLE statement
- A compression dictionary does not already exist for
that table
- The table reaches a size such that there is sufficient data to
use for constructing a dictionary of repeated data.
Data that you subsequently move into the table is compressed
using the compression dictionary if compression remains enabled.
The
following diagram shows the process by which the compression dictionary
is automatically created:
- A compression dictionary is not created because the table is empty.
- Data is inserted into the table using insert or load operations
and remains uncompressed.
- As more data is inserted or loaded into the table, it remains
uncompressed.
- After a threshold is reached, dictionary creation is triggered
automatically if the COMPRESS attribute is set to YES.
- The dictionary is created.
- The dictionary is appended to the table.
- From this point forward,
the data is compressed.
Important: Only rows inserted into
the table subsequent to the creation of the dictionary are compressed.
The rows that existed before the dictionary was created remain uncompressed
unless they are changed, or the dictionary is manually rebuilt.
If you create a table with DB2 Version 9.7 and the table contains at least
one column of type XML, a second compression dictionary is used to
compress the XML data stored in the default XML storage object that
is associated with the table. Compression dictionary creation occurs
automatically if you set the COMPRESS attribute on the table to YES,
if a compression dictionary does not already exist within that XML
storage object, and if the there is sufficient data in the XML storage
object.
Restriction: Data
in XML columns that you created with DB2 Version
9.5 or DB2 Version 9.1 cannot
be compressed. However, XML columns that you add using DB2 Version 9.7 to a table without XML
columns that you created with an earlier release of the product can
be compressed. If a table that you created in an earlier release already
has one or more XML columns, and you want to add a compressed XML
column using DB2 Version 9.7,
you must use the ADMIN_MOVE_TABLE stored procedure to migrate the
table before you can use compression.
Compression dictionaries
for temporary tables are also created automatically, using a similar
mechanism. However, the database manager determines whether to use
row compression for temporary tables, based on factors such as query
complexity, and the size of the result set.
Manual dictionary creation
Although dictionaries are created automatically when
compression-enabled tables grow to a sufficient size, you can also
force a compression dictionary to be created if none exists, or reset
an existing compression dictionary by using the REORG TABLE command
with the RESETDICTIONARY option. This command
forces the creation of a compression dictionary if there is at least
one row of data in the table. Table reorganization is an offline operation;
one benefit of using automatic dictionary creation is that the table
remains online as the dictionary is built.
Instead of using
the REORG TABLE command to force the creation of
a new dictionary, you can use the INSPECT command
with the ROWCOMPESTIMATE option. This command
creates a new compression dictionary if the table does not already
have one. The advantage of this approach is that the table remains
online. Rows that you add subsequently are subject to compression,
however, rows that existed before you ran the INSPECT command
remain uncompressed until you perform a table reorganization.
Resetting compression dictionaries
Whether
a compression dictionary is created automatically or manually, the
dictionary is static; after it is built, it does not change. As you
add or update rows, they are compressed based on the data that exists
in the current compression dictionary. For many situations, this behavior
is appropriate. Consider, for example, a table in a database used
for maintaining customer accounts for a city water utility. Such a
table might have columns such as STREET_ADDRESS, CITY, PROVINCE, TELEPHONE_NUM,
POSTAL_CODE, and ACCOUNT_TYPE. If a compression dictionary is built
with data from a such table, even if it is only a modestly sized table,
it is likely that there would be sufficient repetitive information
for row compression to yield significant space savings. This is because
much of the data could be common from customer to customer, for example,
CITY, POSTAL_CODE, PROVINCE or portions of the STREET_ADDRSS or TELEPHONE_NUM
column.
However, other tables might change significantly over
time. Consider a range-partitioned table used for retail sales data
as follows:
- Each partition stores data for a specific month of the year,
- A partition with sales data for a given month is rolled into the
table using the ATTACH PARTITION clause of the ALTER TABLE statement
at month end.
In this case, a compression dictionary created in, say, April
might not reflect repeating data from sales in later parts of the
year. In situations where data in a table changes significantly over
time, you might want to reset your compression dictionaries using
the
REORG TABLE command with the
RESETDICTIONARY option.