DB2 Version 9.7 for Linux, UNIX, and Windows

Compression dictionary creation

Compression dictionaries for tables enabled for row compression can be built automatically or manually.

Automatic dictionary creation

Starting with DB2® Version 9.5, a compression dictionary is created automatically if each of the following conditions is met:

Data that you subsequently move into the table is compressed using the compression dictionary if compression remains enabled.

The following diagram shows the process by which the compression dictionary is automatically created:

Automatic dictionary creation
  1. A compression dictionary is not created because the table is empty.
  2. Data is inserted into the table using insert or load operations and remains uncompressed.
  3. As more data is inserted or loaded into the table, it remains uncompressed.
  4. After a threshold is reached, dictionary creation is triggered automatically if the COMPRESS attribute is set to YES.
  5. The dictionary is created.
  6. The dictionary is appended to the table.
  7. From this point forward, the data is compressed.
Important: Only rows inserted into the table subsequent to the creation of the dictionary are compressed. The rows that existed before the dictionary was created remain uncompressed unless they are changed, or the dictionary is manually rebuilt.
If you create a table with DB2 Version 9.7 and the table contains at least one column of type XML, a second compression dictionary is used to compress the XML data stored in the default XML storage object that is associated with the table. Compression dictionary creation occurs automatically if you set the COMPRESS attribute on the table to YES, if a compression dictionary does not already exist within that XML storage object, and if the there is sufficient data in the XML storage object.
Restriction: Data in XML columns that you created with DB2 Version 9.5 or DB2 Version 9.1 cannot be compressed. However, XML columns that you add using DB2 Version 9.7 to a table without XML columns that you created with an earlier release of the product can be compressed. If a table that you created in an earlier release already has one or more XML columns, and you want to add a compressed XML column using DB2 Version 9.7, you must use the ADMIN_MOVE_TABLE stored procedure to migrate the table before you can use compression.

Compression dictionaries for temporary tables are also created automatically, using a similar mechanism. However, the database manager determines whether to use row compression for temporary tables, based on factors such as query complexity, and the size of the result set.

Manual dictionary creation

Although dictionaries are created automatically when compression-enabled tables grow to a sufficient size, you can also force a compression dictionary to be created if none exists, or reset an existing compression dictionary by using the REORG TABLE command with the RESETDICTIONARY option. This command forces the creation of a compression dictionary if there is at least one row of data in the table. Table reorganization is an offline operation; one benefit of using automatic dictionary creation is that the table remains online as the dictionary is built.

Instead of using the REORG TABLE command to force the creation of a new dictionary, you can use the INSPECT command with the ROWCOMPESTIMATE option. This command creates a new compression dictionary if the table does not already have one. The advantage of this approach is that the table remains online. Rows that you add subsequently are subject to compression, however, rows that existed before you ran the INSPECT command remain uncompressed until you perform a table reorganization.

Resetting compression dictionaries

Whether a compression dictionary is created automatically or manually, the dictionary is static; after it is built, it does not change. As you add or update rows, they are compressed based on the data that exists in the current compression dictionary. For many situations, this behavior is appropriate. Consider, for example, a table in a database used for maintaining customer accounts for a city water utility. Such a table might have columns such as STREET_ADDRESS, CITY, PROVINCE, TELEPHONE_NUM, POSTAL_CODE, and ACCOUNT_TYPE. If a compression dictionary is built with data from a such table, even if it is only a modestly sized table, it is likely that there would be sufficient repetitive information for row compression to yield significant space savings. This is because much of the data could be common from customer to customer, for example, CITY, POSTAL_CODE, PROVINCE or portions of the STREET_ADDRSS or TELEPHONE_NUM column.

However, other tables might change significantly over time. Consider a range-partitioned table used for retail sales data as follows:
  • Each partition stores data for a specific month of the year,
  • A partition with sales data for a given month is rolled into the table using the ATTACH PARTITION clause of the ALTER TABLE statement at month end.
In this case, a compression dictionary created in, say, April might not reflect repeating data from sales in later parts of the year. In situations where data in a table changes significantly over time, you might want to reset your compression dictionaries using the REORG TABLE command with the RESETDICTIONARY option.