DB2 Version 10.1 for Linux, UNIX, and Windows

Compression dictionary creation during load operations

LOAD INSERT and LOAD REPLACE operations on tables for which compression is enabled can trigger the creation of compression dictionaries. Depending on what type of row compression a table uses, dictionary creation happens in different ways.

Classic row compression uses a single table-level compression dictionary to compress data. Adaptive compression uses multiple page-level compression dictionaries to compress individual pages of data, along with the table-level compression dictionaries used in classic row compression.

Page-level compression dictionaries

Page-level dictionaries are created and updated automatically during either LOAD INSERT or LOAD REPLACE operations; the KEEPDICTIONARY and RESETDICTIONARY options of the LOAD command have no effect on page-level dictionaries.

Table-level compression dictionaries

Table-level dictionaries are created automatically for both LOAD INSERT and LOAD REPLACE operations if no dictionary exists; however, if a table-level dictionary does exist, by default, the dictionary is not updated. More specifically, LOAD REPLACE operations assume the KEEPDICTIONARY option by default. You can specify the RESETDICTIONARY option to remove the existing table-level dictionary and create a new one.

LOAD INSERT always follows the behavior implied by the KEEPDICTIONARY option.

When building table-level dictionaries for non-XML data, the load utility uses the data that exists in the target table to build the dictionaries, under the assumption that this preexisting data is representative of the type of data that will be stored in that table. In cases where there is insufficient preexisting data in the target table, the load utility builds the dictionaries once it has sampled enough input data. In this situation, the load utility uses only the input data to build the dictionary.

For XML data, the load utility samples incoming data only.

When dictionaries are created for range-partitioned tables, each partition is treated like an individual table. There will not be any cross-partition dictionaries and dictionary creation does not occur on partitions already containing dictionaries. For table data, the dictionary generated for each partition is based on the preexisting table data (and, if necessary, the loaded data) in that partition only. In Version 9.7 Fix Pack 1 and later, if the preexisting data in a partition is less than the minimum threshold, the dictionary is generated based only on the loaded data. For XML data, the dictionary generated for each partition is based the data being loaded into that partition.

LOAD REPLACE using the KEEPDICTIONARY option

A LOAD REPLACE that uses the KEEPDICTIONARY option keeps the existing dictionaries and uses them to compress the loaded data, as long as the target table has the COMPRESS attribute enabled. If dictionaries do not exist, the load utility generates new ones (provided the data that is being loaded into the table surpasses a predetermined threshold for table rows or XML documents stored in the default XML storage object) for tables with the COMPRESS attribute enabled. Since the data in the target table is replaced, the load utility uses only the input data to build the dictionaries. After a dictionary has been created, it is inserted into the table and the load operation continues.

LOAD REPLACE using the RESETDICTIONARY option

There are two key implications of using the RESETDICTIONARY option when loading into a table with the COMPRESS attribute on. First, dictionary creation occurs as long as any amount of data will exist in the target table once the LOAD REPLACE has completed. In other words, the new compression dictionaries can be based on a single row of data or a single XML document. The other implication is that the existing dictionaries are deleted but are not replaced (the target table will no longer have compression dictionaries) if any of the following situations are true:

The operation is performed on a table with the COMPRESS attribute off
Nothing was loaded (zero rows), in which case ADM5591W is printed to the notification log

Note: If you issue a LOAD TERMINATE operation after a LOAD REPLACE with the RESETDICTIONARY option, any existing compression dictionaries will be deleted and not replaced.

Performance impact

Dictionary creation affects the performance of a load operation in two ways:

For LOAD INSERT operations, all of the preexisting table data, not just the minimum threshold for dictionary creation, is scanned before building the table-level compression dictionary. Therefore, the time used for this scan increases with table size.
There is additional processing to build the compression dictionaries, although the time used for building the dictionaries is minimal.

While some operations related to the building of dictionaries can affect the CPU utilization by the LOAD command, load operations are generally I/O bound. That is, much of the time spent waiting for the load to complete is taken up waiting for data to be written to disk. The increased load on the CPU caused by dictionary creation generally does not increase the elapsed time required to perform the load; indeed, because data is written in compressed format, I/O times can actually decrease as compared to loading data into uncompressed tables.