DB2 Version 9.7 for Linux, UNIX, and Windows

Compression dictionary creation during load operations

LOAD INSERT and LOAD REPLACE operations that meet certain criteria trigger automatic dictionary creation (ADC). Once enough data has been processed, ADC occurs if the load is performed on a table that has the COMPRESS attribute enabled and a compression dictionary is not present.

Data row compression uses a static dictionary-based compression algorithm to compress data. Up to two separate dictionaries are used, one for compression of the table rows, and another one for compression of XML documents stored in the default XML storage object, if the table contains at least one XML column. A dictionary must first exist in the table for compression to occur. During a load operation, the default behavior (indicated by the KEEPDICTIONARY option) is to either abide by the existing dictionaries or, if dictionaries are not present, to generate them once a certain threshold of data has been scanned. Note that the dictionaries are created independently once the individual thresholds are crossed. This generally does not happen at the same time.

For non-XML data, the load utility uses the data that exists in the target table to build the dictionaries, under the assumption that this preexisting data is representative of the kind of data that will be stored in that table. In cases where there is insufficient preexisting data in the target table, the load utility builds the dictionaries once it has sampled enough input data. Prior to DB2® Version 9.7 Fix Pack 1, the load utility uses both the input data and preexisting data to build the dictionaries in this situation. In Version 9.7 Fix Pack 1 and later, the load utility uses only the input data to build the dictionary.

For XML data, the load utility samples incoming data only.

When ADC occurs on range partitioned tables, each partition is treated like an individual table. There will not be any cross-partition dictionaries and ADC does not occur on partitions already containing dictionaries. For table data, the dictionary generated for each partition is based on the preexisting table data (and, if necessary, the loaded data) in that partition only. In Version 9.7 Fix Pack 1 and later, if the preexisting data in a partition is less than the minimum threshold, the dictionary is generated based only on the loaded data. For XML data, the dictionary generated for each partition is based the data being loaded into that partition.

Any load performed in the INSERT mode implicitly follows the KEEPDICTIONARY behavior. For LOAD REPLACE operations, this is also the default, but you have an additional choice: the RESETDICTIONARY option.

LOAD REPLACE using the KEEPDICTIONARY option
A LOAD REPLACE that uses the KEEPDICTIONARY option keeps the existing dictionaries and uses them to compress the loaded data, as long as the target table has the COMPRESS attribute enabled. If dictionaries do not exist, the load utility generates new ones (provided the data that is being loaded into the table surpasses a predetermined threshold for table rows or XML documents stored in the default XML storage object) for tables with the COMPRESS attribute enabled. Since the target table's data is being replaced, the load utility uses only the input data to build the dictionaries. After a dictionary has been created, it is inserted into the table and the load operation continues.

LOAD REPLACE using the RESETDICTIONARY option
There are two key implications of using the RESETDICTIONARY option when loading into a table with the COMPRESS attribute on. First, dictionary creation occurs as long as any amount of data will exist in the target table once the LOAD REPLACE has completed. In other words, the new compression dictionaries can be based on a single row of data or a single XML document. The other implication is that the existing dictionaries are deleted but are not replaced (the target table will no longer have compression dictionaries) if any of the following situations are true:
Note: If you issue a LOAD TERMINATE operation after a LOAD REPLACE with the RESETDICTIONARY option, any existing compression dictionaries will be deleted and not replaced.
Performance impact
Dictionary creation affects the performance of a load operation in two ways: