Data compression and performance

You can use data compression to reduce the amount of data that must be read from or written to disk, thereby reducing I/O cost.

Generally speaking, the more repetitive patterns that exist in data rows, the better your compression rates will be. If your data does not contain repetitive strings, as might be the case if you have mainly numeric data, or include inline BLOB data, compression might not yield as much in the way of storage savings. In addition, when data does not compress well, you still incur the performance impact of the database manager attempting to compress data, only to find that the storage benefits are not worth retaining any compression dictionaries. To check on the possible storage savings that you can achieve with compression, use the ADMIN_GET_TAB_COMPRESS_INFO table function.

Two forms of data compression are available:
Row compression
There are two types of row compression available:
Classic row compression
Classic row compression involves replacing repeating patterns that span multiple column values within a row with shorter symbol strings. A sampling of rows are scanned to find instances of repetitive data. From this scan, a table-level compression dictionary is created. This dictionary is used to replace the repetitive data with shorter symbol strings.

Table-level compression dictionaries are static; after they are first created, they do not change unless you rebuild them during a classic table reorganization.

Adaptive row compression
Adaptive row compression involves the use of two compression approaches: classic row compression and page-level compression. Page-level compression involves replacing repeating patterns that span multiple column values within a single page of data with shorter symbol strings. As a page fills with data, page compression logic scans the page for repetitive data. From this scan, a page-level compression dictionary is created. This dictionary is used to replace the repetitive data with shorter symbol strings.

Page-level dictionaries are dynamic; they are rebuilt automatically if necessary.

Value compression
Value compression involves removing duplicate entries for a value, storing only one copy, and keeping track of the location of any references to the stored value.

Row compression is also used for temporary tables. Compressing temporary tables reduces the amount of temporary disk space that is required for large and complex queries, thus increasing query performance. Compression for temporary tables is enabled automatically under the Db2® Storage Optimization Feature. Each temporary table that is eligible for row compression requires an additional 2 - 3 MB of memory for the creation of its compression dictionary. This memory remains allocated until the compression dictionary has been created.

You can also compress index objects, which reduces storage costs. This kind of compression is especially useful for large online transaction processing (OLTP) and data warehouse environments, where it is common to have many very large indexes. In both of these cases, index compression can significantly improve performance in I/O-bound environments while causing little or no performance degradation in CPU-bound environments.

If you enable compression on a table with an XML column, the XML data that is stored in the XDA object is also compressed. A separate compression dictionary for the XML data is stored in the XDA object. If you use Db2 Version 9.7 or later to add XML columns to a table created using Db2 Version 9.5, these XML columns are compressed. However, XML columns added to a table using Db2 Version 9.5 are not compressed; in such tables, only the data object is compressed.