Introducing DB2 9, Part 1
Data compression in DB2 9
This content is part # of # in the series: Introducing DB2 9, Part 1
This content is part of the series:Introducing DB2 9, Part 1
Stay tuned for additional content in this series.
Disk storage systems can often be the most expensive components of a database solution. For large warehouses or databases with huge volumes of data, the cost of the storage subsystem can easily exceed the combined cost of the hardware server and the data server software. Therefore, even a small reduction in the storage subsystem can result in substantial cost savings for the entire database solution.
This article introduces DB2 9 (formerly codenamed "DB2 Viper") Venom technology that compresses row data to reduce storage requirements, improve I/O efficiency, and provide quicker data access from the disk.
Data row compression - how it works
"Venom" technology in DB2 9 uses a dictionary based algorithm for compressing data records. That is, DB2 9 can compress rows in database tables by scanning tables for repetitive, duplicate data and building dictionaries that assign short, numeric keys to those repetitive entries. Text data tends to compress well because of recurring strings as well as data with lots of repeating characters, or leading or trailing blanks.
DB2 examines entire rows for repeating entries or patterns, and not just particular fields or parts of rows. Take for example the following two rows:
Table 1. Example rows
In this example, not only the repeating values (of "500") in the Dept column are compressed, but the repeating pattern (of "Plano, TX, 24355") that spans the City, State, and ZipCode columns is also compressed as a single value. Figure 1 compares how DB2 would store the row normally and in compressed formats:
Figure 1. Comparison of uncompressed and compressed data storage
The dictionary for compression/decompression lookup is stored in hidden objects in the database, occupies little space, and is cached in memory for quick access. Even for extremely large tables, the compression dictionary is typically in the order of 100 KB. There can be instances when certain data sets do not compress well or data size conditions that don't result in much compression. DB2 has intelligent algorithms to determine such scenarios and does not perform compression when it does not yield any disk-space-saving benefits.
The data row compression feature in DB2 for Linux®, UNIX®, and Windows® is similar to the compression capabilities available on DB2 for z/OS®; however, it differs from the page-level compression techniques offered by other database vendors, where a compression dictionary is built for each page in the database. By building a compression dictionary at the table rather than page level, patterns across the entire table are analyzed, generally resulting in improved disk savings with DB2.
Data row compression in DB2 can be turned on when tables are created using the COMPRESS YES option. It can also be enabled for an existing table using the ALTER TABLE command. For example:
CREATE TABLE Sales COMPRESS YES
ALTER TABLE Sales COMPRESS YES
The compression takes effect only once the table dictionary is built, which is usually during the table REORG phase.
REORG TABLE Sales
When compressing a large table, it may be useful to populate the table with a small set of "representative" or sample data first. The process of building the compression dictionary can be fairly quick using a small data set. And if the set is a good representative sample, the compression will work well even on new data that is added to the table, without DB2 having to analyze the new data. If, however, the type of data stored in the table evolves over time, the dictionary can be kept up to date using REORG.
DB2 provides the INSPECT tool in order to help you determine the compression ratio estimate for a particular table or data set. This tool collects a sample of the table data, and builds a compression dictionary from it. This dictionary is then used to test compression against the records contained in the sample. From this test, compression savings are estimated.
INSPECT ROWCOMPESTIMATE TABLE NAME Sales â¦ RESULTS KEEP <filename> db2inspf <filename> <outfile>
The output of the INSPECT tool needs to be formatted using a DB2 utility to see the results. The files are found in the db2dump directory. Sample output is illustrated below.
DATABASE: TEST VERSION : SQL09010 2005-12-01-18.104.22.168959 Action: ROWCOMPESTIMATE TABLE Schema name: RSAHUJA Table name: Sales ... Percentage of bytes saved from compression: 66 Compression dictionary size: 2176 bytes. ...
Benefits of data row compression
"Venom" compression technology is capable of storage cost savings by up to 50% or more by reducing the amount of disk space (and disk sub-system peripherals) required for storing data. The size of database logs can also be reduced since DB2 compresses user data within log records.
This technology can also improve performance in some scenarios, despite compression/decompression entailing CPU overhead. Accessing data from the disk is the slowest database operation. By storing compressed data on disk, fewer I/O operations need to be performed on the disk to retrieve or store the same amount of data. Therefore, for disk I/O-bound workloads (for instance, when the system is waiting/idling for data to be accessed from the disk), the query processing time can be noticeably improved.
Furthermore, DB2 keeps the data compressed on both disk and memory (DB2 buffer pools), thereby reducing the amount of memory consumed, and freeing it up for other database or system operations. This can further improve database performance for queries and other operations.
The amount of space savings using the data row compression feature in DB2 can vary depending on the data. Customers using beta versions of DB2 9 have reported savings in excess of 50% and up to 80% for certain large database installations. For one customer data set, a 179.9GB table using 32KB pages was reduced to only 42.5GB: a savings of 76.4%!
The following chart illustrates a few other examples of space savings with the DB2 data row compression feature on different tables:
Figure 2. Examples of space savings with DB2 data row compression
According to Sean McCown of InfoWorld:
"The new compression method translates into direct storage savings at an average ratio of 45 to 75 percent, depending on the data type. To test this, I created a 40GB table (approximately 500 million rows) with mixed numeric and text data, exported it to a text file, and then imported it into a compressed table format. The size of the compressed table was about 17.75GB -- a savings of 56 percent."
Refer to the Related topics section to link to the complete article.
Other forms of compression in DB2
Besides data row compression introduced as part of "Venom" technology, DB2 provides additional mechanisms for reducing storage requirements even further. These include:
- NULL and default value compression
- Database backup compression
- XML tag substitution
- Multidimensional clustering
NULL and default value compression
Available in DB2 for Linux, UNIX, and Windows since Version 8, with this type of compression no-disk storage is consumed for NULL values, zero length data in variable length columns and system default values.
Database backup compression
Available since Fixpak 4 of DB2 for Linux, UNIX, and Windows V8, this compression feature results in smaller backup images that not only reduce backup storage requirements, but also make it easier to move backups between systems.
XML tag substitution
The verbose nature of XML implies that XML fragments and documents typically consume a lot of disk space. DB2 9 stores XML data in a parsed hierarchical format, replacing tag names (for example, employee) with integer values (such as 4). Repeating occurrences of the same tags are assigned the same numeric values. Storing text -rich tags using integer values not only reduces space consumption but also assists with higher performance when querying data. Moreover, the XML tag parsing (replacing with integer values), like data row compression, is done under the covers and is completely transparent to users and applications.
A form of index compression has been available in DB2 for Linux, UNIX, and Windows since V8.1. Significant index space savings can be achieved through block indexes, where one key (or index entry) per thousands of records is used (rather than one key per record with traditional indexes).
Data-row compression in DB2 9 is a remarkable new feature that can result in substantial disk space and storage cost savings. Download DB2 9 and try out this feature for yourself to see how much you can save.
- Download DB2 9 to try out the features described in this article.
- Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edtion and provides a solid base to build and deploy applications.
- Learn about DB2 Express-C, the no-charge version of DB2 Express Edition for the community.
- Build your next development project with IBM trial software, available for download directly from developerWorks.
- Read more articles about DB2 9 on developerWorks.
- Visit the developerWorks DB2 for Linux, UNIX, and Windows resource page to read articles and tutorials, and link to documentation, support, learning resources, and more.