Disk storage systems can often be the most expensive components of a database solution. For large warehouses or databases with huge volumes of data, the cost of the storage subsystem can easily exceed the combined cost of the hardware server and the data server software. Therefore, even a small reduction in the storage subsystem can result in substantial cost savings for the entire database solution. Data compression reduces storage requirements, improves I/O efficiency, and provides quicker access to the data from the disk.
The latest release of DB2, version 10.1, introduced a new type of data compression: adaptive data compression. This feature utilizes a number of compression techniques, including table-wide and page-wide compression. (See the Resources section for a link to the data compression section of the DB2 documentation.) These compression techniques lead to significant reduction of storage space. However using this feature can lead to CPU overhead associated with compression and decompression of the data.
Another positive side effect of this technology is speeding up of I/O-intensive workloads (in spite of CPU overhead). Reading data from disk to memory for processing is one of the slowest database operations. Storage of compressed data on disk leads to fewer I/O operations needed to retrieve or store the same amount of data (in comparison with the uncompressed dataset). Therefore, for disk I/O-bound workloads (for instance, when the system is waiting or idling for data to be accessed from the disk), the query processing time can be noticeably improved. Furthermore, DB2 processes buffered data in memory in its compressed form, thereby reducing the amount of memory consumed compared to uncompressed data. This has the effect of immediate increasing the amount of memory available for the database without having to increase physical memory capacity. This can further improve database performance for queries and other operations.
Compression can be turned on when tables are created using the COMPRESS YES
option. Alternatively, the administrator can enable compression of an
existing table T by executing the SQL
ALTER TABLE T COMPRESS YES. You can find additional details about using data
compression feature in the developerWorks article Optimize storage with
deep compression in DB2 10.
How does the compression feature affect energy consumption for a given workload? In order to answer this question we have analyzed the total energy consumption of the software when processing a specific workload in two different scenarios:
- Without the data compression feature
- With the adaptive data compression feature
We have found that the compression feature increases average energy consumption per unit of time, but decreases average consumption per statement due to significant reduction of execution time. For our workload under study this led to a 34% reduction of energy consumption per unit of work. Details of the study are given below.
Workload description and case study setup
For our case study we measured the effects of two actions on the amount of resources (time and electricity) needed to complete a certain workload:
- Using no compression of data
- Using adaptive compression of data
Our reference workload is the TPC-H standard benchmark. This is created by the Transaction Processing Performance Council and is used as the industry standard for measuring database performance. The workload consists of a set of business-oriented ad-hoc queries. The database has been designed to have broad industry-wide relevance. See the TPC-H specification document for details.
Using the tools provided with the workload, we populated the database with 1GB of raw data and generated 240 distinct queries associated with this dataset. The queries were executed sequentially for approximately two hours in a circular fashion on a Lenovo ThinkPad T60 laptop with 3GB of RAM. A two hour interval was selected to reduce measurement errors associated with low precision (0.01kWh) of the energy meter (UPM EM100). Note: Some of the statements consume significant amount of time. For example, a query started at 1:59 can take 5 minutes, finishing at 2:04.
We counted the number of statements executed in a given time interval and measured the amount of energy consumed by DB2 running in the the two configurations: without compression, and with adaptive compression.
The workload was executed against each configuration three times to estimate measurement error (expressed using relative standard error). Note: Relative standard error is calculated as the sample estimate of the population (in this case 3 workload runs) standard deviation divided by the square root of the sample size and by the population mean.
Table 1 shows the measurement results. First, let us look at the average execution time per statement. The slowest configuration of the database is, as expected, the one with the compression feature disabled. We treated this configuration as a baseline. Compression of data reduces the size of the tables by 61% and leads to 97% performance improvement in comparison with the reference configuration.
The compression feature increases the overall energy consumption by 29% (in comparison with the reference configuration). However, energy consumption per unit of work is reduced by 34%, due to increased query throughput.
Based on these data, we conclude that enabling adaptive data compression has a mixed effect on energy consumption: consumption per unit of time goes up, but consumption per unit of work goes down. Theoretically, if the workload size is fixed, hardware is dedicated only to this workload, performance of the workload is not critical, and space savings are irrelevant, it may be beneficial to run queries against uncompressed version of the database. These requirements are rarely met in practice: workloads scale up, hardware is shared between multiple tasks (especially in the virtualized or cloud computing environments), and space saving is critical.
Table 1. Workload and database characteristics with and without adaptive compression feature (± denotes relative standard error)
|Database feature||Average statement count per hour||Average electricity consumption (kWh)||Space consumption (MB)Header 2||Compression ratio (see note 1 below)||Watts per statement per second (see note 2 below)|
|No compression||415 ± 1.8%||0.035 ± 0.5%||1168.2||100%||302|
|Adaptive compression||814 ± 4.0%||0.045 ± 0.0%||455.6||61%||199|
Note 1: Compression ratio is defined as 1 – compressed size / uncompressed size.
Note 2: This metric is equivalent to TPC “power per performance throughput” metric measured in Watt per transaction per second (W/tps) and is calculated as “Energy consumption” / “Work completed”. See TPC Energy Specification for details.
Threats to validity
This case study shows that the method can be successfully applied to a particular dataset. Following the paradigm of the ‘‘representative’’ or ‘‘typical’’ case, this suggests that the same approach may be extrapolated to other products.
Our test system is not designed for use in a production environment. It is a laptop (tuned to minimize electricity consumption, sacrificing efficacy) with consumer-grade operating system. However, we believe that the results can be extrapolated to a production system, since data compression is system-agnostic.
The amount of savings for the production system will vary with system's setup and with associated workload. The savings should be more pronounced for a large system. For example, if an uncompressed system requires 100 hard drives to store the data and a compressed one requires 50, then you can immediately save 50% of energy consumed by your storage system, turning unused hard drives off.
Our case study shows that, in addition to space saving and performance improvement, DB2 adaptive compression feature can reduce energy consumption per unit of work. This leads to reduction of maintenance cost and makes your database “greener”.
- Learn more about data compression in the Data compression and performance section of the DB2 for Linux, UNIX, and Windows Information Center.
- Get more information on using deep compression to minimize storage space and improve database performance in the article "Optimize storage with deep compression in DB2 10" (developerWorks, May 2012).
- Get information about the TCP-H standard benchmark from the TPC-H page at the Transaction Processing Performance Council (TPC) website.
- Learn about the Transaction Processing Performance Council (TPC), a non-profit corporation founded to define transaction processing and database benchmarks.
- Access the TCP Benchmark H (Decision Support) Standard Specification.
- Access the TCP Energy Specification.
- Follow developerWorks on Twitter.
- Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners, to advanced functionality for experienced developers.
- Visit the developerWorks resource page for DB2 for Linux, UNIX, and Windows to read articles and tutorials and connect to other resources to expand your DB2 skills.
- Expand your knowledge by reading more articles in IBM Data magazine.
Get products and technologies
- Download a trial version of DB2 for Linux, UNIX, and Windows.
- Participate in the discussion forum.
- Get involved in the developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.