Save energy with the DB2 10.1 for Linux, UNIX, and Windows data compression feature

Reduce maintenance and make your database greener

The IBM® DB2® for Linux®, UNIX®, and Windows® data compression feature allows you to store your data in a compact form. This approach has two well-known benefits: first, reduction of storage space, and second, improvement of performance. In this article we describe a case study demonstrating a third benefit: reduction of electricity consumption per unit of work. As a result, data compression reduces the cost of database maintenance and makes your database "greener."


Andriy Miranskyy (andriy@ca.ibm.com), Software engineer, IBM

Dr. Andriy Miranskyy is a software engineer in the IBM Information Management division at the IBM Toronto Software Laboratory. His work and research interests are in the area of mitigating risk in software engineering, focusing on software quality assurance, program comprehension, software requirements, software architecture, project risk management, and Green IT. Andriy received his Ph.D. in Applied Mathematics at the University of Western Ontario. He has 15 years of software engineering experience in the information management and pharmaceutical industries. Andriy has numerous publications and patents in the field of software engineering.



Sedef Akinli Kocak (sedef.akinlikocak@ryerson.ca), PhD student, Ryerson University

Sedef Akinli Kocak is a PhD student in the Environmental Applied Science and Management Program at Ryerson University. Her research interests include information technologies and environmental sustainability, sustainable software, green IT, and green software development. Akinli Kocak has an MSc from the University of Maine and an MBA from Ankara University.



Enzo Cialini (ecialini@ca.ibm.com), Senior Technical Staff Member, IBM

Enzo Cialini is a Senior Technical Staff Member in the IBM Information Management Software division at the IBM Toronto Laboratory. He is currently the Chief Quality Assurance Architect responsible for management and technical test strategy for the DB2 LUW Engine (Warehouse and OLTP) and Customer Operational Profiling. Enzo joined IBM in 1991 and has been working on the DB2 development team since 1992. He has more than 20 years of experience in software development, testing, and support. Enzo is actively engaged with customers and the field in OLTP and Warehouse deployments, has authored The High Availability Guide for DB2, and has numerous patents and publications on DB2 and Integration.



Dr. Ayse Basar Bener, Professor of Engineering, Ryerson University

Dr. Ayse Basar Bener is a professor in the Department of Mechanical and Industrial Engineering at Ryerson University. Her research interests are in building intelligent models for sustainable decision making under uncertainty. She has published more than 100 articles in journals and conferences. Dr. Bener holds a PhD in Information Systems from the London School of Economics. She is a member of AAAI, IEEE, and AIS.



07 February 2013

Also available in Russian and Spanish

Introduction

Disk storage systems are often the most expensive components of a database solution. For large warehouses or databases with huge volumes of data, the cost of the storage subsystem can easily exceed the combined cost of the hardware server and the data server software. Therefore, even a small reduction in storage requirements can result in substantial cost savings for the entire database solution. Data compression reduces storage requirements, improves I/O efficiency, and provides quicker access to data on disk.

The latest release of DB2, version 10.1, introduced a new type of data compression: adaptive data compression. This feature combines a number of compression techniques, including table-wide and page-wide compression. (See the Resources section for a link to the data compression section of the DB2 documentation.) These techniques lead to a significant reduction of storage space. However, using this feature introduces CPU overhead associated with compressing and decompressing the data.

Another positive side effect of this technology is the speedup of I/O-intensive workloads, in spite of the CPU overhead. Reading data from disk into memory for processing is one of the slowest database operations. Storing compressed data on disk means that fewer I/O operations are needed to retrieve or store the same amount of data (in comparison with the uncompressed dataset). Therefore, for disk I/O-bound workloads (for instance, when the system waits or idles for data to be read from disk), query processing time can improve noticeably. Furthermore, DB2 keeps buffered data in memory in its compressed form, thereby reducing the amount of memory consumed compared to uncompressed data. This immediately increases the amount of memory available to the database without having to increase physical memory capacity, which can further improve database performance for queries and other operations.

Compression can be turned on when a table is created by using the COMPRESS YES option. Alternatively, the administrator can enable compression of an existing table T by executing the SQL statement ALTER TABLE T COMPRESS YES. You can find additional details about using the data compression feature in the developerWorks article Optimize storage with deep compression in DB2 10.

How does the compression feature affect energy consumption for a given workload? To answer this question, we analyzed the total energy consumption of the software when processing a specific workload in two different scenarios:

  • Without the data compression feature
  • With the adaptive data compression feature

We found that the compression feature increases average energy consumption per unit of time, but decreases average consumption per statement due to a significant reduction of execution time. For the workload under study, this led to a 34% reduction of energy consumption per unit of work. Details of the study are given below.

Workload description and case study setup

For our case study we measured the effects of two actions on the amount of resources (time and electricity) needed to complete a certain workload:

  • Using no compression of data
  • Using adaptive compression of data

Our reference workload is the TPC-H standard benchmark. It was created by the Transaction Processing Performance Council and is used as the industry standard for measuring database performance. The workload consists of a set of business-oriented ad-hoc queries, and the database was designed to have broad industry-wide relevance. See the TPC-H specification document for details.

Using the tools provided with the workload, we populated the database with 1GB of raw data and generated 240 distinct queries associated with this dataset. The queries were executed sequentially, in a circular fashion, for approximately two hours on a Lenovo ThinkPad T60 laptop with 3GB of RAM. The two-hour interval was selected to reduce measurement errors associated with the low precision (0.01 kWh) of the energy meter (UPM EM100). Note: Some of the statements consume a significant amount of time. For example, a query started at 1:59 can take 5 minutes, finishing at 2:04.

We counted the number of statements executed in a given time interval and measured the amount of energy consumed by DB2 running in the two configurations: without compression, and with adaptive compression.

The workload was executed against each configuration three times to estimate measurement error (expressed using relative standard error). Note: Relative standard error is calculated as the sample standard deviation (computed over the 3 workload runs) divided by the square root of the sample size and by the sample mean.
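
The relative standard error described in the note can be expressed as a short sketch. The three sample values below are illustrative only, not the actual measurements from the study:

```python
from statistics import mean, stdev

def relative_standard_error(samples):
    """Relative standard error: (s / sqrt(n)) / mean, where s is the
    sample standard deviation and n is the sample size."""
    n = len(samples)
    return (stdev(samples) / n ** 0.5) / mean(samples)

# Illustrative example: three hypothetical workload runs (statements per hour)
runs = [408, 415, 422]
print(f"{relative_standard_error(runs):.1%}")  # prints "1.0%"
```

Dividing by the mean makes the error estimate dimensionless, so it can be compared across quantities measured in different units (statements per hour, kWh).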


Benchmark results

Table 1 shows the measurement results. First, let us look at the average execution time per statement. As expected, the slowest configuration of the database is the one with the compression feature disabled; we treated this configuration as a baseline. Compressing the data reduces the size of the tables by 61% and leads to a 97% improvement in statement throughput in comparison with the reference configuration.

The compression feature increases the overall energy consumption by 29% (in comparison with the reference configuration). However, energy consumption per unit of work is reduced by 34%, due to the increased query throughput.
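
The 34% figure can be reproduced from the Table 1 measurements: dividing hourly energy consumption by hourly statement throughput gives energy per statement, and comparing the two configurations gives the relative reduction. A quick sketch:

```python
# Measurements from Table 1 (per hour of workload execution)
KWH_NO_COMPRESSION, STMTS_NO_COMPRESSION = 0.035, 415
KWH_ADAPTIVE, STMTS_ADAPTIVE = 0.045, 814

# Energy per unit of work (kWh per statement) in each configuration
energy_per_stmt_baseline = KWH_NO_COMPRESSION / STMTS_NO_COMPRESSION
energy_per_stmt_adaptive = KWH_ADAPTIVE / STMTS_ADAPTIVE

reduction = 1 - energy_per_stmt_adaptive / energy_per_stmt_baseline
print(f"Energy per statement reduced by {reduction:.0%}")  # prints "34%"
```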

Based on these data, we conclude that enabling adaptive data compression has a mixed effect on energy consumption: consumption per unit of time goes up, but consumption per unit of work goes down. Theoretically, if the workload size is fixed, the hardware is dedicated only to this workload, performance of the workload is not critical, and space savings are irrelevant, it may be beneficial to run queries against the uncompressed version of the database. These conditions are rarely met in practice: workloads scale up, hardware is shared between multiple tasks (especially in virtualized or cloud computing environments), and space savings are critical.

Table 1. Workload and database characteristics with and without the adaptive compression feature (± denotes relative standard error)

Database feature       Average statement    Average electricity    Space consumption   Compression ratio     Watts per statement per
                       count per hour       consumption (kWh)      (MB)                (see note 1 below)    second (see note 2 below)
No compression         415 ± 1.8%           0.035 ± 0.5%           1168.2              100%                  302
Adaptive compression   814 ± 4.0%           0.045 ± 0.0%           455.6               61%                   199

Note 1: Compression ratio is defined as 1 – compressed size / uncompressed size.

Note 2: This metric is equivalent to the TPC "power per performance throughput" metric, measured in watts per transaction per second (W/tps), and is calculated as "Energy consumption" / "Work completed". See the TPC Energy Specification for details.
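
The two table notes can be turned into a short sketch; plugging in the Table 1 numbers reproduces the published columns (the baseline watts-per-statement value comes out slightly above 302, since the published inputs are rounded):

```python
def compression_ratio(compressed_mb, uncompressed_mb):
    """Note 1: 1 - compressed size / uncompressed size."""
    return 1 - compressed_mb / uncompressed_mb

def watts_per_statement_per_second(kwh_per_hour, statements_per_hour):
    """Note 2: average power (W) divided by throughput (statements/s)."""
    average_power_w = kwh_per_hour * 1000    # kWh consumed over one hour -> watts
    throughput = statements_per_hour / 3600  # statements per second
    return average_power_w / throughput

print(compression_ratio(455.6, 1168.2))            # ~0.61, i.e., 61%
print(watts_per_statement_per_second(0.045, 814))  # ~199 W/tps
```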


Threats to validity

This case study shows that the method can be successfully applied to a particular dataset. Following the paradigm of the ‘‘representative’’ or ‘‘typical’’ case, this suggests that the same approach may be extrapolated to other products.

Our test system is not designed for use in a production environment: it is a laptop (tuned to minimize electricity consumption at the expense of performance) running a consumer-grade operating system. However, we believe that the results can be extrapolated to a production system, since data compression is system-agnostic.

The amount of savings for a production system will vary with the system's setup and the associated workload. The savings should be more pronounced for a large system. For example, if an uncompressed system requires 100 hard drives to store the data and a compressed one requires only 50, then you can immediately save 50% of the energy consumed by your storage system by turning the unused hard drives off.


Summary

Our case study shows that, in addition to space savings and performance improvement, the DB2 adaptive compression feature can reduce energy consumption per unit of work. This reduces maintenance costs and makes your database "greener".

Resources

