July 27, 2015 | Written by: MICHAEL KWOK
IBM dashDB™ Enterprise MPP is a high-performance, massively scalable cloud data warehouse service, fully managed by IBM. dashDB MPP enables simple and speedy information management, analytics and business intelligence operations in the cloud. One of its key features is keeping data small: with state-of-the-art compression technology, dashDB MPP can deliver impressive storage savings that improve both business value and query performance.
dashDB MPP is built on an innovative columnar technology. One of its biggest innovations is the ability to compress data at a very high rate. Two factors contribute to this high compression rate: native column organization, and the principle that “like data compresses better than unlike data.” If you think about it, a column represents a single attribute, such as an item name or an item price. All the values of the column are of the same data type, typically a string or a number, and may even be further constrained by range (e.g., the item prices may all fall between 9.99 and 19.99), possibly with many duplicates or similar-looking pieces of data. Contrast this with trying to compress a row in a row-based database, which can contain many different data types and patterns across an arbitrarily large number of columns. All of this makes row compression more difficult. On top of this, dashDB’s sophisticated algorithms are datatype-sensitive.
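To make the “like data” point concrete, here is a minimal sketch of dictionary encoding applied to a single column. It is an illustration of the general idea only, not dashDB's actual algorithm; the function name dictionary_encode and the sample item_prices column are made up for this example.

```python
def dictionary_encode(column):
    """Replace each value with a small integer code (illustrative only)."""
    dictionary = sorted(set(column))  # one entry per distinct value
    code_of = {value: code for code, value in enumerate(dictionary)}
    codes = [code_of[value] for value in column]
    # Bits needed to tell the distinct values apart (at least 1).
    bits_per_value = max(1, (len(dictionary) - 1).bit_length())
    return dictionary, codes, bits_per_value


# Hypothetical price column: one data type, narrow range, many duplicates.
item_prices = ["9.99", "14.99", "9.99", "19.99", "9.99", "14.99", "9.99", "9.99"]

dictionary, codes, bits_per_value = dictionary_encode(item_prices)

# Size of the raw strings versus the codes alone
# (the small dictionary itself is stored once and is not counted here).
raw_bits = sum(len(price.encode("utf-8")) for price in item_prices) * 8
encoded_bits = bits_per_value * len(codes)

print(bits_per_value, raw_bits, encoded_bits)
```

Because the column holds only three distinct prices, each value needs just 2 bits. A row that mixes names, prices and dates offers no such small, uniform set of values to encode.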
The compression technology in dashDB MPP optimizes compression based on the frequency of data. That is, the more often a value repeats, the more tightly it is compressed. For example, a common last name like “Smith” will be compressed more tightly than uncommon last names. Moreover, the compressed values are packed as tightly as possible into a collection of bits sized to best fit the register width of the CPU. dashDB MPP can compress a column value to as little as 1 bit!
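Frequency-based compression can be sketched with classic Huffman coding, which gives frequent values shorter codes. This is a stand-in chosen for illustration; the post does not say which algorithm dashDB uses, and the function name huffman_code_lengths and the sample last_names data are assumptions.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(values):
    """Return {value: code length in bits} using classic Huffman coding."""
    freq = Counter(values)
    if len(freq) == 1:
        # A single distinct value still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    tiebreak = count()  # unique counter so the lists are never compared
    heap = [(n, next(tiebreak), [value]) for value, n in freq.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    while len(heap) > 1:
        # Merge the two least frequent groups; every value inside
        # the merged group gets one bit longer.
        n1, _, group1 = heapq.heappop(heap)
        n2, _, group2 = heapq.heappop(heap)
        for value in group1 + group2:
            lengths[value] += 1
        heapq.heappush(heap, (n1 + n2, next(tiebreak), group1 + group2))
    return lengths


# Hypothetical last-name column: "Smith" is common, the others less so.
last_names = (["Smith"] * 8 + ["Jones"] * 4 + ["Kwok"] * 2
              + ["Zubiri", "Quimby"])

lengths = huffman_code_lengths(last_names)
total_bits = sum(lengths[name] for name in last_names)

print(lengths["Smith"], lengths["Zubiri"], total_bits)
```

In this sample, “Smith” ends up with a 1-bit code while the rarest names need 4 bits, and the 16 values take 30 bits in total instead of the 48 bits a fixed 3-bit code would use.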
In internal testing, we have observed that dashDB MPP can compress a representative BI database by a factor of 10 from the pre-loaded data size, which is 2 times better than another major cloud database service we have evaluated*.
Improve Memory Utilization
In addition to the storage savings, dashDB MPP can keep data in its buffer pool (i.e., memory) in a compressed format. This fits more data into the same amount of memory, which significantly increases data density in memory and improves query performance.
The state-of-the-art compression technology in dashDB MPP enables “actionable compression”; in other words, many analytical operations (such as predicate evaluation, joins and aggregates) can be performed directly on the compressed data, without decompressing it first. Imagine how this can save CPU cycles and further speed up your query processing.
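One way to picture predicate evaluation on compressed data is an order-preserving dictionary: if codes are assigned in sorted order, a comparison on values becomes a comparison on codes. The sketch below assumes that scheme purely for illustration; it is not a description of dashDB internals, and encode_sorted, rows_at_most and the sample prices are hypothetical.

```python
from bisect import bisect_right


def encode_sorted(column):
    """Dictionary-encode a column so that code order matches value order."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in column]


def rows_at_most(dictionary, codes, limit):
    """Row positions where value <= limit, comparing codes only."""
    # Translate the predicate constant into code space once.
    max_code = bisect_right(dictionary, limit) - 1
    # The scan itself never looks up the original values.
    return [row for row, code in enumerate(codes) if code <= max_code]


# Hypothetical price column and the predicate "price <= 12.49".
prices = [9.99, 14.99, 9.99, 19.99, 12.49, 14.99]
dictionary, codes = encode_sorted(prices)
matches = rows_at_most(dictionary, codes, 12.49)

print(matches)
```

The constant 12.49 is converted to a code once, and every row is then tested with a small integer comparison, which is where the CPU savings would come from under this scheme.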
IBM dashDB™ Enterprise MPP is truly built for Big Data – speedy, scalable, and small.
* Disclaimer: Performance and compression data is based on measurements and projections using IBM benchmarks in a controlled environment. The actual throughput, performance or compression that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.