The following blog entry was written by Jessica Rockwood from the IBM Toronto Lab in Canada.
IBM BLU Acceleration is a new technology in DB2 for analytic queries. The set of technologies encompasses CPU, memory, and I/O optimization with unique runtime handling, memory management, and encoding for speed and compression. BLU Acceleration is a combination of complementary innovations from IBM that simplifies and speeds up analytic workloads. More importantly, BLU Acceleration is easy to set up and self-optimizing, which means far less work for the DBA to deploy and manage the solution. BLU Acceleration is introduced in DB2 for Linux, UNIX, and Windows Version 10.5 (DB2 10.5).
BLU Acceleration is a fully integrated feature of DB2 and does not require any SQL or schema changes to implement. In fact, BLU tables can coexist with traditional row-organized tables in the same table spaces and buffer pools. With no need for database tuning, indexes, or aggregates, the DBA can just load the data and go.
To understand the benefits of BLU Acceleration, it is important to review its complementary technologies. Dynamic in-memory columnar technologies provide a very efficient method to scan and find relevant data. Those technologies are then combined with innovations such as parallel vector processing and actionable compression to make analytic queries far faster.
Dynamic in-memory columnar technology
Database tables that use BLU Acceleration are stored differently than the traditional row-organized tables that you know from previous versions of DB2. With BLU Acceleration, a table can now be created as a column-organized table. When data is stored in this fashion, only the columns necessary to satisfy a query are scanned, which can significantly reduce the time required to locate the most relevant data.
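The difference between the two layouts can be sketched in a few lines of Python (a toy illustration, not DB2 internals; the table and column names are invented):

```python
# Toy illustration of row-organized vs. column-organized storage.
# The data and column names here are invented for the example.

rows = [
    {"id": 1, "region": "EAST", "revenue": 100},
    {"id": 2, "region": "WEST", "revenue": 250},
    {"id": 3, "region": "EAST", "revenue": 175},
]

# Column-organized: one list per column.
columns = {
    "id": [1, 2, 3],
    "region": ["EAST", "WEST", "EAST"],
    "revenue": [100, 250, 175],
}

# For a query like SELECT SUM(revenue), a row scan touches every value
# of every row, while a column scan reads only the "revenue" list.
row_scan_values_touched = sum(len(r) for r in rows)   # 9 values
col_scan_values_touched = len(columns["revenue"])     # 3 values

total = sum(columns["revenue"])                       # 525
```

The wider the table and the fewer columns a query references, the bigger this gap becomes, which is why analytic queries over wide fact tables benefit most.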
BLU Acceleration is in-memory optimized, but not limited by the available memory. Although BLU Acceleration is optimized for accessing data in RAM, performance will not suffer as data size grows beyond RAM, thanks to the other technological innovations described in the following sections. In addition, BLU Acceleration dynamically moves unused data to storage, ensuring efficient use of memory resources: only the most relevant data is kept in memory.
Actionable compression
There are a number of benefits to actionable compression. The first benefit is storage savings. BLU Acceleration uses approximate Huffman encoding, prefix compression, and offset compression. Approximate Huffman encoding is a frequency-based compression scheme that uses the fewest bits to represent the most common values, so the most common values are compressed the most. As a result, clients have reported compression rates with BLU Acceleration greater than those experienced with static or adaptive compression of row-organized tables. Combine the improved compression of data with the elimination of indexes and aggregates, and the overall storage requirements for the database are dramatically reduced.
The second benefit is that actionable compression is order-preserving on each column, which means that data can be analyzed while compressed. This results in dramatic I/O, memory, and CPU savings. Encoded values (the compressed data) can be packed into a CPU register, optimizing each CPU cycle to process the largest possible amount of data. Less I/O and fewer CPU cycles lead to dramatic improvements in processing time.
A further benefit of actionable compression is that the data is not decoded (or uncompressed) when processing SQL predicates (that is, =, <, >, >=, <=, BETWEEN, and so on), joins, aggregations, and more. Rather, decompression occurs only when absolutely necessary, for example, when the result set is returned to the user.
Parallel vector processing
Building on actionable compression and the ability to pack each CPU register with a maximum amount of data to process, parallel vector processing adds yet another level of CPU-optimization.
Multiple data parallelism is achieved through the use of special hardware instructions known as Single Instruction Multiple Data (SIMD). Through the use of SIMD, it is possible to work on multiple data elements with a single instruction. The CPU register is packed with more data (thanks to actionable compression) and with SIMD, all the data is processed with a single instruction.
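The effect of testing several packed values at once can be approximated in plain Python with "SIMD within a register" bit tricks (our own illustration; BLU uses real SIMD hardware instructions, and the widths and function names below are invented):

```python
# Illustrative "SIMD within a register" sketch in plain Python: four
# 8-bit encoded values are packed into one 32-bit word and tested
# against a search key in one pass, without unpacking each value.

def pack4(a, b, c, d):
    """Pack four 8-bit codes into one 32-bit word."""
    return (a << 24) | (b << 16) | (c << 8) | d

def any_byte_equals(word, key):
    """True if any packed byte equals key.

    XOR with the replicated key turns matching bytes into 0x00, then
    the classic has-zero-byte trick detects any zero byte at once."""
    x = word ^ (key * 0x01010101)
    return bool((x - 0x01010101) & ~x & 0x80808080)

w = pack4(7, 42, 3, 19)
any_byte_equals(w, 42)   # True  -> one test covers four values
any_byte_equals(w, 5)    # False
```

Real SIMD instructions do the same kind of thing in hardware over even wider registers, which is why packing many small encoded values per register multiplies the work done per instruction.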
Multi-core parallelism is the other aspect of parallel vector processing. BLU Acceleration is designed to take advantage of the available CPU cores and drive multi-core parallelism for the queries it processes. Queries on column-organized tables (BLU tables) are automatically parallelized to optimize performance.
Data skipping
One final BLU Acceleration technology to be aware of is data skipping. Data skipping allows BLU Acceleration to skip unnecessary processing of irrelevant or duplicate data. If BLU Acceleration automatically detects a large section of data that does not qualify for a query, the data is ignored and skipped. This leads to order-of-magnitude savings in I/O, memory, and CPU. There is no DBA action required to enable data skipping. Rather, BLU Acceleration automatically maintains a persistent store of min and max values, called a synopsis table, for sections of data values. The automatic maintenance occurs during load, insert, update, and delete of data. When a query accesses a given column, BLU Acceleration uses the synopsis table to leverage data skipping wherever possible.
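The synopsis-table idea can be sketched as follows (a toy model; the block size, names, and data are illustrative, and in DB2 the synopsis is maintained automatically):

```python
# Toy sketch of data skipping: keep (min, max) per block of values and
# skip whole blocks that cannot satisfy the predicate. Block size and
# data are invented for the example.

data = [12, 15, 11, 90, 95, 88, 40, 42, 41]
BLOCK = 3

# Synopsis: (min, max) for each block of BLOCK values.
synopsis = [(min(data[i:i + BLOCK]), max(data[i:i + BLOCK]))
            for i in range(0, len(data), BLOCK)]

def scan_greater_than(threshold):
    """Return qualifying values, skipping blocks whose max rules them out."""
    hits, blocks_read = [], 0
    for b, (lo, hi) in enumerate(synopsis):
        if hi <= threshold:        # no value in this block can qualify
            continue               # -> skip its I/O and CPU entirely
        blocks_read += 1
        start = b * BLOCK
        hits.extend(v for v in data[start:start + BLOCK] if v > threshold)
    return hits, blocks_read

hits, blocks_read = scan_greater_than(80)   # reads only 1 of 3 blocks
```

For selective predicates over large tables, most blocks fall entirely outside the qualifying range, which is where the order-of-magnitude savings come from.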
For more detail about IBM BLU Acceleration, see Leveraging DB2 10 for High Performance of Your Data Warehouse.