DB2 10.5 continues to build upon the "pillars of DB2": low operational costs, ease of development, and reliability. This release introduces column-organized table functionality to DB2 for Linux, UNIX, and Windows, announced in April. DB2 10.5 for Linux, UNIX, and Windows will help you to save time and money, deliver top performance, and capture business insight more easily through high speed analytics over large data volumes.
These benefits are largely attributable to BLU Acceleration, a collection of technology innovations from the IBM Research and Development Labs that speed up reporting and analytics. BLU Acceleration, which eliminates the need for indexes, aggregates, or time consuming database tuning to achieve top performance and storage efficiency, is initially being delivered as a fully integrated capability in DB2 10.5 for Linux, UNIX, and Windows.
This article introduces you to column-organized table functionality in DB2 10.5 and also gives you a brief overview of the other significant improvements in this release, including DB2 Text Search enhancements, SQL compatibility enhancements, DB2 pureScale enhancements, data recovery and HADR enhancements, as well as some additional features that improve application performance and reliability.
DB2 10.5 introduces compressed column-organized tables for DB2 databases. A column-organized table is a table where the data pages contain column data instead of row data. Column-organized tables facilitate the processing of analytic workloads with complex queries, such as those that include multi-table joins, grouping and aggregation, or table scans over a star schema. Column-organized and row-organized tables can coexist in the same database, schema, table space, or query.
The processing of column-organized data in DB2 10.5 is in-memory optimized, CPU optimized, and I/O optimized, and offers the following benefits:
- In-memory optimization for maximal performance. Compressed column-organized data can be stored very efficiently, which reduces the amount of memory that is needed for (and increases the speed of) query processing. With an in-memory optimized database, all of the data can be loaded into memory, and performance bottlenecks can be avoided.
- A system that is unconstrained by main memory size, in which the data being processed can be considerably larger than the RAM.
- Improved system scaling across cores.
- An enhanced caching strategy for buffer pools to substantially reduce I/O costs.
- A smaller footprint for database storage than DB2 10.
- Late decompression, the ability to operate directly on compressed data for certain operations, thereby reducing memory usage.
- Multiplied CPU power that uses single instruction, multiple data (SIMD) processing for many operations.
- Vector processing of several column data values at once, instead of individual values.
In support of this new capability, DB2 10.5 provides the following new or enhanced functionality:
- A single setting
DB2_WORKLOAD=ANALYTICS) to enable column organization, an automated specialized initial memory configuration, page and extent size configuration, space reclamation, and automatic workload management. Be sure to set the DB2_WORKLOAD registry variable to ANALYTICS prior to creating the database.
- New syntax for specifying table storage organization
on the CREATE TABLE statement (Figure 1). If you specify the ORGANIZE
BY COLUMN clause, the data is stored by column in the data pages of
the table. A given data page stores data for one column of the table.
If you specify the ORGANIZE BY ROW clause, the data is stored by row
in the data pages of the table. A given data page stores the data for
one or more rows of the table. If the ORGANIZE BY clause is not
specified, the default organization of ROW or COLUMN is determined by
the value of the new dft_table_org database configuration parameter,
which defaults to ROW.
Figure 1. CREATE TABLE syntax fragment showing the new ORGANIZE BY ROW or COLUMN clauses
- A new database configuration parameter, dft_table_org, specifies whether a user table is to be created as a column-organized table or a row-organized table if neither the ORGANIZE BY COLUMN nor the ORGANIZE BY ROW clause is specified on the CREATE TABLE statement.
- A new utility (db2convert) that converts one or all row-organized user tables in a specified database into column-organized tables. The row-organized tables remain online during processing, and the command displays statistics about the conversion for monitoring. Keep in mind that this is currently a one-way process, so be sure to perform a backup before converting your tables to column organization.
- Enhancements to the REORG TABLE command that enable you to
reclaim storage on column-organized tables by
specifying the RECLAIM EXTENTS option. If you prefer an automated
DB2_WORKLOAD=ANALYTICS. This causes a default policy to be installed and the auto_reorg database configuration parameter to be enabled so that automatic reclamation is active for all column-organized tables.
- Automated workload management, which can
significantly improve the performance and system use of workloads with
several queries running at once. To ensure that heavier workloads on
column-organized data do not overload the system when many queries are
submitted simultaneously, there is a limit on the number of
"heavyweight" queries that can execute on the database at the same
time. You can implement this limit by using the default workload
management concurrency threshold that is automatically enabled on new
databases when the value of the DB2_WORKLOAD registry variable is set
to ANALYTICS, or that can be manually enabled on existing databases.
Default query concurrency management in DB2 10.5 is implemented within the existing DB2 workload management infrastructure, and the following new default workload management objects will be created on both upgraded and newly created databases:
- A service subclass, SYSDEFAULTMANAGEDSUBCLASS, under the existing SYSDEFAULTUSERCLASS superclass, where heavyweight queries will run and can be controlled and monitored as a group.
- A CONCURRENTDBCOORDACTIVITIES threshold, SYSDEFAULTCONCURRENT, which is applied to the SYSDEFAULTMANAGEDSUBCLASS subclass to control the number of concurrently executing queries that are running in that subclass.
- A work class set, SYSDEFAULTUSERWCS, and a new work class, SYSMANAGEDQUERIES, which identify the class of heavyweight queries that are to be controlled. The SYSMANAGEDQUERIES work class encompasses queries that are classified as READ DML (an existing work type for work classes) falling above a timeron threshold that reflects heavier queries.
- A work action set, SYSDEFAULTUSERWAS, and work action, SYSMAPMANAGEDQUERIES, which map all queries that fall into the SYSMANAGEDQUERIES work class to the SYSDEFAULTMANAGEDSUBCLASS service subclass.
- Dynamic list prefetching, a new prefetching type used in query execution plans that access column-organized tables, is used to prefetch exactly those pages that will be accessed while scanning a specific portion of the table. This prefetching method maximizes the number of pages that are retrieved by asynchronous prefetching (while minimizing synchronous reads) by queuing work until the required pages have been loaded into the buffer pool.
- New DB2 explain information that is captured to help you determine how your application performs with the new column-organized table functionality. DB2 10.5 includes a new CTQ plan operator that represents the transition between column-organized data processing and row-organized data processing.
- Semi-join support for queries to reduce the amount of memory that is consumed by large inner tables. A semi-join is an optimization technique that leverages one or more predicates on the outer table and other inner tables in the query. The DB2 explain output is extended to indicate whether the inner table of a particular hash join operator is a large inner table for which the semi-join optimization technique will be applied.
- DB2 10.5 introduces new monitor elements that will
help you to tune database server workloads involving queries against
- Assessing buffer pool efficiency: A new set of monitor elements enables the monitoring of data page I/O for column-organized tables separately from that of row-organized tables.
- Measuring prefetcher efficiency: A new set of monitor elements can help you to track the volume of requests for data in column-organized tables that are being submitted to prefetchers, and the number of pages that prefetchers have skipped reading because the pages were already in memory.
- Measuring column data size: A column-organized table is associated with a new table object where the column data is stored. New monitor elements help you to estimate the size of the column data.
- Measuring time spent: New time-spent monitor elements provide information about how the DB2 database manager is spending time processing column-organized tables.
For complete details about any of this new function, see the DB2 10.5 for Linux, UNIX, and Windows Information Center.
Column-organized tables are supported only on Linux (x86-x64, Intel and AMD processors) and AIX (POWER processors). Column-organized table processing is offered in DB2 Advanced Workgroup Server Edition, DB2 Advanced Enterprise Server Edition, and DB2 Developer Edition.
DB2 Text Search enhancements
You can use DB2 Text Search to quickly retrieve text data that is stored in a DB2 table. By incorporating Text Search functions in your queries, you can create powerful and versatile text-retrieval applications.
In DB2 10.5, DB2 Text Search includes the following improvements:
- Reduced impact of indexing on search.
- Increased support for multilingual collections.
- Support for embedded documents, archive files, and compressed files.
- Enhanced configuration capabilities, including the ability to define system-level defaults.
- The new db2ts SET COMMAND LOCKS FOR TEXT command gives you the ability to manually set the command lock for a specific text search index in your database, which is useful for preventing conflicts when an index might be affected by operations that are not managed through DB2 Text Search.
- Two new text search index configuration options, INITIALMODE and LOGTYPE, help to control update processing. These options are specified on the db2ts CREATE INDEX FOR TEXT command. Use the INITIALMODE option to run the initial update when the index is created. The LOGTYPE option determines how the log table is managed. The db2ts CREATE INDEX FOR TEXT command creates a text search index for a text column. You can search the column data by using text search functions.
- More options for finer control of update processing: You can now specify the batch size for a COMMIT operation and how many commit cycles are to be completed during one update session. You can also specify whether the commit size is to be based on the number of rows processed or the number of hours passed.
SQL compatibility enhancements
DB2 10.5 contains enhancements that make it easier to work with the DB2 product if you are more familiar with other relational database products. The following enhancements, which build upon a long list of SQL compatibility enhancements that were released with previous versions of DB2 for Linux, UNIX, and Windows, reduce the time and complexity of enabling some applications to run in a DB2 environment:
- Extended row size support: You can now create a table whose rows can exceed the maximum row length for the page size of the table space. Use extended row size support to migrate tables whose row size exceeds 32 KB to DB2 10.5, or to create tables with more VARCHAR or VARGRAPHIC columns. You can alter existing tables to take advantage of extended row size support, which is enabled by setting the extended_row_sz database configuration parameter to ENABLE. During a data insert operation, if the data exceeds the maximum row length, a subset of VARCHAR or VARGRAPHIC data is moved out of the row and stored as LOB data.
- Exclude null keys support: You can use the new EXCLUDE NULL KEYS clause on the CREATE INDEX statement to prevent keys from being inserted into the index object when all columns or expressions of the key have the null value, thereby reducing the size of sparse indexes and enhancing performance when you do not want queries to access data that is associated with null keys. During unique index enforcement, rows in which the index key is null are ignored.
DB2 pureScale enhancements
The DB2 pureScale feature is a clustering technology that enables you to scale your system with near-linear efficiency and predictable query performance without having to modify your applications or change how your data is distributed.
In DB2 10.5, the DB2 pureScale feature includes the following enhancements:
- Improved availability: You can add DB2 members online without any down time. If you add members offline, you no longer need to perform a database backup operation before the database can be used again. In addition, starting with DB2 10.5, when you apply an online fix pack update, you can update one member or cluster caching facility (CF) at a time while the remaining members and CFs continue to process transactions, thereby maintaining continuous availability.
- Customized workload balancing (also supported in a partitioned database environment): You can now achieve member-level workload management by defining member subsets that enable you to isolate application workloads to a specific set of members.
- Easier recovery: DB2 pureScale environments now support DB2 high availability disaster recovery (HADR), combining the continuous availability features of the former with the robust disaster recovery capabilities of the latter. You can now also restore an offline backup image taken from an IBM DB2 10.5 Enterprise Server Edition (ESE) instance to a DB2 pureScale instance, or an image taken from a DB2 pureScale instance to an ESE instance. You can restore a database backup image to a DB2 pureScale instance with a different topology, or restore online database or table space backup images that were taken on a DB2 pureScale instance to a DB2 pureScale instance with a superset topology. This restore support includes rollforward operations through member addition events.
- Enhanced table reorganization: Inplace (online) table reorganization is now supported in a DB2 pureScale environment.
- Enhanced performance through member-specific STMM: Self-tuning memory management in DB2 pureScale environments is enhanced by having an independent STMM tuner on each member. These tuners can respond appropriately to changes in available memory on each member, and the tuning can more closely reflect the characteristics of the workload.
- Reduced index page contention: Random ordering on index key columns helps to alleviate page contention on frequently accessed pages during certain insert operations. This new capability is particularly beneficial in a DB2 pureScale environment when index leaf pages are accessed frequently.
- Simplified root login settings: You can now install and configure the DB2 pureScale feature without enabling remote root login and passwordless SSH, thereby eliminating a number of security concerns.
- Simplified fix pack installation: You can use the DB2 installFixPack command to update both DB2 pureScale instance and DB2 ESE instance types in a single-step process.
Data recovery and HADR enhancements
- DB2 Advanced Copy Services interface facilitates the use of
customized scripts: Customized scripts that implement
snapshot backup and restore operations can now be more seamlessly
integrated with the DB2 data server, thereby reducing user error and
improving the monitoring and management of these snapshot operations.
Using the fast copy technology of a storage device, such as that offered by DB2 Advanced Copy Services (ACS), makes the data copy part of backup and restore operations run much faster than would be the case with a traditional backup or restore operation. A backup operation that uses DB2 ACS is referred to as a snapshot backup operation, and a restore operation that uses DB2 ACS is referred to as a snapshot restore operation. To perform snapshot backup and restore operations, you need a DB2 ACS API driver for your storage device or, for storage devices that are not supported, a custom script that enables your storage device to perform snapshot operations. Although such scripts provide flexibility with respect to the storage devices that you can use, they can be error prone and cannot generate history file entries.
In DB2 10.5, the DB2 ACS API is wrapped in the DB2 ACS library. Formerly error prone actions, such as issuing the SET WRITE SUSPEND and SET WRITE RESUME commands at the correct time, are handled by the DB2 database manager. Moreover, entries for every snapshot operation are written to the recovery history file, which enables you to monitor the final status of these operations.
- Improved HADR monitoring: The following new fields
are returned by the MON_GET_HADR table function and the db2pd command
(when you specify the -hadr option):
- HEARTBEAT_MISSED, the number of heartbeat messages not received on time on this log stream since database startup on the local member.
- HEARTBEAT_EXPECTED, the number of heartbeat messages expected on this log stream, since database startup on the local member; comparing this value to the value of HEARTBEAT_MISSED gives you a measure of network health during a particular time interval.
- STANDBY_SPOOL_PERCENT, the percentage of spool space used in relation to the configured spool limit; an indicator of how much HADR log spooling space is being used.
- STANDBY_ERROR_TIME, the last time at which the standby encountered a major error.
Additional features that improve application performance and reliability
- Support for NOT ENFORCED primary key and unique constraints, which enables you to avoid performance costs and space requirements when it is known that the data already conforms to the constraint. These informational constraints help the query optimizer to select optimal data access plans when index access to the data provides no additional benefit.
- Expression-based indexes: You can now create an index that contains a key based on an expression, not data that is stored in the table, and the results of the expression are stored in the index. The performance of queries whose predicates contain expressions is improved if the query optimizer chooses an index that contains keys that are based on the same expressions.
- DB2 10.5 includes a number of IBM data server clients and
drivers enhancements. To take advantage of these
improvements, you must upgrade to a Version 10.5 IBM data server
client or driver. These enhancements pertain to the following items:
- Common client and driver
- Call level interface (CLI) driver
- IBM Data Server Provider for .NET
- IBM Data Server Driver for JDBC and SQLJ
For complete details, see the DB2 10.5 for Linux, UNIX, and Windows Information Center.
A rich tool environment makes DB2 10.5 even easier to use
Tools that support database administration and application development can really enhance your performance gains and overall productivity.
- IBM Data Studio is included in every edition of DB2 10.5. Data Studio incorporates advanced development, management, and monitoring tools into a single Eclipse-based offering.
- DB2 Advanced Enterprise Server Edition includes IBM InfoSphere Data Architect, which you can use to discover, model, visualize, relate, and standardize diverse and distributed data assets across your enterprise.
- InfoSphere Optim Performance Manager has a number of enhancements to support DB2 pureScale.
- InfoSphere Optim Query Workload Tuner is included to provide expert recommendations for maximizing application performance. The Workload Table Organization Advisor in Optim Query Workload Tuner 4.1 examines the tables that are referenced in a query workload and makes recommendations that are intended to improve the performance of the query workload. The advisor identifies the tables that are recommended for conversion from row to column organization.
- InfoSphere Optim Configuration Manager is included to offer centralized management of database and client configuration.
DB2 10.5 delivers a large number of enhancements and innovation to help save you time and money and achieve better performance. Advantages of the column-organized functionality in DB2 10.5, with its industry-leading compression and large performance gains for analytic queries over large data volumes, include significantly reduced time-to-value and increased consumability, which can be achieved through minimal DBA design requirements and reduced query tuning and debugging efforts.
- Refer to the best practices for DB2 for Linux, UNIX, and Windows.
- Stay current with developerWorks technical events and webcasts focused on a variety of IBM products and IT industry topics.
- Attend a free developerWorks Live! briefing to get up-to-speed quickly on IBM products and tools as well as IT industry trends.
- Follow developerWorks on Twitter.
- Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners, to advanced functionality for experienced developers.
Get products and technologies
- Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.
- Participate in the discussion forum.
- Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.