The availability of PTF 5 was announced in my previous blog entry.
This blog focuses on the new features that have been introduced and describes them in more detail. Additionally this Db2 Analytics Accelerator V5.1 PTF 5 Technical Update.pdfView Details document provides more technical details.
Supporting multiple-row insert for Accelerator-only tables (AOTs)
Now, with multiple-row inserts, you can populate AOTs much faster than before, when only single-row inserts were supported. A multiple-row insert is performed by specifying an array of values per column in the VALUES clause of an INSERT statement whereas the dimension of the array determines the number of rows to be inserted. This can be done through JDBC, ODBC or local SQL application. See the Db2 Knowledge Center for the definition of a multiple-row insert.
The following example outlines how to perform a multiple-row insert from a JDBC program. The array of values per column is constructed using the setString and addBatch methods. The number of addBatch calls determines the number of rows to be inserted. Afterwards all rows are inserted at once by calling the executeBatch method.
Use multiple-row inserts and benefit from the fast performance if you want to populate AOTs with data from external sources using INSERT INTO VALUES statements. ‘External sources’ is defined as any data that is not available on the Accelerator yet. If the data that you want to populate to AOTs is already available on the Accelerator, you would use an INSERT statement with a fullselect instead, also called INSERT INTO SELECT statement.
As an example, consider an ETL process that runs outside of Db2 for z/OS and that inserts results into AOTs on the Accelerator for further analytic processing. In addition, analytic tools such as IBM Campaign or IBM InfoSphere DataStage will benefit from the faster inserts for any streams that process data within the tools’ engine and insert the result into AOTs for further analytic processing or reporting.
Add column support for non-replicated tables
After a column has been added to a Db2 for z/OS table it can now be added to the associated table on the Accelerator by executing a simple synchronization operation. There is no longer a need to remove the table from the Accelerator, re-add it and re-load it again, which can be a time-consuming action and at the same time prevents SQL queries using the added column from being routed to the Accelerator until the table is re-loaded. With PTF 5 the add column support is available for Accelerator-shadow tables that are not enabled for replication and for Accelerator-archive tables. The add column support is not currently available for Accelerator-shadow tables that are enabled for replication.
The following ALTER TABLE ADD COLUMN clauses are supported:
- ALTER TABLE ADD COLUMN (adding a nullable column)
- ALTER TABLE ADD COLUMN ... NOT NULL WITH DEFAULT (adding a non-nullable column with an explicit default value (except hexadecimal character strings))
The new stored procedure ACCEL_SYNCHRONIZE_SCHEMA can be used to synchronize the new schema of one or more tables and alter the table(s) on the Accelerator appropriately. Alternatively, new controls in Db2 Analytics Accelerator Studio can be used to synchronize the schema of one or more tables.
The term federation or federated access describes the ability to run accelerated queries against tables that do not belong to or originate from the Db2 subsystem that issues the query. That said the federation support introduced in PTF 5 enables read access to all data in one Accelerator through a single Db2 subsystem.
Normally, you can only accelerate queries if a Db2 subsystem is paired with an Accelerator and if the tables to be queried exist in that Db2 subsystem. This can be regarded as a security feature; a second Db2 subsystem paired with the same Accelerator cannot access tables in the first Db2 subsystem and vice versa.
Geographical boundaries translate into a segmentation of the table data by region. For example, a company does business in the U.S. and in Asia; the tables in subsystem A contain the sales figures for the U.S.; the tables in subsystem B the sales figures for Asia.
If data is segmented by areas of business, the organization of the database follows a company's key business areas or its product portfolio. Assume that an organization runs a chain of gas stations and also produces cosmetics (both business areas are based on oil). Subsystem A contains the data of the gas station business; subsystem B the data of the cosmetics business.
Frequently, the aim of analytical queries is to capture trends across regions or business areas. In accordance with the previous examples, one would like to find out how the business in the U.S. develops against the business in Asia. What are the growth rates in the gas station business as opposed to those in the cosmetics sector?
Queries that answer these questions require access to different data sources. If the data is segmented by the organization of the database, access must be extended to tables in other Db2 subsystems, and this is what the federation feature is about.
Another example would be the following: One Db2 subsystem represents a data lake that is maintained by IBM Db2 Analytics Accelerator Loader from sources external to Db2 for z/OS. Then one or more Db2 subsystems could use federated access to reference data from the data lake to combine this data with its own data for analytic purposes.
An access extension to tables in another Db2 subsystem for federated access is achieved through the following steps:
1. A user with sufficient rights on the Db2 subsystem that owns the tables (remote Db2 subsystem or owning Db2 subsystem) grants the right to access a set of tables to a Db2 subsystem that normally cannot access these tables (called the referencing Db2 subsystem) using one of the following methods:
- New stored procedure ACCEL_GRANT_TABLES_REFERENCE
- New control in Db2 Analytics Accelerator Studio: Granting access to remote tables
2. A user with sufficient rights on the referencing Db2 subsystem creates references to tables for that access rights were granted on the remote Db2 subsystem using one of the following methods:
- New stored procedure ACCEL_CREATE_REFERENCE_TABLES
- New control in Db2 Analytics Accelerator Studio: Creating reference tables
These references are created as accelerator-only tables with corresponding entries in the catalog of the referencing Db2 subsystem. On the shared Accelerator, these referencing accelerator-only tables point to the referenced tables belonging to the remote Db2 subsystem via synonyms.
3. Queries from the referencing Db2 subsystem can now access these referencing accelerator-only tables.
The federation feature supports workload balancing and high availability in an environment where the owning Db2 subsystem and one or more referencing Db2 subsystems that require federated access are connected to multiple Accelerators. All of these Accelerators have loaded the same set of tables of the owning Db2 subsystem. After access has been granted to the same tables on all Accelerators, the referencing Db2 subsystems can create referencing AOTs where each AOT points to the according referenced tables on all Accelerators. Now, if a referencing Db2 subsystem executes a query that uses a referencing AOT, Db2 can do workload balancing for this query and route it to the one or other Accelerator.