With the introduction of the IBM Db2 Analytics Accelerator (Accelerator) came a new era of analytics on IBM Z. No longer was it necessary to move large amounts of data off the Z platform in order to do analytics. Queries that used to run for hours or even days in Db2 now run in minutes or even seconds on the Accelerator.
However, many companies have data spread across multiple Db2 subsystems in production. Therefore federating data, even with the Db2 Analytics Accelerator, still required unloading and joining data from multiple systems and then loading it into one Db2 subsystem on the Accelerator, or running a query against each Db2 system and then joining the results programmatically. The Db2 Analytics Accelerator allowed multiple DB2 subsystems to connect to a single appliance, but each subsystem was isolated from the others, so there was still no ability to join data from multiple systems.
Until now . . .
The Db2 Analytics Accelerator now has the capability to join data from multiple Db2 subsystems that are connected to the same appliance. Using federation, you can quickly and easily join data from multiple subsystems to gain a federated view of the data. This join happens within the appliance itself, so no additional CPU is used in IBM Z. New Db2 Catalog tables are used to define the federation and set up security to control user access to the data. New stored procedures are used to setup the federation and security.
- New Pseudo catalog table SYSACCEL.SYSACCELERATEDTABLESAUTH
- New column REMOTELOCATION in SYSACCL.SYSACCELERATEDTABLES
Let’s take a look at how federation on the Accelerator is enabled and used. We will use a simple example with two Db2 subsystems, but the only limit on the number of Db2 subsystems that can participate in federation is the physical resources available on the appliance.
DB2O will be the Db2 subsystem that owns data to be federated. DB2R will be the referencing Db2 system that wants to join its data with data from DB2O. Let’s assume that there are two tables on each system, both having the same name although this isn’t required. The two tables will be CUSTOMER and LINEITEM within schema TPCD.
The first thing to do is to setup the federation and grant privileges to user(s) to read the data.
Step 1. GRANT DB2R read privilege on the two tables in DB20. This is accomplished by invoking stored procedure SYSPROC.ACCEL_GRANT_TABLES_REFERENCE. This grant information is contained in a SYSACCEL.SYSACCELERATEDTABLESAUTH.
Step 2. Create reference tables on DB2R to allow read access for the two tables in DB2O. This is accomplished by invoking SYSPROC.ACCEL_CREATE_REFERENCE_TABLES. This stored procedure will create Accelerator Only Tables (AOT) to reference the tables on DB2O. These are shell AOT’s to be used in the SQL and will not actually contain any data from DB2O. The data will remain in DB2O. For our example, we will use TPCDDB2O as the schema name to point to the tables on DB2O.
That’s it! You now have federation setup between DB2O and DB2R. Now let’s look at how we access the federated data. Although it is not required, to ease the SQL a user or application developer would need to code, a DB2 view can be used. For our example, we will create a view on each set of tables.
CREATE VIEW CUSTOMER_ALL AS
(SELECT * FROM TPCDDB2O.CUSTOMER
SELECT * FROM TPCD.CUSTOMER);
CREATE VIEW LINEITEM_ALL AS
(SELECT * FROM TPCDDB2O.LINEITEM
SELECT * FROM TPCD.LINEITEM);
And now the query we would execute from subsystem DB2R can be as simple as:
SELECT * FROM CUSTOMER_ALL WHERE …
SELECT * FROM LINEITEM_ALL WHERE …
But, how does it perform?
Remember that the federation happens within the appliance using the power of the Accelerator so no additional CPU is used on the mainframe. During the beta of federation, one customer experienced amazing results. Nine Db2 subsystems were connected to the same appliance. Each subsystem had 150 tables. Running a separate query on each subsystem totaled 10 seconds of elapsed time, and this does not include the overhead that would then be required to join nine result sets together to actually have a federated result set. Federation was then setup for the nine DB2 subsystems and a single query was run that returned a federated result set from all nine DB2 subsystems. The query ran in 1.1 seconds.
As you can see, if you have a need to federate data across multiple DB2 subsystems, you can utilize the power of the Db2 Analytics Accelerator to run the queries faster with no additional mainframe overhead and without moving the data off IBM Z.
For additional details, please review the IBM Db2 Analytics Accelerator Version 5.1 PTF5 Release Notes.
About the author
James Knisley is an IBM Client Technical Specialist. He has been working with DB2 and IMS for over 25 years. He began his Db2 career with DB2 Version 1.3. Jim has worked as a DBA, System Programmer, and Manager with companies in manufacturing, insurance, banking, and defense. Currently, he provides Db2 technical expertise to IBM customers in the Midwest.
Disclaimer: The comments in this blog are based upon the author’s current knowledge and personal experiences. All comments are the author’s personal view and do not necessarily reflect the positions or opinions of IBM or its affiliates. You should conduct independent tests to verify the validity of any statements made in this blog before basing any decisions upon those statements.