Skip to main content

Maximize the performance of WebSphere Information Integrator with Materialized Query Tables

Anjali Betawadkar-Norwood (anorwood@us.ibm.com), Advisory Software Engineer, IBM Silicon Valley Lab
Author photo
Anjali Betawadkar-Norwood is an Advisory Software Engineer in Silicon Valley Laboratory in San Jose, California. Her expertise is Query Optimization, especially as it applies to federated systems. She has been working in the area of query optimization for five years. Currently, she leads a small team focusing on federated query optimization in the WebSphere Information Integrator Federated Query Compiler team.
Valer-Alin Crisan (vcrisan@us.ibm.com), Staff Software Engineer, IBM
author photo
Valer-Alin Crisan is a Software Engineer in Silicon Valley Laboratory in San Jose, California. His expertise is query optimization for federated systems. He has been working in the area of query optimization for about a year and a half in a small team focusing on federated query optimization in the WebSphere Information Integrator Federated Query Compiler team.
Eileen Lin (etlin@us.ibm.com), Senior Technical Staff Member, IBM Silicon Valley Lab
Author photo
Dr. Eileen Lin is a Senior Technical Staff Member in Silicon Valley Laboratory in San Jose, California. She is one of the original members responsible for the success of DataJoiner, a federated database product that is the predecessor of the federation technology in DB2. Currently, she is the lead architect for WebSphere Federation Server. Dr. Lin has many patents covering areas such as federation technology, query optimization and parallel query processing.

Summary:  You might have heard that materialized query tables (MQTs) can significantly improve the performance of queries. This article shows you why MQTs can also help the performance of a federated system. You will find everything you want to know about using MQTs in WebSphere® Information Integrator in this article.

Date:  23 May 2006
Level:  Introductory
Activity:  1298 views
Comments:  

Introduction

A materialized query table (MQT) is a table that materializes the pre-computed result of a query involving one or more tables or nicknames. After the MQT is created and populated, an arbitrary subsequent query may be satisfied by the MQT if the MQT matches all or a part of the query. Please see [1] for more details. Caching data using MQTs and allowing the query optimizer to make a cost-based decision to use the MQTs to satisfy queries is known to improve query performance in a Database Management System (DBMS). Use of MQTs in a federated system such as WebSphere Information Integrator can be even more beneficial for the following reasons:

  • A federated query typically executes part of (or the whole) query on a remote system and brings the intermediate results back to the federated system over the network. Running a query in federated mode is usually slower compared to the same query that accesses only local data because of the network latency. An MQT involving nicknames makes remote data available locally, thus saving trip(s) over the network to the remote data source.
  • If one of the remote DBMSs from which data is needed in a query is unavailable, the query may still be able to generate the result set if the access plan for the query decides to satisfy the query using an MQT instead of the remote data.
The next release of WebSphere Information Integrator, due later in 2006, will be called WebSphere Federation Server. Stay tuned to developerWorks to learn more about the features and new capabilities coming in this next version of the product.

This article introduces MQTs in WebSphere Information Integrator. It shows how to define MQTs suitable for a workload and how to populate them. It presents performance results of experiments with MQTs in WebSphere Information Integrator. It gives hints and tips to determine whether an MQT is being used by the query and, if not, what prevented the use of that MQT. It also enumerates the restrictions on the use of MQTs.

Overview

The article starts out with a motivating example to show how MQTs can help query performance in the Understanding MQTs section. The article then briefly explains how a decision is made to use an MQT in a query and the benefits of using MQTs in a federated environment. The section Steps to create and use MQTs discusses how to choose MQTs that are likely to improve the performance of your workload. The section walks you through all the steps of creating an MQT, populating it with data, and enabling it for optimization. The section on troubleshooting helps you determine why an MQT was not used to satisfy a query. It also lists the restrictions on the use of MQTs. The last section in the artcile describes our in-house performance experiments and the performance benefit we obtained with the use of MQTs.

Understanding MQTs

This section introduces an example business scenario and works its way through the need to use an MQT and the steps to create and enable its use.

A motivating example

Consider a company with the head office in New York and a data warehouse in San Jose. The data warehouse keeps track of the sales of all the products sold in the U.S. by that company. The sales data is maintained in a table 'sales' with the following schema.

CityStateProduct_nameQuantityPriceTransaction_date

A row in the sales table is the aggregate quantity and total price for all sales of a particular product for a given day, city, and state. The head office in New York employs a WebSphere Information Integrator system. The WebSphere Information Integrator system has a nickname 'divisional_sales' corresponding to the 'sales' table in the San Jose database.

Consider that the sales executives at the head office need to look up sales of products in the state of California in the year 2005. They need the information aggregated by city.

The query to compute aggregate sales by city can be written as:


SELECT CITY, SUM(PRICE) AS SUM_SALES
FROM DIVISIONAL_SALES
WHERE STATE ='CA' AND TRANSACTION_DATE BETWEEN '01/01/2005' AND '12/31/2005'
GROUP BY CITY;
      

The executives may run this type of query frequently, sometimes adding more filtering criteria on the column City or SUM_SALES. One variation of the above query the executives are interested in finding is the total sales of products in the city of San Francisco.

This requirement can be expressed by adding a predicate CITY = 'SAN FRANCISCO' to the query above. The query looks like:


SELECT CITY, SUM(PRICE) AS SUM_SALES
FROM DIVISIONAL_SALES
WHERE STATE ='CA' AND CITY='SAN FRANCISCO' 
AND TRANSACTION_DATE BETWEEN '01/01/2005' AND '12/31/2005';
      

Another variation could be when the executives want to find the cities that have total sales less than $100,000. This information can be expressed as the query above with an added predicate SUM_SALES < 100000 as follows:


SELECT CITY, SUM(PRICE) AS SUM_SALES
FROM DIVISIONAL_SALES
WHERE STATE ='CA' AND TRANSACTION_DATE BETWEEN '01/01/2005' AND '12/31/2005'
GROUP BY CITY 
HAVING SUM(PRICE) < 100000;

The sales information gets updated constantly in the sales table, but it is not essential that the executives get up-to-the-second information. The business context of the queries can tolerate information that has been stale for a few hours or even a few days. In this example, deriving the aggregate sales information is a computationally complex task. Moreover, the computation is repeated multiple times with slightly differing criteria. An important characteristic to note is that somewhat stale data is acceptable. An MQT can be defined such that part of the computation can be performed once and the results stored such that subsequent queries can be answered from the cached computation with minimal extra processing.

One possible MQT definition that will satisfy the three variations of the query above looks like:


CREATE TABLE AGGREGATE_SALES (CITY, TOTAL_SALES) AS 
(SELECT CITY, SUM(PRICE)
FROM DIVISIONAL_SALES
WHERE  STATE ='CA' AND TRANSACTION_DATE BETWEEN '01/01/2005' AND '12/31/2005'
GROUP BY CITY)
DATA INITIALLY DEFERRED REFRESH DEFERRED 
ENABLE QUERY OPTIMIZATION
MAINTAINED BY SYSTEM;
      

The clauses in the definition of MQT dictate the behavior of the MQT. These clauses are discussed in detail in later sections. Once the MQT is created and populated, when a user submits a query to find aggregate sales or its variant information, it may be satisfied by using the aggregate_sales MQT.

How is the decision made to use an MQT to answer a query?

A user does not need to change the query in order to make use of the MQT. Once the MQT is created and populated with data, the WebSphere Information Integrator optimizer determines whether the user query can be satisfied by the use of MQT(s) and whether using the MQT(s) will result in a performance benefit. The following criteria need to be satisfied in order for the optimizer to make use of an MQT to replace a part of the query or the whole query:

  • The MQT applies the same set or a subset of predicates that the user query under consideration applies. If the MQT contains only a subset of the predicates applied by the user query, then it can still be used because the optimizer compensates by adding the missing predicates as filters to data returned from the MQT.
  • The MQT selects the exact same set or a super set of columns that the user query under consideration selects. If the query selects fewer columns than the corresponding matching MQT, the optimizer can eliminate unneeded columns.
  • Other criteria regarding freshness of data in MQT and optimization level used for running the queries are satisfied. These criteria are described in the section 'How to use and create MQTs'.

Benefits of using MQTs in the federated environment

MQTs are beneficial for various reasons. Some of them are described below:

  • The performance of queries that are computationally expensive can be improved. Caching the results of expensive computations once and using the cached results multiple times provides a performance benefit. It also uses less CPU and prevents overloading of the system.
  • Computations common to multiple queries can be represented using an MQT. The MQT is populated with the data once and is reused by multiple queries, providing performance benefit.
  • MQTs cache the data locally in a federated system and avoid expensive trip(s) to back-end data sources to transfer data.
  • If a back-end data source that provides data to a query is down and the optimizer chooses to satisfy the query using MQTs, the query will be executed. Thus unavailability of data source will not obstruct execution of queries.
  • MQTs provide important statistical information to the optimizer. More specifically, the presence of MQTs that have recent statistics collected on them may help the optimizer determine the degree of filtering of a predicate, for instance, calculate the selectivities of predicates in a more accurate manner.

Steps to create and use MQTs

Step 1: Use the Design Advisor to determine a set of MQTs that will help your query workload

The kind of MQTs that will be helpful in improving the performance of your workload can be determined by analysis of your queries or by use of a tool such as Design Advisor. Determining MQTs likely to be beneficial for your workload is discussed in this section.

The DB2® Design Advisor assists users with the creation of materialized query tables (MQTs) and indexes, the repartitioning of tables, the conversion to multidimensional clustering (MDC) tables, and the deletion of unused objects. The recommendations are based on one or more SQL statements provided by the user. A group of related SQL statements is known as a workload. Users can rank the importance of each statement in a workload and specify the frequency at which each statement in the workload is to be executed. The Design Advisor creates an output of an SQL script that includes CREATE INDEX, CREATE SUMMARY TABLE (MQT), and CREATE TABLE statements to create the recommended objects.

In this article we focus on how to use the Design Advisor to recommend MQTs for federated queries. The Design Advisor is invoked through the DB2 Control Center or the "db2advis" command.

Here is an example of how to use the Design Advisor to recommend MQTs for a federated workload. In this example,

  • the queries comprising the workload are read from a text file 'my_queries.sql',
  • the workload is applied on the objects defined in the database 'mydb', and
  • the MQTs DDL statements are saved in the file 'my_rec_mqts.sql'.
  • The advisor will recommend REFRESH DEFFERED MQTs.

db2advis -d mydb -i my_queries.sql -m M -o my_rec_mqts.sql -u -k OFF

The output file generated by the Design Advisor contains a set of DDL statements that create MQTs, refresh and update statistics on the MQTs, and create indexes on them. The output file may be used as it was generated or it may be customized for the specific needs of your application. The Design Advisor generalizes the recommended MQTs for the given queries by removing the filter (local) predicates from the MQTs. For example, if the Design Advisor is invoked on a query that selects data from three tables and contains three join predicates and two filter predicates, the recommended MQT won't contain any of the filter predicates. You may choose to include some of the filter predicates in MQTs if you think that your queries would always filter data using the same constant value.

Step 2: Create an MQT

Once you have determined the MQT definitions, the MQTs can be created with the use of a "CREATE TABLE" statement. We will work with the earlier example from 'Understanding MQTs' section. Our example MQT looks like:


CREATE TABLE AGGREGATE_SALES (CITY, TOTAL_SALES) AS 
(SELECT CITY, SUM(PRICE)
FROM DIVISIONAL_SALES
WHERE STATE ='CA' AND TRANSACTION_DATE BETWEEN '01/01/2005' AND '12/31/2005'
GROUP BY CITY)
DATA INITIALLY DEFERRED REFRESH DEFERRED 
ENABLE QUERY OPTIMIZATION
MAINTAINED BY SYSTEM;
      

One needs to carefully determine the clauses to use while creating an MQT.

The 'DATA INITIALLY DEFERRED' clause indicates that when the MQT is created, it is not populated with data as a part of create table statement.

The 'REFRESH DEFERRED' clause indicates that use of a 'REFRESH' statement will populate the MQT with data. Any changes in the form of insert, update or delete of the data to the table(s) or nicknames used in the MQT definition will not automatically result in an insert, update or delete of data in the MQT. An alternative to 'REFRESH DEFERRED' is the 'REFRESH IMMEDIATE' option. This option specifies that any changes to data in the form of insert, update or delete to the tables used in the MQT definition will automatically result in changes to the data in MQT. This option is not available for MQTs whose definition references nicknames. In this article, we will work with MQTs over nicknames, hence these MQTs will always be defined as 'REFRESH DEFERRED' MQTs.

An MQT may be made available to the optimizer for use by setting the 'ENABLE QUERY OPTIMIZATION' clause. The 'DISABLE QUERY OPTIMIZATION' clause may be used to prevent the optimizer from considering use of a particular MQT.

The 'REPLICATED' clause indicates that a copy of the MQT is made on all the partitions of a Data Partitioning Feature (DPF) enabled system. In such a system, it is possible to specify the partitioning of an MQT or to define it as a REPLICATED MQT. If an MQT is defined as REPLICATED, it is possible to get better performance for an operation (e.g. join) that can use this MQT and correlates it with data on the same node through collocation.

MQTs can be classified according to the mechanism that is used to populate them with data as follows:

  • System-Maintained MQTs: Data in a system-maintained MQT is managed by the WebSphere Information Integrator system. The 'REFRESH TABLE' statement can be used to populate the MQT with data. Insert, update, or delete of data into the MQT directly by the user is disallowed.
  • User-Maintained MQTs: A user-maintained MQT, as the name suggests, is maintained by the user. The 'REFRESH TABLE' statement is not applicable to user-maintained MQTs. Users may insert, update, or delete data in a user-maintained MQT. Users may also make use of the 'LOAD' utility to populate the MQT with data. It is the user's responsibility to ensure that a user-maintained MQT contains data that is consistent with the definition of the MQT.
  • Federated_Tool- Maintained MQTs or Cache Tables: An MQT defined as 'MAINTAINED BY FEDERATED_TOOL' is maintained by automatic replication. These MQTs do not support the use of the 'REFRESH TABLE' statement. Federated_tool maintained MQTs are also known as Cache Tables. They are a special sub-type of user-maintained MQTs, the data in which is populated by the use of Replication. "Cache tables" can only be set up using the Control Center. There are certain restrictions that apply only to Cache tables that do not apply to system-maintained and user-maintained MQTs. These restrictions are discussed in the Restrictions and Troubleshooting sections below.

Step 3: Use SET INTEGRITY statement to bring an MQT out of check-pending state.

This step applies to user-maintained MQTs only and it enables population of data in the MQTs. In our example, 'aggregate_sales' was defined as a user-maintained MQT, the following statement must be issued to bring the MQT out of check-pending state.


SET INTEGRITY FOR AGGREGATE_SALES ALL IMMEDIATE UNCHECKED;
      

The "SET INTEGRITY" statement specifies that the MQT is to have integrity checking turned on without being checked for integrity violations.

This step is not required for system-maintained MQTs since the REFRESH statement used to populate the data brings a system-maintained MQT out of check-pending state.

Step 4: Populate the MQT with data

Once the MQT is created, it needs to be populated with data to reflect the results of the query that the MQT is supposed to cache. This step also needs to be executed whenever the contents of the MQT are to be updated based on the underlying data.

  1. Populating a system-maintained MQT with data:

    If the MQT is a system-maintained MQT, it can be populated with data using a 'REFRESH TABLE' statement. The "REFRESH TABLE statement can be invoked for our example MQT as:


    REFRESH TABLE AGGREGATE_SALES;
          

    When a REFRESH statement is invoked for an MQT defined with nicknames in it, it deletes any pre-existing data in the MQT and populates the data afresh. Incremental refresh is not available. Depending on the complexity of the query and the size of the result set, the refresh operation may take a long time. You can set up a task in the task center such that the refresh operation of the MQT will happen at a time of your choice.

  2. Populating a user-maintained MQT with data:

    If the MQT is user-maintained, it can be populated using an INSERT statement or a LOAD statement. If our example MQT was a user-maintained MQT, it could have been populated using an INSERT statement as:


    INSERT INTO AGGREGATE_SALES (SELECT CITY, SUM(PRICE)
    FROM DIVISIONAL_SALES
    WHERE STATE ='CA' AND TRANSACTION_DATE BETWEEN '01/01/2005' AND '12/31/2005'
    GROUP BY CITY);
          

    A user-maintained MQT can also be populated using a LOAD statement. The steps to accomplish this are listed below:

    Use a 'DECLARE CURSOR' statement to define a cursor over the evaluation of the query. The results of the query will be inserted into the MQT in the subsequent LOAD statement.


    DECLARE CUR1 CURSOR FOR 
    SELECT CITY, SUM(PRICE)
    FROM DIVISIONAL_SALES
    WHERE STATE ='CA' AND TRANSACTION_DATE BETWEEN '01/01/2005' AND '12/31/2005'
    GROUP BY CITY;
          

    The data is loaded into the MQT using the "Load command".


    LOAD FROM CUR1 OF CURSOR INSERT INTO AGGREGATE_SALES;
          

    The 'INSERT' option of the LOAD command inserts the data into the AGGREGATE_SALES table. This may result in duplicate values. The 'REPLACE' option of LOAD may be used to first drop the existing data in AGGREGATE_SALES table and insert the new data.

    Our experimentation suggests that when relatively large amounts of data need to be loaded into the MQT, LOAD performs faster than INSERT. You may want to experiment with both the options to determine whether LOAD or INSERT would give better performance in your scenario.

    You can consider setting up a task in the task center such that update of the data in a user-maintained MQT will happen at a time of your choice.

  3. Populating a federated_tool-maintained MQT with data

    A replication schedule is automatically set up between the MQT and the nickname on which this MQT is based. Replication takes care of keeping the data in the MQT current.

Step 5: Create indexes over MQTs

Once the MQTs are created, you may want to check if any indexes should be created on the MQTs. This can be performed with the "CREATE INDEX" statements. Since MQTs are local tables that cache the results of a query, the same considerations for creating an index on a local table apply to creating indexes on MQTs. If the MQT happens to have a key, unique indexes should be created to reflect the fact. Creation of indexes, unique or otherwise, helps read the data from MQTs in an efficient manner, resulting in better performance.

Note that unique indexes cannot be created on system-maintained MQTs. For our example if 'City' is a key for the MQT aggregate_sales, and aggregate_sales is a user-maintained MQT, a unique index may be created on the MQT as follows:


CREATE UNIQUE INDEX uniq_city_index ON aggregate_sales (City);

Step 6: Define informational constraints on nicknames

Defining "informational constraints" on nicknames to reflect the constraints on the underlying tables helps optimization even in the absence of MQTs. When MQTs are present, informational constraints can sometimes help the optimizer match a query to the MQT when the MQT would not have been considered a match in the absence of informational constraints. We demonstrate this with the help of an example:

Assume that the user has a nickname 'OFFICE_INFO' in addition to the DIVISIONAL_SALES nickname described before. The 'OFFICE_INFO' nickname has columns, CITY, OFFICE_ADDR and PHONE_NUM.

The column 'CITY' in the OFFICE_INFO nickname is unique and hence a unique constraint has been defined on the nickname OFFICE_INFO on column CITY as follows:


ALTER NICKNAME OFFICE_INFO ADD CONSTRAINT UNIQ_CONS UNIQUE (CITY) NOT ENFORCED 
ENABLE QUERY OPTIMIZATION;
      

The column CITY in nickname DIVISIONAL_SALES always finds a matching city in nickname OFFICE_INFO and hence an informational referential integrity constraint has been defined on column CITY between nicknames OFFICE_INFO and DIVISIONAL_SALES as follows:


ALTER NICKNAME DIVISIONAL_SALES ADD CONSTRAINT CITY_CONSTRAINT
FOREIGN KEY (CITY) REFERENCES OFFICE_INFO(CITY) NOT ENFORCED
ENABLE QUERY OPTIMIZATION;
      

Also assume that the MQT aggregate_sales exists as described before.

Now, the user issues a query:


SELECT DS.CITY, SUM(PRICE) 
FROM DIVISIONAL_SALES DS, OFFICE_INFO OI
WHERE STATE ='CA' AND TRANSACTION_DATE BETWEEN '01/01/2005' AND '12/31/2005'
AND DS.CITY = OI.CITY
GROUP BY DS.CITY;
		

Since the referential integrity constraint ensures that every value of CITY in DIVISIONAL SALES will find a match in OFFICE_INFO, the optimizer can safely eliminate the join to the OFFICE_INFO nickname and route to the MQT AGGREGATE_SALES.

Step 7: Collect statistics over MQTs using runstats

Once the MQTs and relevant indexes are created, it is advisable to run "runstats" on MQTs in order to collect statistics. The WebSphere Information Integrator optimizer is cost-based and relies on accurate statistics for all objects, including MQTs, to make execution plan decisions. For our example, MQT, runstats can be run to collect statistics as follows:


RUNSTATS ON TABLE sample.aggregate_sales WITH DISTRIBUTION AND DETAILED INDEXES ALL;

Here 'sample' is the schema under which the MQT aggregate_sales has been created.

Step 8: Enable the MQT to be considered for optimization

There are two ways to enable an MQT to be considered for optimization. An MQT is also known as a maintained table.

  1. Use 'CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION' Special Register. The special register can be set to a desired value to enable certain maintained table objects for a session. The "SET CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION statement" identifies the types of MQTs that can be considered when optimizing the processing of dynamic SQL queries. The possible values are:
    • SYSTEM: only the system-maintained MQTs are considered for optimization. This is the default value.
    • ALL: all maintained table types will be considered for optimization.
    • NONE: none of the maintained table types will be considered for optimization.
    • FEDERATED_TOOL: only the maintained tables that are defined as 'MAINTAINED BY FEDERATED_TOOL' will be considered for optimization.
    • USER: only the maintained tables that are defined as 'MAINTAINED BY USER' will be considered for optimization.

    For example, to indicate to the optimizer that all maintained table types should be considered for optimization, you would set the special register as follows:


    SET CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION ALL;
          

  2. Use database configuration parameter 'DFT_MTTB_TYPES'. This "dft_mttb_types - Default maintained table types for optimization configuration parameter" specifies the default value for the CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION special register. The value of the register in turn determines what types of refresh deferred materialized query tables will be used during query optimization. Valid values for this database configuration parameter are, ALL, NONE, FEDERATED_TOOL, SYSTEM and USER with the semantics as described above for the values of the special register. The default value of this database configuration parameter is 'SYSTEM'.

    For example, to indicate that only the maintained tables that are defined as 'MAINTAINED BY USER' should be considered for optimization, you would set the database configuration parameter for database SAMPLE as:

    DB2 UPDATE DATABASE CONFIGURATION FOR SAMPLE USING DFT_MTTB_TYPES USER;
          

Step 9: Set the REFRESH AGE special register

REFRESH AGE specifies whether stale data in MQTs is tolerated. The optimizer will consider using MQTs to process a query only if REFRESH AGE is set to a non-zero value and the optimization level is either set to 2 or 5 or higher. REFRESH AGE applies only to 'REFRESH DEFERRED' type of MQTs. Currently two values of REFRESH AGE are supported.

  • 0: this indicates that 'REFRESH DEFERRED' MQTs will not be considered by optimizer to process a query.
  • Keyword 'ANY': This setting indicates that MQTs will be considered by the optimizer to process queries.

REFRESH AGE may be set in one of two ways:

  1. "Setting the CURRENT REFRESH AGE special register"

    Setting the CURRENT REFRESH AGE special register to the value ANY enables all refresh deferred MQTs to be considered for optimization in the context of the current connection.

  2. "Setting the DFT_REFRESH_AGE database configuration parameter"

    The default value of REFRESH_AGE can be set as a database configuration parameter. For example,


    DB2 UPDATE DATABASE CONFIGURATION FOR SAMPLE USING DFT_REFRESH_AGE 0; 
    		

    This setting prevents refresh-deferred MQTs from being considered for optimization. The special register, if set, takes precedence over the database configuration parameter. If the special register is not set to any value, the value of the database parameter determines the REFRESH AGE.

Both the parameters start out with a default value of 0, for instance, by default 'REFRESH DEFERRED' MQTs are not considered for optimization.

Step 10: Find out whether MQT is being used to answer a query

You can use the output of Visual Explain, Dynamic Explain, or the db2exfmt tool to find out whether an MQT was used to satisfy a query. In this section, we will show a snippet of db2exfmt output to explain how to determine use of MQT.

When an access plan uses an MQT, the db2exfmt output shows an MQT object being accessed. In addition, for some queries, the optimized SQL (this is the output of query rewrite) in the db2exfmt output shows use of an MQT instead of base nicknames. The optimized SQL can be seen right after the original SQL statement in the db2exfmt output.

For the query:


SELECT CITY, SUM_SALES
FROM 
(SELECT CITY, SUM(PRICE) AS SUM_SALES
FROM DIVISIONAL_SALES
WHERE STATE ='CA' AND TRANSACTION_DATE BETWEEN '01/01/2005' AND '12/31/2005'
GROUP BY CITY) AS TOTAL_SALES
WHERE SUM_SALES < 100000;

Note that the query has been rewritten to access the aggregate_sales MQT.


SELECT Q1.CITY AS "CITY", Q1.TOTAL_SALES AS "SUM_SALES" 
FROM SAMPLE.AGGREGATE_SALES AS Q1 
WHERE (Q1.TOTAL_SALES < 100000);

The plan fragment looks as follows:


      Rows 
     RETURN 
     (   1) 
      Cost 
       I/O 
       |
     20.6667 
     TBSCAN 
     (   2) 
     20.1968 
        1 
       |
       62 
 TABLE: SAMPLE   
 AGGREGATE_SALES

The plan shows a table scan over the aggregate_sales MQT. Moreover, the details of the operator table scan shows the following details:


	             Input Streams:
		-------------------
			1) From Object SAMPLE.AGGREGATE_SALES

				Estimated number of rows: 	62
				Number of columns: 		3
				Subquery predicate ID: 		Not Applicable

				Column Names:
				------------
				+Q1.$RID$+Q1.TOTAL_SALES+Q1.CITY
		

The list of objects used in the access plan is shown at the end of the db2exfmt output. For our example, the following information is shown. Note that the nickname divisional_sales is shown as one of the objects used in the access plan. But the words '(reference only)' suggest that the nickname was referenced in the query, but was not accessed to satisfy the query.


Objects Used in Access Plan:
---------------------------------------

	Schema: SAMPLE  
	Name: 	DIVISIONAL_SALES
	Type: 	Nickname (reference only)

	Schema: SAMPLE  
	Name: 	AGGREGATE_SALES
	Type: 	Table
			Time of creation: 		2006-02-17-17.16.21.751126
			Last statistics update:
			Number of columns: 		2
			Number of rows: 		62
			Width of rows: 			44
			Number of buffer pool pages: 	1
			Distinct row values: 		No
			Tablespace name: 		USERSPACE1        
			Tablespace overhead: 		12.670000
			Tablespace transfer rate: 	0.180000
			Source for statistics: 		Single Node
			Prefetch page count: 		32
			Container extent page count: 	32
			Table overflow record count: 	0
			Table Active Blocks: 		-1
		

Restrictions on MQTs over nicknames

There are a few restrictions on MQTs over nicknames in order for the federated queries to use the MQTs. These restrictions can be divided in three categories:

  • Restrictions on MQTs over nicknames in a partitioned database environment
  • Restrictions on MQTs over non-relational nicknames
  • Restrictions on MQTs over nicknames advised by the Design Advisor

Restrictions on MQTs over nicknames in a partitioned database environment

WebSphere Information Integrator does not support system-maintained materialized query tables that reference nicknames in a partitioned database environment. Please follow the "Restrictions on MQTs over nicknames in a partitioned database environment" for tips on how to work around this restriction.

Restrictions on MQTs over non-relational nicknames

There is a set of restrictions for MQTs over non-relational nicknames specific to each non-relational data source. These restrictions are introduced by the required predicates and implicit relationships between nicknames of specific data sources. For example, nicknames for Blast and HMMER data sources use some predicates to bind-in values to columns that function as input parameter passed to Blast and HMMER tools. Therefore, the MQTs defined on such nicknames must specify these predicates; otherwise, the refresh on the MQT would fail or the queries that use MQTs would produce incorrect results. Another example is the restriction for XML nicknames that a MQT cannot be created for a child XML nickname. The MQT must reference the whole hierarchy of nicknames from the root nickname to that child nickname.

You can find more information on the "Restrictions on MQTs over non-relational nicknames" page.

Restrictions on MQTs recommended by the Design Advisor

There are a few restrictions and considerations for MQTs generated by the Design Advisor, as shown below.

The Design Advisor does not recommend user-maintained MQTs. Since system-maintained MQTs over nicknames are not supported in the partitioned database configuration, you cannot use the Design Advisor in that environment.

The option -r has no effect when the Design Advisor is used to recommend MQTs for federated queries. The option -r instructs the Design Advisor to use extended statistics for the considered MQTs. Extended statistics are collected on objects referenced in the MQT definition using sampling. Since sampling is not supported on nicknames, the Design Advisor uses the optimizer's estimates for statistics.

Indexes recommended for MQTs are designed to improve workload performance and not the MQT refresh performance.

If updates, inserts, or deletes are not included in the specified workload, the performance impact of updating a recommended REFRESH IMMEDIATE MQT is not considered.


Troubleshooting

Hints and tips on why a query is not using an MQT

If your query is not using an MQT and you believe it should be, check the following:

  1. Make sure the CURRENT REFRESH AGE special register is set to ANY. This special register is initialized with the value of the DFT_REFRESH_AGE database configuration parameter when the connection to the database is initiated. It can also be set explicitly using the command SET CURRENT REFRESH AGE ANY. Cache tables (MQTs created by Control Center tooling) are considered for optimization even when the refresh age is 0.
  2. Make sure the MQT is enabled for optimization. An MQT is considered during optimization if the option ENABLE QUERY OPTIMIZATION was specified during CREATE TABLE or ALTER TABLE statements for the MQT.
  3. In the case of refresh-deferred MQTs, make sure that the special register CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION is set to a value that includes the category of your MQT. For example, if the considered MQT is a refresh-deferred system-maintained MQT, this special register must be set to ALL or SYSTEM. This special register is initialized with the value of the DFT_MTTB_TYPES database configuration parameter.
  4. Make sure that the tables referenced in the MQT are a subset of tables referenced in the query. In some cases, WebSphere Information Integrator might choose an MQT that selects from more tables than the query. For example, this might happen when two ore more of the tables in the MQT have a referential integrity relationship defined and the join predicate is on their primary key - foreign key columns.
  5. Make sure that the predicates applied by the MQT are a subset of the predicates applied by the query.
  6. Make sure that the query is a dynamically compiled statement. MQTs are not taken in consideration for static-compiled queries.
  7. As a good practice, make sure that runstats has been run on the MQTs and their indexes to update statistics since WebSphere Information Integrator uses a cost-based optimizer to generate access plans for the queries.
  8. If you have created many MQTs, all of which can be used to satisfy a query, the compilation time of the query can increase. The existence of many candidate MQTs might prevent the compiler from choosing the best MQT for that query.
  9. If the query is executed at a higher isolation level than the isolation level at which the MQT is created, that MQT will not be considered for execution of the query.

Some things to remember while using MQTs:

  1. If a query has a function template in a predicate or a select list, the function template must be part of the materialized query table. Not including the function template in the MQT may lead to a "statement not supported" error when the query is run and the MQT is chosen to satisfy the query.
  2. When a user-maintained MQT is populated or updated using the INSERT statement, make sure that the same MQT is not used as the source of the insert operation by checking the explain output. Setting the REFRESH AGE to 0 or changing the special register 'SET CURRENT MAINTAINED TABLE TYPE FOR OPTIMIZATION' to a value that does not include user-maintained MQTs will ensure that the MQT itself will not be used as a source for the INSERT statement.
  3. Cache tables are similar to MQTs with the restriction that a cache table supports only non-aggregate queries over a single table. User-maintained MQTs could be used in place of cache tables when replication between WebSphere Information Integrator and the remote data source is not supported.

Performance results of our experiments

To test the performance of the MQTs over nicknames we used parts of the "TPC-H" benchmark workload adapted to fit a federated environment.

We generated 4GB TPC-H data and divided the TPC-H tables among two databases on two separate physical machines. One database contained the tables PART, SUPPLIER, PARTSUPP, NATION, REGION and the second database contained the tables LINEITEM, ORDERS, CUSTOMERS. Both databases were running DB2 Universal Database™ Version 8.2 Fixpack 11 running on AIX® V5.2 on 4-CPU boxes. On a third AIX V5.2, 4-CPU box, we installed WebSphere Information Integrator Version 8.2 Fixpack 11 and created a federated database. In the federated database we defined servers to access the two DB2 databases and nicknames to map the TPC-H tables from those DB2 databases.


Figure 1. The configuration of our experiments
The configuration of our experiments

The next step was to plug in constant values for the parameter markers in the 22 TPC-H queries to simulate a typical workload and to measure the execution time of those queries on the federated server.

Then we ran the Design Advisor on those 22 queries to recommend MQTs. The Design Advisor was run using the command:


db2advis -d tpcdfed -i tpch_queries.sql -m M -o tpch_mqts.sql -u -b tpch_tblsp1 -k OFF.

We customized the recommended MQTs by adding some local predicates from the TPC-H queries that could be routed to those MQTs.

For example, the Design Advisor recommended the following MQT for the query #5:


CREATE SUMMARY TABLE "TEST1 "."MQT2" AS (
SELECT Q8.C0 AS "C0", Q8.C1 AS "C1", Q8.C3 AS "C2", Q8.C2 AS "C3"
FROM TABLE( SELECT Q7.C0 AS "C0", SUM(Q7.C1) AS "C1", Q7.C3 AS "C2", Q7.C2 AS "C3"
          FROM TABLE( SELECT Q2.N_NAME AS "C0",
                (Q4.L_EXTENDEDPRICE *(+1.00000000000000E+000 - Q4.L_DISCOUNT)) AS "C1",
                               Q5.O_ORDERDATE AS "C2", Q1.R_NAME AS "C3"
                        FROM TPCD1.REGION AS Q1,TPCD1.NATION AS Q2, 
                             TPCD1.SUPPLIER AS Q3, TPCD2.LINEITEM AS Q4,
                             TPCD2.ORDERS AS Q5, TPCD2.CUSTOMER AS Q6
            WHERE (Q2.N_REGIONKEY = Q1.R_REGIONKEY) AND (Q3.S_NATIONKEY = Q2.N_NATIONKEY)
          AND (Q6.C_NATIONKEY = Q3.S_NATIONKEY) AND (Q4.L_SUPPKEY = Q3.S_SUPPKEY) 
          AND (Q5.O_ORDERKEY = Q4.L_ORDERKEY) AND (Q6.C_CUSTKEY = Q5.O_CUSTKEY)) AS Q7
           GROUP BY Q7.C2, Q7.C3, Q7.C0) AS Q8) 
DATA INITIALLY DEFERRED REFRESH DEFERRED;

We customized this MQT by adding the predicate R_NAME = 'ASIA' so the new MQT definition becomes:


CREATE SUMMARY TABLE "TEST1 "."MQT2" AS (
SELECT Q8.C0 AS "C0", Q8.C1 AS "C1", Q8.C3 AS "C2", Q8.C2 AS "C3"
FROM TABLE( SELECT Q7.C0 AS "C0", SUM(Q7.C1) AS "C1", Q7.C3 AS "C2", Q7.C2 AS "C3"
       		FROM TABLE( SELECT Q2.N_NAME AS "C0", 
		(Q4.L_EXTENDEDPRICE *(+1.00000000000000E+000 - Q4.L_DISCOUNT)) AS "C1",
                         Q5.O_ORDERDATE AS "C2", Q1.R_NAME AS "C3"
            			FROM TPCD1.REGION AS Q1,TPCD1.NATION AS Q2, 
                             TPCD1.SUPPLIER AS Q3, TPCD2.LINEITEM AS Q4,
                             TPCD2.ORDERS AS Q5, TPCD2.CUSTOMER AS Q6
            WHERE (Q2.N_REGIONKEY = Q1.R_REGIONKEY) AND (Q3.S_NATIONKEY = Q2.N_NATIONKEY)
            AND (Q6.C_NATIONKEY = Q3.S_NATIONKEY) AND (Q4.L_SUPPKEY = Q3.S_SUPPKEY)
            AND (Q5.O_ORDERKEY = Q4.L_ORDERKEY) AND (Q6..C_CUSTKEY = Q5.O_CUSTKEY 
		AND (R_NAME = 'ASIA'))) AS Q7 
            GROUP BY Q7.C2, Q7.C3, Q7.C0) AS Q8) 
DATA INITIALLY DEFERRED REFRESH DEFERRED;

Next, we created and refreshed the MQTs in the federated database and we ran the workload again to measure the execution times for the queries that are routed to the MQTs.

The following table summarizes the experimental results. The Design Advisor recommended eight MQTs that are applicable to eight of the TPC-H queries shown in the table below. The first column represents the TPC-H query number. The next two columns represent the query execution time in seconds when MQTs are disabled and when MQTs are enabled. The fourth column shows the improvement introduced by the MQTs as a time difference between the second and third columns and the fifth column shows the percentage of the improvement. The last column lists the customizations we made to the MQTs that were recommended by the Design Advisor. Please note that while our workload was defined to use only one MQT per query, it is possible to use multiple MQTs to answer a query. MQTs can be defined to answer more than one query, as well.

Query #Time w/o MQT(secs)Time w/ MQT(secs)Improvement(secs)Improvement%MQT customization
325121352%C_MKTSEGMENT =
'BUILDING'
57817799%R_NAME = 'ASIA'
982675%
1068501826%L_RETURNFLAG = 'R'
12107330%L_SHIPMODE IN
('MAIL','SHIP')
142522392%L_SHIPDATE >= DATE
('1995-09-01')
18205120499%
19141114099%l_SHIPMODE IN
('AIR', 'AIR REG') AND
L_SHIPINSTRUCT =
'DELIVER IN PERSON'
AND ( P_BRAND =
'BRAND#12' OR
P_BRAND = 'BRAND#23'
OR P_BRAND = 'BRAND#34')
TOTAL5607646483%

*Please note that these times are provided simply to demonstrate the improvement attainable with MQTs defined over remote tables. They are not derived from a compliant TPC-H benchmark and should not be compared with any existing official results.


Conclusion

WebSphere Information Integrator provides the foundation for a strategic information integration framework that helps customers speed time to market for new applications, get more value and insight from existing assets, and control IT costs. As more customers adopt the data federation technology, the next logical concern for customers is usually to achieve higher performance, as the complexity of their queries increases.

We consider materialized query table (MQT) support an important option to further fine tune the performance of your federated server. By identifying a suitable set of MQTs based on your query workload, you can define and populate these MQTs with precomputed query results and allow your query workload to run faster by exploiting these MQTs. As we demonstrated in this article, the improvement in some cases can be quite significant.

And the best part is that you do not need to modify your applications to take advantage of MQTs. The federated query compiler makes a cost-based decision and makes the process of using MQTs transparent to the users. We are confident that you will find this feature a powerful tool in your federation environment.


Resources

Learn

Get products and technologies

About the authors

Author photo

Anjali Betawadkar-Norwood is an Advisory Software Engineer in Silicon Valley Laboratory in San Jose, California. Her expertise is Query Optimization, especially as it applies to federated systems. She has been working in the area of query optimization for five years. Currently, she leads a small team focusing on federated query optimization in the WebSphere Information Integrator Federated Query Compiler team.

author photo

Valer-Alin Crisan is a Software Engineer in Silicon Valley Laboratory in San Jose, California. His expertise is query optimization for federated systems. He has been working in the area of query optimization for about a year and a half in a small team focusing on federated query optimization in the WebSphere Information Integrator Federated Query Compiler team.

Author photo

Dr. Eileen Lin is a Senior Technical Staff Member in Silicon Valley Laboratory in San Jose, California. She is one of the original members responsible for the success of DataJoiner, a federated database product that is the predecessor of the federation technology in DB2. Currently, she is the lead architect for WebSphere Federation Server. Dr. Lin has many patents covering areas such as federation technology, query optimization and parallel query processing.

Comments



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, WebSphere
ArticleID=123037
ArticleTitle=Maximize the performance of WebSphere Information Integrator with Materialized Query Tables
publish-date=05232006
author1-email=anorwood@us.ibm.com
author1-email-cc=
author2-email=vcrisan@us.ibm.com
author2-email-cc=
author3-email=etlin@us.ibm.com
author3-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers