DB2 Basics: An introduction to materialized query tables

The definition of a materialized query table (MQT) is based upon the result of a query. MQTs can significantly improve the performance of queries. This article introduces you to MQTs, summary tables, and staging tables, and shows you, by way of working examples, how to get up and running with materialized query tables.

Share:

Roman Melnyk (roman_b_melnyk@hotmail.com), DB2 Information Development, IBM Canada Ltd.

Roman Melnyk photoRoman B. Melnyk , Ph.D., is a senior member of the DB2 Information Development team, specializing in database administration, DB2 utilities, and SQL. During more than nine years at IBM, Roman has written numerous DB2 books, articles, and other related materials. Roman coauthored DB2 Version 8: The Official Guide (Prentice Hall Professional Technical Reference, 2003), DB2: The Complete Reference (Osborne/McGraw-Hill, 2001), DB2 Fundamentals Certification for Dummies (Hungry Minds, 2001), and DB2 for Dummies (IDG Books, 2000).



08 September 2005

Also available in Vietnamese

IBM DB2 e-kit for Database Professionals

Learn how easy it is to get trained and certified for DB2 for Linux, UNIX, and Windows with the IBM DB2 e-kit for Database Professionals. Register now, and expand your skills portfolio, or extend your DBMS vendor support to include DB2.

A materialized query table (MQT) is a table whose definition is based upon the result of a query. The data that is contained in an MQT is derived from one or more tables on which the materialized query table definition is based. Summary tables (or automatic summary tables, ASTs), which are familiar to IBM® DB2® Universal Database™ (UDB) for Linux, UNIX®, and Windows® (DB2 UDB) users, are considered to be a specialized type of MQT. The fullselect that is part of the definition of a summary table contains a GROUP BY clause summarizing data from the tables that are referenced in the fullselect.

You can think of an MQT as a kind of materialized view. Both views and MQTs are defined on the basis of a query. The query on which a view is based is run whenever the view is referenced; however, an MQT actually stores the query results as data, and you can work with the data that is in the MQT instead of the data that is in the underlying tables.

Materialized query tables can significantly improve the performance of queries, especially complex queries. If the optimizer determines that a query or part of a query could be resolved using an MQT, the query might be rewritten to take advantage of the MQT.

An MQT can be defined at table creation time as maintained by the system or maintained by the user. The following sections introduce you to these two types of MQTs, as well as summary tables and staging tables. The examples that follow require a connection to the SAMPLE database; if you don’t have the SAMPLE database created on your system, you can create it by entering the db2sampl command from any command prompt.

Maintained by system MQTs

The data in this type of materialized query table is maintained by the system. When you create this type of MQT, you can specify whether the table data will be a REFRESH IMMEDIATE or REFRESH DEFERRED. The REFRESH keyword lets you specify how the data is to be maintained. DEFERRED means that the data in the table can be refreshed at any time using the REFRESH TABLE statement. Neither REFRESH DEFERRED nor REFRESH IMMEDIATE system-maintained MQTs allow insert, update, or delete operations to be executed against them. However, REFRESH IMMEDIATE system-maintained MQTs are updated with changes made to the underlying tables as a result of insert, update, or delete operations.

Listing 1 shows an example of creating a REFRESH IMMEDIATE system-maintained MQT. The table, which is named EMP, is based on the underlying tables EMPLOYEE and DEPARTMENT in the SAMPLE database. Because REFRESH IMMEDIATE MQTs require that at least one unique key from each table referenced in the query appear in the select list, we first define a unique constraint on the EMPNO column in the EMPLOYEE table and on the DEPTNO column in the DEPARTMENT table. The DATA INITIALLY DEFERRED clause simply means that data will not be inserted into the table as part of the CREATE TABLE statement. After being created, the MQT is in check pending state (see Demystifying table and table space states), and cannot be queried until the SET INTEGRITY statement has been executed against it. The IMMEDIATE CHECKED clause specifies that the data is to be checked against the MQT's defining query and refreshed; the NOT INCREMENTAL clause specifies that integrity checking is to be done on the whole table. A query executed against the EMP materialized query table shows that it is now fully populated with data.

Listing 1. Creating an MQT that is to be maintained by the system
connect to sample
...

alter table employee add unique (empno)
alter table department add unique (deptno)

create table emp as (select e.empno, e.firstnme, e.lastname, e.phoneno, d.deptno,
 substr(d.deptname, 1, 12) as department, d.mgrno from employee e, department d
  where e.workdept = d.deptno)
   data initially deferred refresh immediate

set integrity for emp immediate checked not incremental

select * from emp

EMPNO  FIRSTNME     LASTNAME        PHONENO DEPTNO DEPARTMENT   MGRNO
------ ------------ --------------- ------- ------ ------------ ------
000010 CHRISTINE    HAAS            3978    A00    SPIFFY COMPU 000010
000020 MICHAEL      THOMPSON        3476    B01    PLANNING     000020
000030 SALLY        KWAN            4738    C01    INFORMATION  000030
000050 JOHN         GEYER           6789    E01    SUPPORT SERV 000050
000060 IRVING       STERN           6423    D11    MANUFACTURIN 000060
000070 EVA          PULASKI         7831    D21    ADMINISTRATI 000070
000090 EILEEN       HENDERSON       5498    E11    OPERATIONS   000090
000100 THEODORE     SPENSER         0972    E21    SOFTWARE SUP 000100
000110 VINCENZO     LUCCHESSI       3490    A00    SPIFFY COMPU 000010
000120 SEAN         O'CONNELL       2167    A00    SPIFFY COMPU 000010
000130 DOLORES      QUINTANA        4578    C01    INFORMATION  000030
...
000340 JASON        GOUNOT          5698    E21    SOFTWARE SUP 000100

  32 record(s) selected.

connect reset

Maintained by user MQTs

The data in this type of materialized query table is maintained by the user. Only a REFRESH DEFERRED materialized query table can be defined as MAINTAINED BY USER. The REFRESH TABLE statement (used for system-maintained MQTs) cannot be issued against user-maintained MQTs. User-maintained MQTs do allow insert, update, or delete operations to be executed against them.

Listing 2 shows an example of creating a REFRESH DEFERRED user-maintained MQT. The table, which is named ONTARIO_1995_SALES_TEAM, is based on the underlying tables EMPLOYEE and SALES in the SAMPLE database. Again, the DATA INITIALLY DEFERRED clause means that data will not be inserted into the table as part of the CREATE TABLE statement. After being created, the MQT is in check pending state (see Demystifying table and table space states), and cannot be queried until the SET INTEGRITY statement has been executed against it. The MATERIALIZED QUERY IMMEDIATE UNCHECKED clause specifies that the table is to have integrity checking turned on, but is to be taken out of check pending state without being checked for integrity violations.

Next, to populate the MQT with some data, we will import data that had been exported from the EMPLOYEE and SALES tables. The exporting query matches the defining query for the MQT. Then we will insert another record into the ONTARIO_1995_SALES_TEAM table.

A query executed against the ONTARIO_1995_SALES_TEAM materialized query table shows that it is now fully populated with the imported and inserted data, demonstrating that user-maintained MQTs can indeed be modified directly.

Listing 2. Creating an MQT that is to be maintained by the user
connect to sample
...

create table ontario_1995_sales_team as (select distinct e.empno, e.firstnme,
 e.lastname, e.workdept, e.phoneno, 'Ontario' as region,
  year(s.sales_date) as year from employee e, sales s
   where e.lastname = s.sales_person and year(s.sales_date) = 1995
    and left(s.region, 3) = 'Ont')
     data initially deferred refresh deferred maintained by user

set integrity for ontario_1995_sales_team materialized query immediate
 unchecked

export to ontario_1995_sales_team.del of del
 select distinct e.empno, e.firstnme, e.lastname, e.workdept, e.phoneno,
  'Ontario' as region, year(s.sales_date) as year from employee e, 
  sales s
   where e.lastname = s.sales_person and year(s.sales_date) = 1995
    and left(s.region, 3) = 'Ont'
...
Number of rows exported: 2

import from ontario_1995_sales_team.del of del insert into 
ontario_1995_sales_team
...
Number of rows committed    = 2

insert into ontario_1995_sales_team
 values ('006900', 'RUSS', 'DYERS', 'D44', '1234', 'Ontario', 1995)

select * from ontario_1995_sales_team

EMPNO  FIRSTNME     LASTNAME        WORKDEPT PHONENO REGION  YEAR
------ ------------ --------------- -------- ------- ------- -----------
000110 VINCENZO     LUCCHESSI       A00      3490    Ontario        1995
000330 WING         LEE             E21      2103    Ontario        1995
006900 RUSS         DYERS           D44      1234    Ontario        1995

  3 record(s) selected.

connect reset

Summary tables

You will recall that a summary table is a specialized type of MQT whose fullselect contains a GROUP BY clause summarizing data from the tables that are referenced in the fullselect. Listing 3 shows a simple example of creating a summary table. The table, which is named SALES_SUMMARY, is based on the underlying table SALES in the SAMPLE database. Once again, the DATA INITIALLY DEFERRED clause means that data will not be inserted into the table as part of the CREATE TABLE statement. The REFRESH DEFERRED clause means that the data in the table can be refreshed at any time using the REFRESH TABLE statement. A query against this MQT right after it was created, but before the REFRESH TABLE statement was issued, returns an error. After the REFRESH TABLE statement executes, the query runs successfully.

A subsequent insert operation into the SALES table, followed by a summary table refresh and a query against the summary table, shows that the change to the underlying table is reflected in the summary table: salesperson Lee's total sales in the Ontario-South region have increased by 100. Similar behavior can be observed in response to update or delete operations against the underlying SALES table.

Listing 3. Creating a summary table
connect to sample
...

create table sales_summary as (select sales_person, region, sum(sales)
 as total_sales
 from sales group by sales_person, region)
  data initially deferred refresh deferred

select * from sales_summary

SALES_PERSON    REGION          TOTAL_SALES
--------------- --------------- -----------
SQL0668N  Operation not allowed for reason code "1" on table
"MELNYK.SALES_SUMMARY".  SQLSTATE=57016

refresh table sales_summary

select * from sales_summary

SALES_PERSON    REGION          TOTAL_SALES
--------------- --------------- -----------
GOUNOT          Manitoba                 15
GOUNOT          Ontario-North             1
GOUNOT          Ontario-South            10
GOUNOT          Quebec                   24
LEE             Manitoba                 23
LEE             Ontario-North             8
LEE             Ontario-South            34
LEE             Quebec                   26
LUCCHESSI       Manitoba                  3
LUCCHESSI       Ontario-South             8
LUCCHESSI       Quebec                    3

  11 record(s) selected.

insert into sales values ('06/28/2005', 'LEE', 'Ontario-South', 100)

refresh table sales_summary

select * from sales_summary

SALES_PERSON    REGION          TOTAL_SALES
--------------- --------------- -----------
...
LEE             Ontario-North             8
LEE             Ontario-South           134
LEE             Quebec                   26
...

  11 record(s) selected.

update sales set sales = 50 where sales_date = '06/28/2005' and 
sales_person = 'LEE'
 and region = 'Ontario-South'

refresh table sales_summary

select * from sales_summary

SALES_PERSON    REGION          TOTAL_SALES
--------------- --------------- -----------
...
LEE             Ontario-North             8
LEE             Ontario-South            84
LEE             Quebec                   26
...

  11 record(s) selected.

delete from sales where sales_date = '06/28/2005' and sales_person = 'LEE'
 and region = 'Ontario-South'

refresh table sales_summary

select * from sales_summary

SALES_PERSON    REGION          TOTAL_SALES
--------------- --------------- -----------
...
LEE             Ontario-North             8
LEE             Ontario-South            34
LEE             Quebec                   26
...

  11 record(s) selected.

connect reset

Staging tables

You can incrementally refresh a REFRESH DEFERRED MQT if it has a staging table associated with it. The staging table collects changes that need to be applied to synchronize the MQT with its underlying tables. You can create a staging table using the CREATE TABLE statement; then, when the underlying tables of the MQT are modified, the changes are propagated and immediately appended to the staging table. The idea is to use the staging table to incrementally refresh the MQT, rather than regenerate the MQT from scratch. Incremental maintenance provides significant performance improvement. The staging table is pruned when the refresh operation is complete.

After it is created, a staging table is in a pending (inconsistent) state; it must be brought out of this state before it can start collecting changes to its underlying tables. You can accomplish this by using the SET INTEGRITY statement.

Listing 4 shows an example of using a staging table with a summary table. The summary table, which is named EMP_SUMMARY, is based on the underlying table EMPLOYEE in the SAMPLE database. You'll recall that the DATA INITIALLY DEFERRED clause means that data will not be inserted into the table as part of the CREATE TABLE statement. The REFRESH DEFERRED clause means that the data in the table can be refreshed at any time using the REFRESH TABLE statement. The staging table, which is named EMP_SUMMARY_S, is associated with the summary table EMP_SUMMARY. The PROPAGATE IMMEDIATE clause specifies that any changes made to the underlying table as part of an insert, update, or delete operation are cascaded to the staging table. SET INTEGRITY statements are issued against both tables to take them out of their pending states.

Not unexpectedly, a query against the summary table at this point returns no data. The REFRESH TABLE statement returns a warning, a reminder that the "integrity of non-incremental data remains unverified." This, too, is not unexpected. Another query against the summary table returns no data as well. However, after we insert a new row of data into the underlying EMPLOYEE table, a query against the staging table EMP_SUMMARY_S returns one row, corresponding to the data that was just inserted. The staging table has the same three columns that its underlying summary table has, plus two additional columns that are used by the system: GLOBALTRANSID (the global transaction ID for each propagated row) and GLOBALTRANSTIME (the timestamp of the transaction). Another query against the summary table returns no data, but after the REFRESH TABLE statement executes this time, the query runs successfully.

Listing 4. Using a staging table with a summary table
connect to sample
...

create table emp_summary as (select workdept, job, count(*) as count
 from employee group by workdept, job)
  data initially deferred refresh deferred

create table emp_summary_s for emp_summary propagate immediate

set integrity for emp_summary materialized query immediate unchecked

set integrity for emp_summary_s staging immediate unchecked

select * from emp_summary

WORKDEPT JOB      COUNT
-------- -------- -----------

  0 record(s) selected.

refresh table emp_summary
SQL1594W  Integrity of non-incremental data remains unverified by the 
database manager.  SQLSTATE=01636

select * from emp_summary

WORKDEPT JOB      COUNT
-------- -------- -----------

  0 record(s) selected.

insert into employee
 values ('006900', 'RUSS', 'L', 'DYERS', 'D44', '1234', '1960-05-05', 
 'FIELDREP', 5, 'M', '1940-04-02', 10000, 100, 1000)

select * from emp_summary_s

WORKDEPT JOB      COUNT       GLOBALTRANSID          GLOBALTRANSTIME
-------- -------- ----------- -------------------... -----------------------------...
D44      FIELDREP           1 x'00000000000000CD'    x'20050822201344536158000000'

  1 record(s) selected.

select * from emp_summary

WORKDEPT JOB      COUNT
-------- -------- -----------

  0 record(s) selected.

refresh table emp_summary
SQL1594W  Integrity of non-incremental data remains unverified by the database
manager.  SQLSTATE=01636

select * from emp_summary

WORKDEPT JOB      COUNT
-------- -------- -----------
D44      FIELDREP           1

  1 record(s) selected.

connect reset

Summary

The SYSCAT.TABDEP system catalog view contains a row for every dependency that a materialized query table has on some other object. You can query this view to obtain a dependency summary for the MQTs that we have created (Listing 5). MQTs have a DTYPE value of 'S.' The TABNAME column lists the names of the MQTs, and the BNAME column lists the names of the database objects on which the corresponding MQTs depend. The BTYPE column identifies the object type: 'T' for table, 'I' for index, and 'F' for function instance.

Listing 5. Querying the SYSCAT.TABDEP system catalog view to see MQT dependencies on other database objects
connect to sample
...

select substr(tabname,1,24) as tabname, dtype, substr(bname,1,24) as bname, btype
 from syscat.tabdep where tabschema = 'MELNYK' and dtype = 'S'

TABNAME                  DTYPE BNAME                    BTYPE
------------------------ ----- ------------------------ -----
EMP                      S     DEPARTMENT               T
EMP                      S     EMPLOYEE                 T
EMP                      S     SQL050829104058970       I
EMP                      S     SQL050829104058800       I
EMP_SUMMARY              S     EMPLOYEE                 T
ONTARIO_1995_SALES_TEAM  S     LEFT1                    F
ONTARIO_1995_SALES_TEAM  S     SALES                    T
ONTARIO_1995_SALES_TEAM  S     EMPLOYEE                 T
SALES_SUMMARY            S     SALES                    T

  9 record(s) selected.

connect reset

We have seen that a materialized query table, whose definition is based upon the result of a query, can be thought of as a kind of materialized view. MQTs are important because they can significantly decrease the response time for complex queries. This article has introduced you to the basic concepts around maintained by system MQTs and maintained by user MQTs, as well as summary tables and staging tables, and these concepts were illustrated by working examples that you can run yourself. To learn more about materialized query tables, or for more detailed information about any of the topics covered in this article, see the DB2 Information Center.

Resources

Learn

Get products and technologies

  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edtion and provides a solid base to build and deploy applications.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=93583
ArticleTitle=DB2 Basics: An introduction to materialized query tables
publish-date=09082005