Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

Using the DB2 V8.2 Design Advisor, Part 2: Expert help for designing data warehouses

Paul McInerney (paulmci@ca.ibm.com), User-Centered Design Specialist, DB2 Development, IBM, Software Group
Paul McInerney photo
Paul McInerney participated in the development of the Design Advisor. His interests include designing effective software tools that provide advice to people.
Danny Zilio (zilio@ca.ibm.com), Development Analyst, DB2 Development, IBM, Software Group
Danny Zilio photo
Danny Zilio has led the development of the Design Advisor over several releases. He is actively involved in the larger Autonomic Computing initiative.

Summary:  Get expert help with designing data partitioning keys, materialized query tables, and multidimensional clustering for IBM® DB2® Universal Database™ (DB2 UDB), in addition to indexes. The Design Advisor offers the help you need and is built right into DB2 UDB. Part 2 of a series, this article explains how to use it to achieve an optimal design for your data warehouse.

View more content in this series

Date:  01 Sep 2005
Level:  Intermediate

Activity:  3752 views
Comments:  

Introduction

This article is the second of a two-part series on the IBM DB2 Universal Database (DB2 UDB) V8.2 Design Advisor feature. This article focuses on the effective use of the Design Advisor in the context of data warehouse (DW) databases. For a general introduction and tips for using the Design Advisor, read Part 1.

Distinctive design challenges with data warehouses

The Design Advisor is well suited to the distinctive challenges of designing the physical database for optimal data access in the context of data warehouses. Some aspects of DW considered during the development of this tool include:

  • Data warehouses often make use of the following DW-oriented DB2 features: materialized query tables (MQTs), multidimensional clustering tables (MDCs), and Data Partitioning Feature (DPF) hash partitioning. So, design decisions are more complicated than simply optimizing indexes; the use of these multiple features must be coordinated to achieve full potential performance.
  • DWs are often very large, up to multi-terabytes. In this environment, storage implications for design decisions become more important. In fact, finding indexes that can be dropped because they are not providing sufficient benefit can be an important design goal.
  • The size of DWs also makes it more difficult to change existing physical DB design when that means moving lots of data (using export and import). That places a premium on a tool that can help evaluate design alternatives.
  • The nature of the workload is more ad hoc. This makes it difficult for DBAs to correctly model the workload prior to actual system use. Therefore, there is more need to adjust the physical database design as the DBA monitors system usage.
  • DWs often have IBM products that complement the Design Advisor, namely DB2 Query Patroller and DB2 Cube Views. Query Patroller provides a good source of input for the Design Advisor, while Cube Views includes a tool narrowly targeted at designing MQTs for use in OLAP schemas.
  • DWs often include views on tables to present a simplified, denormalized view of the data to end users. DW designers may be interested in identifying cases where performance would benefit from converting these logical views to MQTs.

Introduction to the case study and the walkthrough

The following case study demonstrates the Design Advisor: You have an existing data warehouse that you are upgrading to DB2 UDB V8.2 and you want to update to use the latest DB2 features, specifically MDC and MQTs. Your database already uses the DPF feature and the Query Patroller product. The DW has a table that you think would be a good candidate for converting to MDC. You are willing to make changes to the physical DB design in terms of dropping and recreating tables, indexes, and so on, in order to implement Design Advisor recommendations. However, you'll have to schedule an outage to do this.

The walkthrough of this case study is divided into the following steps:

  1. Collect and describe the workload to the Design Advisor:
    1. Collecting the workload from Query Patroller
    2. Using the Design Advisor
  2. Decide how and when you want the Design Advisor to run
  3. Review the recommendations
  4. Next steps

The next sections illustrate these steps with the GUI. A subsequent section illustrates the command line version of the tool.


Step 1: Collect and describe the workload to the Design Advisor

Collecting the workload from Query Patroller

Before using the Design Advisor, you need to determine how to get a representative sample of the workload to provide as input. This can be challenging due to the ad hoc nature of data warehouse workloads. Fortunately, in this case study the task is simplified by the presence of Query Patroller. You can use Query Patroller to select from the historical record of past queries. You can select a subset that best meets your needs such as:

  • The largest 10% of queries
  • Only queries from a particular application that accesses a particular set of tables (for example, a fact table and its dimension tables)
  • A sample that includes queries from each end user tool (for example, Business Objects vs. SAS)
  • A sample that includes queries that access each table
  • A random sample

The selection of the workload should match the goals of the tuning exercise. The recommendations should be interpreted in the context of the workload that you provide to the Design Advisor. While you could include the entire set of queries, which could be in the 10,000 to 1 million range, this is not recommended. Usually, a set of queries ranging from 10s to 100s strikes the best balance between getting good results and consuming DB2 resources.

In this case, you might use a sample of queries that run on one Monday morning when there is a lot of Business Objects activity, and during one night when some of the bigger SAS jobs are running. Query the Query Patroller tables to get those queries and export the queries to a file. (Consult Query Patroller documentation for detailed instructions.) Use this file as input to the Design Advisor.

Alternative: The Design Advisor has the capability to directly import the entire Query Patroller set of queries. However, this may not be the best approach if there are many queries (for example, over 10,000). One of the targeted approaches described above may be more advisable. Also, the command-line version of this tool (db2advis) has an additional option to directly obtain a workload from QP that includes queries between a start and end timestamp.

Collect and describe the workload to the Design Advisor

With your workload file in hand, you can launch the Design Advisor. As you see in Figure 1, the Design Advisor allows you to select which features you want it to consider, such as indexes, MQT, MDC, and partitioning. This example includes all the options.


Figure 1. Select performance features
Select performance features

Tip: Select all the options. This ensures the Design Advisor finds a globally optimal solution by exploiting synergy among the features.

Alternate approach: Selecting all the options can make it hard to understand why the Design Advisor recommended what it did. Some users may prefer to proceed step-by-step at the possible expense of an optimal solution. The following are some examples of step-by-step approaches:

  • Focus on a single application at a time -- its workload and the parts of the schema that it accesses.
  • Focus on optimizing access to one table at a time. That is, first identify queries that hit table X and implement only recommendations that pertain to that table. Repeat the process for subsequent tables.
  • Focus on a particular feature. For example, focus only on redesigning BigFactTable using MDC and repartitioning. Leave MQTs for a future iteration, perhaps after validating the initial changes in production.

Tip: Consider MQTs with deferred refresh. While this example uses Immediate refresh, deferred refresh MQTs are generally more compatible with ETL activity.

Next, provide the workload to the Design Advisor. In this case, you have a file with workload data obtained from Query Patroller.


Step 2: Decide how you want the Design Advisor to run

There are no special considerations for this step beyond the points discussed in Part 1 of this article series.


Step 3: Review the recommendations

As you see in Figure 2, the Design Advisor generated a set of recommendations that would lead to a 60% improvement in performance. You can also see a partial list of the recommendations, namely, to create 3 MQTs.


Figure 2. Select the recommendations
Select the recommendations

More of the recommendations are shown in Figure 3. On the CUSTOMER table, the recommendation is to create two additional indexes and keep an existing one. One of the recommendations shown for the LINEITEM table is to change the hash partitioning key.


Figure 3. Design recommendations
Design recommendations

Tip: Recall that the workload did not contain any ETL work. In considering these recommendations, keep in mind possible impacts of these recommendations on the performance of ETL work.

New in V8.2, the Design Advisor has detailed reports that let you get a lot more insight into the recommendations. Figure 4 shows which queries benefited most from the recommendations. There are huge variations in improvement across different SQL statements.


Figure 4. Design Advisor reports
Design Advisor reports

Finally, as you see in Figure 5, the Design Advisor identified some objects that were not needed to achieve optimal performance for this workload. These are candidates for dropping. However, before doing so, you need to consider whether the workload you provided to the Design Advisor omitted some statements that might benefit from the objects.


Figure 5. Unused objects
Unused objects

Step 4 - Next steps

There are no special considerations beyond the points discussed in Part 1 of this article series. To reiterate, it may seem that the natural next step after accepting the recommendations is to make the changes. However, there are alterative things you may wish to do at this point:

  • Save the recommendations to show someone else.
  • Re-run the Design Advisor with different inputs to see if you can get a better recommendation.

Before implementing recommendations, a DBA might typically:

  • Replace the generated names with names that conform to shop conventions.
  • Review the tablespaces where the objects are to be stored.
  • Add comments to the object to indicate why it was created.

DB2 V8.2 has some complementary enhancements that make it easier to implement these recommendations. These are online index creation and ALTER TABLE support.


Command-line (db2advis) walkthrough

Now we’ll walk through the same case study, this time using the command line instead of the GUI.

Here is the command used in this case study:

db2advis –d TPCD –i tpch_queries.in –m IMCP –k LOW  -l 700 –c DB2ADVIS_TBSP -f

Highlights include:

  • -m IMCP: This specifies that the Design Advisor should consider new indexes (I), new MQTs (M), converting standard tables to MDC tables (C), and repartitioning existing tables (P). The default is indexes only.
  • -k LOW: This specifies to compress the workload to a Low degree. As a result, the Design Advisor will analyze a larger set of the workload you provided. The default is medium.
  • -l 700: This specifies that any new indexes, MQTs, and so on should consume no more than 700 MB. The default is 20% of the total database size.
  • -c DB2ADVIS_TBSP: This specifies a tablespace called DB2ADVIS_TBSP to use as a temporary workspace for generating MQT recommendations. This option is required if you want MQT recommendations and you are running on a DPF (multiple partition) instance. Otherwise, this parameter is optional.

Another useful option (not shown) is -o output_file. This saves the script to create the recommended objects in a file. See the Resources section for links to more comprehensive information on command options.

As the command executes, it describes work in progress, some of which is shown below. At this point, the advisor has generated recommendations for everything except MDCs.

Cost of workload with all recommendations included [1306186] timerons
27 indexes in current solution
3 partitionings in current solution
8 MQTs in current solution

The recommendation set consists of 27 indexes (which could be current or new ones), 3 partitionings (that is, DPF-related changes like new partitioning keys or tablespaces), and 8 MQTs (new or existing).

Next the Design Advisor analyzes MDCs and it displays the following when it is complete.

3 clustering dimensions in current solution
[12305400] timerons (without any recommendations)
[1042873] timerons (with current solution)
[91.53%] improvement

The phrase "3 clustering dimensions" means that the Design Advisor recommends 3 MDC dimensions. These could all be on the same table or on different tables, for example, 3 dimensions on Table A, or 1 dimension on Table A and two on Table B. The performance statistics refer to performance for all recommendations, not just the MDC recommendations. The item "timerons (without any recommendations)" refers to the performance with the existing design while "with current solution" refers to estimated performance with all the recommendations implemented.

Next the Design Advisor displays the recommendations in the form of DDL which has been commented out. The recommendations appear in the following order:

  • Base tables that include MDC or partitioning recommendations
  • MQT recommendations (first new ones, then existing ones to keep, and finally unused ones)
  • New clustering indexes (if any)
  • Index recommendations (new, keep, unused)

Recommendations for changing one table are shown below:

-- CREATE TABLE "TPCD"."LINEITEM" ("L_ORDERKEY BIGINT NOT NULL, 
--	"L_PART" INTEGER NOT NULL,
--	"L_SUPPKEY" INTEGER NOT NULL,
--	"L_LINENUMBER" INTEGER NOT NULL,
--	"L_SHIPINSTRUCT" CHAR(25)  NOT NULL,
(11 other columns omitted from this example)
-- 	MDC409022109290000 GENERATED ALWAYS AS ( ((INT(L_SHIPDATE))/7) )
-- ----	PARTITIONING KEY ("L_PARTKEY") USING HASHING
-- ----	IN "TPCDLADT"
-- ORGANIZE BY (
--	MDC409022109290000, 
--	L_SHIPINSTRUCT )
-- PARTITIONING KEY (L_ORDERKEY) USING HASHING
-- IN TPCDLDAT
--;
-- COMMIT WORK ;

Note that a new partitioning key is recommended (L_ORDERKEY) to replace the current one (L_PARTKEY), which is commented out. The MDC recommendation for this table (ORGANIZE BY clause) includes two dimensions: a generated column (INT(L_SHIPDATE/7) and an existing column (L_SHIPINSTRUCT).

Next in the output are recommendations related to MQTs, as shown below.

-- LIST OF RECOMMENDED MQTs
-- ========================
-- MQT MQT40902204140000 can be created as a refresh immediate MQT
-- mqt[1],   0.009MB
CREATE SUMMARY TABLE "ADVDEMO2"." MQT40902204140000" 
    AS (SELECT Q6.CO AS "CO", Q6.C1 AS "C1", …additional details omitted here…)
    DATA INITIALLY DEFERRED REFRESH IMMEDIATE PARTITIONING KEY (C8) 
    USING HASHING IN TPCDLDAT ; 
COMMIT WORK;
REFRESH TABLE "ADVDEMO2"." MQT40902204140000"; 
COMMIT WORK;
RUNSTATS ON TABLE "ADVDEMO2"." MQT40902204140000"; 
COMMIT WORK;
-- MQT MQT409022041530000 can be created as a refresh immediate MQT
(… DDL to create this table follows…)

The MQT recommendations include: estimated size, tablespace to use, partitioning key (if applicable), type of refresh (immediate or deferred), and whether the table is a replica of a base table (indicated by the REPLICATE keyword), which it is not in this case.

Finally, the Design Advisor completes with the information shown below.

8604 solutions were evaluated by the advisor
DB2 Workload Performance Advisor tool is finished.


Conclusions

The Design Advisor is well suited for optimizing database design for data warehouses as it provides advice on MQTs, MDCs, and DPF partitioning keys in addition to indexes. Due to the ad hoc nature of query workloads in data warehouses, the way this tool can analyze that workload and provide recommendations is invaluable.


Resources

About the authors

Paul McInerney photo

Paul McInerney participated in the development of the Design Advisor. His interests include designing effective software tools that provide advice to people.

Danny Zilio photo

Danny Zilio has led the development of the Design Advisor over several releases. He is actively involved in the larger Autonomic Computing initiative.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=93340
ArticleTitle=Using the DB2 V8.2 Design Advisor, Part 2: Expert help for designing data warehouses
publish-date=09012005
author1-email=paulmci@ca.ibm.com
author1-email-cc=
author2-email=zilio@ca.ibm.com
author2-email-cc=

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Try IBM PureSystems. No charge.

Special offers