This article is the second of a two-part series on the IBM DB2 Universal Database (DB2 UDB) V8.2 Design Advisor feature. This article focuses on the effective use of the Design Advisor in the context of data warehouse (DW) databases. For a general introduction and tips for using the Design Advisor, read Part 1.
Distinctive design challenges with data warehouses
The Design Advisor is well suited to the distinctive challenges of designing the physical database for optimal data access in the context of data warehouses. Some aspects of DW considered during the development of this tool include:
- Data warehouses often make use of the following DW-oriented DB2 features: materialized query tables (MQTs), multidimensional clustering tables (MDCs), and Data Partitioning Feature (DPF) hash partitioning. So, design decisions are more complicated than simply optimizing indexes; the use of these multiple features must be coordinated to achieve full potential performance.
- DWs are often very large, up to multi-terabytes. In this environment, storage implications for design decisions become more important. In fact, finding indexes that can be dropped because they are not providing sufficient benefit can be an important design goal.
- The size of DWs also makes it more difficult to change existing physical DB design when that means moving lots of data (using export and import). That places a premium on a tool that can help evaluate design alternatives.
- The nature of the workload is more ad hoc. This makes it difficult for DBAs to correctly model the workload prior to actual system use. Therefore, there is more need to adjust the physical database design as the DBA monitors system usage.
- DWs often have IBM products that complement the Design Advisor, namely DB2 Query Patroller and DB2 Cube Views. Query Patroller provides a good source of input for the Design Advisor, while Cube Views includes a tool narrowly targeted at designing MQTs for use in OLAP schemas.
- DWs often include views on tables to present a simplified, denormalized view of the data to end users. DW designers may be interested in identifying cases where performance would benefit from converting these logical views to MQTs.
Introduction to the case study and the walkthrough
The following case study demonstrates the Design Advisor: You have an existing data warehouse that you are upgrading to DB2 UDB V8.2 and you want to update to use the latest DB2 features, specifically MDC and MQTs. Your database already uses the DPF feature and the Query Patroller product. The DW has a table that you think would be a good candidate for converting to MDC. You are willing to make changes to the physical DB design in terms of dropping and recreating tables, indexes, and so on, in order to implement Design Advisor recommendations. However, you'll have to schedule an outage to do this.
The walkthrough of this case study is divided into the following steps:
- Collect and describe the workload to the Design Advisor:
- Decide how and when you want the Design Advisor to run
- Review the recommendations
- Next steps
The next sections illustrate these steps with the GUI. A subsequent section illustrates the command line version of the tool.
Step 1: Collect and describe the workload to the Design Advisor
Collecting the workload from Query Patroller
Before using the Design Advisor, you need to determine how to get a representative sample of the workload to provide as input. This can be challenging due to the ad hoc nature of data warehouse workloads. Fortunately, in this case study the task is simplified by the presence of Query Patroller. You can use Query Patroller to select from the historical record of past queries. You can select a subset that best meets your needs such as:
- The largest 10% of queries
- Only queries from a particular application that accesses a particular set of tables (for example, a fact table and its dimension tables)
- A sample that includes queries from each end user tool (for example, Business Objects vs. SAS)
- A sample that includes queries that access each table
- A random sample
The selection of the workload should match the goals of the tuning exercise. The recommendations should be interpreted in the context of the workload that you provide to the Design Advisor. While you could include the entire set of queries, which could be in the 10,000 to 1 million range, this is not recommended. Usually, a set of queries ranging from 10s to 100s strikes the best balance between getting good results and consuming DB2 resources.
In this case, you might use a sample of queries that run on one Monday morning when there is a lot of Business Objects activity, and during one night when some of the bigger SAS jobs are running. Query the Query Patroller tables to get those queries and export the queries to a file. (Consult Query Patroller documentation for detailed instructions.) Use this file as input to the Design Advisor.
Alternative: The Design Advisor has the capability to directly import the entire Query Patroller set of queries. However, this may not be the best approach if there are many queries (for example, over 10,000). One of the targeted approaches described above may be more advisable. Also, the command-line version of this tool (db2advis) has an additional option to directly obtain a workload from QP that includes queries between a start and end timestamp.
Collect and describe the workload to the Design Advisor
With your workload file in hand, you can launch the Design Advisor. As you see in Figure 1, the Design Advisor allows you to select which features you want it to consider, such as indexes, MQT, MDC, and partitioning. This example includes all the options.
Figure 1. Select performance features
Tip: Select all the options. This ensures the Design Advisor finds a globally optimal solution by exploiting synergy among the features.
Alternate approach: Selecting all the options can make it hard to understand why the Design Advisor recommended what it did. Some users may prefer to proceed step-by-step at the possible expense of an optimal solution. The following are some examples of step-by-step approaches:
- Focus on a single application at a time -- its workload and the parts of the schema that it accesses.
- Focus on optimizing access to one table at a time. That is, first identify queries that hit table X and implement only recommendations that pertain to that table. Repeat the process for subsequent tables.
- Focus on a particular feature. For example, focus only on redesigning BigFactTable using MDC and repartitioning. Leave MQTs for a future iteration, perhaps after validating the initial changes in production.
Tip: Consider MQTs with deferred refresh. While this example uses Immediate refresh, deferred refresh MQTs are generally more compatible with ETL activity.
Next, provide the workload to the Design Advisor. In this case, you have a file with workload data obtained from Query Patroller.
Step 2: Decide how you want the Design Advisor to run
There are no special considerations for this step beyond the points discussed in Part 1 of this article series.
Step 3: Review the recommendations
As you see in Figure 2, the Design Advisor generated a set of recommendations that would lead to a 60% improvement in performance. You can also see a partial list of the recommendations, namely, to create 3 MQTs.
Figure 2. Select the recommendations
More of the recommendations are shown in Figure 3. On the CUSTOMER table, the recommendation is to create two additional indexes and keep an existing one. One of the recommendations shown for the LINEITEM table is to change the hash partitioning key.
Figure 3. Design recommendations
Tip: Recall that the workload did not contain any ETL work. In considering these recommendations, keep in mind possible impacts of these recommendations on the performance of ETL work.
New in V8.2, the Design Advisor has detailed reports that let you get a lot more insight into the recommendations. Figure 4 shows which queries benefited most from the recommendations. There are huge variations in improvement across different SQL statements.
Figure 4. Design Advisor reports
Finally, as you see in Figure 5, the Design Advisor identified some objects that were not needed to achieve optimal performance for this workload. These are candidates for dropping. However, before doing so, you need to consider whether the workload you provided to the Design Advisor omitted some statements that might benefit from the objects.
Figure 5. Unused objects
There are no special considerations beyond the points discussed in Part 1 of this article series. To reiterate, it may seem that the natural next step after accepting the recommendations is to make the changes. However, there are alterative things you may wish to do at this point:
- Save the recommendations to show someone else.
- Re-run the Design Advisor with different inputs to see if you can get a better recommendation.
Before implementing recommendations, a DBA might typically:
- Replace the generated names with names that conform to shop conventions.
- Review the tablespaces where the objects are to be stored.
- Add comments to the object to indicate why it was created.
DB2 V8.2 has some complementary enhancements that make it easier to implement these recommendations. These are online index creation and ALTER TABLE support.
Command-line (db2advis) walkthrough
Now weâll walk through the same case study, this time using the command line instead of the GUI.
Here is the command used in this case study:
db2advis âd TPCD âi tpch_queries.in âm IMCP âk LOW -l 700 âc DB2ADVIS_TBSP -f |
Highlights include:
-m IMCP: This specifies that the Design Advisor should consider new indexes (I), new MQTs (M), converting standard tables to MDC tables (C), and repartitioning existing tables (P). The default is indexes only.-k LOW: This specifies to compress the workload to a Low degree. As a result, the Design Advisor will analyze a larger set of the workload you provided. The default is medium.-l 700: This specifies that any new indexes, MQTs, and so on should consume no more than 700 MB. The default is 20% of the total database size.-c DB2ADVIS_TBSP: This specifies a tablespace called DB2ADVIS_TBSP to use as a temporary workspace for generating MQT recommendations. This option is required if you want MQT recommendations and you are running on a DPF (multiple partition) instance. Otherwise, this parameter is optional.
Another useful option (not shown) is -o output_file. This saves the script to create the recommended objects in a file. See the Resources section for links to more comprehensive information on command options.
As the command executes, it describes work in progress, some of which is shown below. At this point, the advisor has generated recommendations for everything except MDCs.
Cost of workload with all recommendations included [1306186] timerons 27 indexes in current solution 3 partitionings in current solution 8 MQTs in current solution |
The recommendation set consists of 27 indexes (which could be current or new ones), 3 partitionings (that is, DPF-related changes like new partitioning keys or tablespaces), and 8 MQTs (new or existing).
Next the Design Advisor analyzes MDCs and it displays the following when it is complete.
3 clustering dimensions in current solution [12305400] timerons (without any recommendations) [1042873] timerons (with current solution) [91.53%] improvement |
The phrase "3 clustering dimensions" means that the Design Advisor recommends 3 MDC dimensions. These could all be on the same table or on different tables, for example, 3 dimensions on Table A, or 1 dimension on Table A and two on Table B. The performance statistics refer to performance for all recommendations, not just the MDC recommendations. The item "timerons (without any recommendations)" refers to the performance with the existing design while "with current solution" refers to estimated performance with all the recommendations implemented.
Next the Design Advisor displays the recommendations in the form of DDL which has been commented out. The recommendations appear in the following order:
- Base tables that include MDC or partitioning recommendations
- MQT recommendations (first new ones, then existing ones to keep, and finally unused ones)
- New clustering indexes (if any)
- Index recommendations (new, keep, unused)
Recommendations for changing one table are shown below:
-- CREATE TABLE "TPCD"."LINEITEM" ("L_ORDERKEY BIGINT NOT NULL,
-- "L_PART" INTEGER NOT NULL,
-- "L_SUPPKEY" INTEGER NOT NULL,
-- "L_LINENUMBER" INTEGER NOT NULL,
-- "L_SHIPINSTRUCT" CHAR(25) NOT NULL,
(11 other columns omitted from this example)
-- MDC409022109290000 GENERATED ALWAYS AS ( ((INT(L_SHIPDATE))/7) )
-- ---- PARTITIONING KEY ("L_PARTKEY") USING HASHING
-- ---- IN "TPCDLADT"
-- ORGANIZE BY (
-- MDC409022109290000,
-- L_SHIPINSTRUCT )
-- PARTITIONING KEY (L_ORDERKEY) USING HASHING
-- IN TPCDLDAT
--;
-- COMMIT WORK ; |
Note that a new partitioning key is recommended (L_ORDERKEY) to replace the current one (L_PARTKEY), which is commented out. The MDC recommendation for this table (ORGANIZE BY clause) includes two dimensions: a generated column (INT(L_SHIPDATE/7) and an existing column (L_SHIPINSTRUCT).
Next in the output are recommendations related to MQTs, as shown below.
-- LIST OF RECOMMENDED MQTs
-- ========================
-- MQT MQT40902204140000 can be created as a refresh immediate MQT
-- mqt[1], 0.009MB
CREATE SUMMARY TABLE "ADVDEMO2"." MQT40902204140000"
AS (SELECT Q6.CO AS "CO", Q6.C1 AS "C1", â¦additional details omitted hereâ¦)
DATA INITIALLY DEFERRED REFRESH IMMEDIATE PARTITIONING KEY (C8)
USING HASHING IN TPCDLDAT ;
COMMIT WORK;
REFRESH TABLE "ADVDEMO2"." MQT40902204140000";
COMMIT WORK;
RUNSTATS ON TABLE "ADVDEMO2"." MQT40902204140000";
COMMIT WORK;
-- MQT MQT409022041530000 can be created as a refresh immediate MQT
(⦠DDL to create this table followsâ¦) |
The MQT recommendations include: estimated size, tablespace to use, partitioning key (if applicable), type of refresh (immediate or deferred), and whether the table is a replica of a base table (indicated by the REPLICATE keyword), which it is not in this case.
Finally, the Design Advisor completes with the information shown below.
8604 solutions were evaluated by the advisor DB2 Workload Performance Advisor tool is finished. |
The Design Advisor is well suited for optimizing database design for data warehouses as it provides advice on MQTs, MDCs, and DPF partitioning keys in addition to indexes. Due to the ad hoc nature of query workloads in data warehouses, the way this tool can analyze that workload and provide recommendations is invaluable.
-
DB2 Information Center: Learn more about the Design Advisor from product documentation in the DB2 Information Center.
-
DB2 Magazine e-book entitled DB2 UDB: The Autonomic Computing Advantage: This resource includes a changer and Flash demo of the Design Advisor.






