Reviewing DB2 for i5/OS query optimizer and database engine feedback mechanisms

DB2 for i5/OS: Feedback is good

Learn how DB2® implements requests using various database feedback mechanisms, and understand how the database engine works. Compare the use of Visual Explain with the use of the database monitors or the SQE Plan Cache, and learn when to use which mechanism. Use DB2 for i5/OS® to ask for and receive feedback on demand. Then use the iSeries™ Navigator to analyze, understand, and tune SQL requests.

Share:

Mike Cain (mcain@us.ibm.com), Senior Technical Staff Member, IBM

Mike Cain is a Senior Technical Staff Member within the IBM® Systems and Technology Group and a team leader for the DB2® for i5/OS® Center of Competency. Prior to his current position, he worked as an IBM AS/400® systems engineer and technical consultant. Mike is located in Rochester, MN and can be reached at mcain@us.ibm.com.



Jackie Jansen (jjansen@ca.ibm.com), Senior Consulting IT Specialist, IBM

Jackie Jansen is a Senior Consulting IT Specialist. She currently works in the IBM Americas Advanced Technical Support Solutions Centre. Jackie is a frequent speaker at iSeries™ technical conferences and user group meetings. Contact Jackie at jjansen@ca.ibm.com.



09 November 2006

Introduction: The importance of feedback

For most people, the database engine and query optimizer comprise a "black box." To get data-centric work accomplished in your applications, you issue structured query language (SQL) requests to tell DB2 what to do, not how to do it. The query optimizer determines the best methods and strategies to implement our SQL request, and the database engine executes the methods and strategies the optimizer provides. But what if you want to understand and analyze these methods and strategies? In other words, how do you peer inside the black box to determine how DB2 implemented your request? This is where database management system feedback comes into play.

When a user initiates an SQL request, the following three phases normally occur:

  • Optimize: In this phase, the query plan is produced (a query plan contains the objects, methods, and strategy to fulfill the user’s request).
  • Open: In this phase, the cursor is opened.
  • Run: In this phase, the database engine runs the query plan.

Each of these phases takes time and resources. To understand a specific query's journey through these phases and, more importantly, to analyze the what and why behind this journey, make use of the various feedback.

Throughout the existence of DB2 for i5/OS (i5/OS and previously OS/400® are operating systems that run on iSeries™ and System i™ computers), a variety of feedback options have been developed and used. Some of the tools are better than others, depending on the application environment, the level of information needed, and the complexity of the query plans. Most of the feedback mechanisms are not enabled by default; they must be switched on to gather and provide information. Some of these mechanisms have little or no overhead, while others can impact application throughput and use significant computing and I/O resources.

Starting with OS/400® Version 5 Release 2, IBM® introduced a new SQL query engine. The use of this new engine will be phased in over several releases. The new SQL query engine is called SQE, and the original classic query engine is called CQE. See Resources for more information about how to download the new SQL query engine.

The following are the feedback mechanisms in DB2 for i5/OS, Version 5 Release 4 (see Figure 1):

  • Autonomic Index Advisor -- New in Version 5 Release 4, the database continuously generates, collects, and groups index advice, which can be accessed database-wide, schema (library)-wide, or for a specific table.
  • SQE Plan Cache and Plan Cache Snapshots -- Version 5 Release 4 offers a new reporting and analysis interface to the system-wide SQE plan cache for queries run using the SQL Query Engine.
  • Print SQL Information -- The database produces textual messages in a spool file or GUI window for embedded, static SQL in high-level language program; extended dynamic SQL from ODBC or JDBC connections; SQL from DRDA connections; or dynamic SQL within a job.
  • Debug Messaging -- The database produces textual messages in a job log.
  • Detailed Database Monitor -- The database produces detailed trace data in a single table.
  • Summary Database Monitor -- The database produces summarized data in multiple tables.
  • Visual Explain -- The database produces graphical and textual information from detailed monitor data or from the SQE Plan Cache.
Figure 1. Query optimizer feedback
fig1

A closer look at the feedback mechanism options

Let's take a closer look at each feedback mechanism.

Autonomic Index Advisor

The Autonomic Index Advisor (Index Advisor) continuously records and summarizes index advice generated system wide. Before Version 5 Release 4, you needed to run Visual Explain or the Database Monitor to identify the recommended indexes, based on an individual query. With Version 5 Release 4, the system automatically collects and aggregates index advice based on all queries. In addition to using Visual Explain and the Database Monitor, you can obtain this information using the Index Advisor facility (see Figure 2), or using the SQL Plan Cache.

Figure 2. The Index Advisor facility
fig2

With centralized collection, you can analyze the indexes to determine the best indexes to create for the biggest benefit across an application or group of queries. Besides prescribing the key columns, the Index Advisor provides information about how many times the index was advised. You can compare the current indexes on a table with the optimizer̢۪s suggested indexes, and then adjust the index strategy for the table using these two pieces of information. In addition to using local selection criteria, the Index Advisor in Version 5 Release 4 considers GROUP BY and JOIN clauses when recommending an index and might recommend an Encoded Vector Index (EVI) or a binary radix index. With the SQL Query Engine, the database is now much closer to recommending the "perfect" index for a query or a group of queries.

From the Index Advisor screen, you can choose to directly create the recommended index (see Figure 3).

Figure 3. Creating an index that the Index Advisor recommends
fig3

SQE Plan Cache and Plan Cache Snapshots

Another exciting enhancement in Version 5 Release 4 is the addition of a graphical interface to the "always on" SQL Plan Cache (see Figure 4). This information significantly improves your understanding of SQL requests from both macro and micro perspectives. You can now identify the most commonly run queries and understand their various plans, all without starting a specific collection tool.

Figure 4. Accessing the SQL Plan Cache
fig4

The query optimizer determines the most efficient way to run query requests and stores this information in an access plan. When using the SQL optimizer, these plans are stored in a single system-wide plan cache.

Users often complain about queries that take much longer than expected. One benefit of the plan cache is the capability to look at these queries that have already run. You no longer need to start a monitor and then ask the user to rerun the problem query. The plan cache contains the actual query plan, as well as the run-time information for the query.

When asking to view the plan cache, the system enables you to enter many different filters that can narrow down the number of SQL statements being presented. You can filter based on runtime, date and time, user, tables queried, and much more. See Figure 5 for the filtering and statement output display. Directly from the SQL Plan Cache statement display, you can select the problem query and run Visual Explain, which is an excellent tool for in-depth analysis of a single query.

To capture and preserve the SQL Plan Cache information, the ability to take a snapshot is available. The snapshot data is placed in a single table and contains a representation of the query. The data in the snapshot is very similar to the detailed database monitor data, which enables you to use similar analysis techniques. The SQL Plan Cache Snapshots can be a very effective way to capture query information on a periodic basis to support comparison and trending.

Figure 5. The SQL Plan Cache filtering and statement output display
fig5

Print SQL Information

Print SQL Information is invoked using the command PRTSQLINF or using the iSeries Navigator graphical interface. The optimizer produces the information from query plan information stored in a high level language program object (*PGM), in an SQL package object (*SQLPKG), or in a given job (*JOB). Information is not provided for query open processing or execution. The messages are placed in a spool file and can be viewed graphically with iSeries Navigator (see Figure 6). The plan information placed in the object might or might not reflect the plan that the database is actually using at run time, given that the Print SQL Information interface shows only the last plan saved. If the package was locked at the time the database created the new plan, the database uses the new plan without saving it back in the package. The command PRTSQLINF is roughly equivalent to the SQL EXPLAIN statement found in other database management systems and adds no additional run-time overhead to the query.

Figure 6. A Print SQL Information message viewed using the iSeries Navigator
fig6

Debug Messaging

You can start and stop Debug Messaging using the following commands:

STRDBG UPDPRD(*YES) 
ENDDBG

You can also start and stop Debug Messaging using specific database connection attributes. If you are analyzing queries using iSeries Navigator Run SQL Scripts, an option is available to put your database host job in debug mode (see Figure 7). When you place the job optimizing the query in debug mode, information and messages that the optimizer produces are placed in the job̢۪s job log during query optimization, during open processing, and during execution. The performance overhead of collecting feedback using debug is minimal because the optimizer is only sending a small set of messages to the job log at optimization time.

Note that the set of available debug messages does not cover all of the methods and strategies available in the current optimizer. For example, as the new SQL query engine is enhanced, the debug messages are not improved. Furthermore, the basic SQE methods are mapped to existing CQE messages, which can result in a misleading or incomplete view of the query plan. The strategic direction for providing optimization feedback is to use the database monitors and Visual Explain.

Figure 7. Database host job in debug mode
fig7

Detailed Database Monitor

Detailed Database Monitor is started and stopped using the commands STRDBMON / ENDDBMON. The iSeries Navigator wizard (see Figure 8) and the database connection attributes also provide interfaces to these commands. You can monitor an individual job or all jobs. When you activate the detailed monitor, data describing the SQL requests and their respective plans is placed in a single table using different row formats, row identifiers, and unique query identifiers. Each SQL request includes some of the following detailed column information:

Table 1.Column information for SQL requests
Column nameDescription
QQRID:Row type ID (for example: 1000, 3000, 3001)
QQUCNT:Query or request identifier by job
QQJOB:Job name
QQUSER:Job user name
QVC102:Current job user name (new with Version 5 Release 3)
QQJNUM:Job number
QQI9:Thread identifier
QQTIME:Time that the row was inserted
QQ1000:Row-specific information, such as the SQL text or index advised

When the same SQL request is optimized and run multiple times, data is captured for each request. For example, you might run the user query SELECT * FROM TABLE1 WHERE COLUMN1 = :HV 26 times, where the host variable :HV is set to values A through Z. The detailed monitor captures the information for all 26 executions, keeping the query plan information for each query execution instance. In effect, you are tracing the SQL requests and flow while the monitor is running, which gives a much larger set of data. In one customer example, running the detailed database monitor on a live, production system for 5 minutes resulted in 6GB of data in the monitor table.

Furthermore, the detailed data is inserted into the table during query optimization and open processing, which could result in additional overhead and work impact. Information is provided for query optimization, open processing, and execution. The detailed database monitor data and the SQE plan cache are the primary input to visually explain queries. See Resources to find more information about using the SQL Performance Monitors in an IBM Redbook™.

Figure 8. The SQL Performance Monitor Wizard
fig8

Before Version 5 Release 4, you could monitor all jobs or a single job. The QAQQINI options offered some minor additional pre-filtering. There were options to exclude SQL requests from i5/OS or to only include SQL requests that were expected to run longer than a specified threshold of time. Version 5 Release 4 and iSeries Navigator provide a new Performance Monitor wizard (see Figure 8). You can use the wizard or the STRDBMON command to greatly enhance pre-filtering capabilities. You can include or exclude SQL requests based on a minimum estimated run time, a specific or generic job or user names, the IP address of the client, or the specific tables being queried (see Figure 8). The filtering options allow for easy subsetting of data that the monitor captures, thereby reducing the overhead of running the database monitor.

In Version 5 Release 4, the database monitor analysis provided within iSeries Navigator is enhanced to allow you to invoke Visual Explain from the results of your filtering and analysis display (see Figure 9). For example, after collecting detailed database monitor data on a set of jobs, you can determine the job taking the most time and, within this job, find the longest running query and visually explain it.

Figure 9. Invoking Visual Explain from the results of your filtering and analysis display
fig9

Summary Database Monitor

You can start and stop the Summary Database Monitor using the iSeries Navigator graphical interface (see Figure 10) or APIs. There is no command interface provided. When the summary monitor is activated, data describing the SQL requests and plans is placed in a series of tables. When the same SQL request is optimized and executed multiple times, the data is summarized resulting in a more compact and less detailed format. For example, you might run the user query SELECT * FROM TABLE1 WHERE COLUMN1 = :HV 26 times, where the host variable :HV is set to values A through Z. The summary monitor summarizes the information for all 26 executions, keeping the query plan information for only the most time consuming query execution instance. The monitor does not report query plan information for the other 25 executions.

The monitor holds the summary data in memory and inserts it into the tables when the monitoring is completed, which results in less overhead and less impact to other work. Information is provided for query optimization, open processing, and execution, but only at a summary level. Note that a query cannot be visually explained from the summarized data.

Figure 10. Invoking the Summary Database Monitor from the iSeries Navigator graphical user interface
fig10

Visual Explain

You can invoke Visual Explain using one of the following methods:

  • iSeries Navigator Run SQL Scripts (see Figure 11)
  • iSeries SQL Performance Monitors list of explainable statements (see Figure 12)
  • iSeries Navigator SQE Plan Cache Show statements (see Figure 5)
Figure 11. Invoking Visual Explain using the Run SQL Scripts command
fig11
Figure 12. Invoking Visual Explain through a list of explainable statements the SQL Performance Monitor collects
fig12

The input to the Visual Explain mechanism is detailed monitor data or query access plan information residing in the SQE Plan Cache. Using the iSeries Navigator JDBC connection to DB2, a query can be dynamically visually explained without running the query, or the query can be run and visually explained. All SQE queries can be visually explained without actually running the query. The more complex CQE queries that contain multiple steps require the run and explain option, such as joining and grouping using hash tables.

Before query optimization, a detailed database monitor is started automatically to collect the needed data. When you close the visual explain window, you can save or discard the detailed database monitor data. By saving the monitor data, the optimizer feedback is stored, and the query can be visually explained again later.

A query represented in existing database monitor data can also be visually explained. Information is provided for query optimization, open processing, and execution. The query environment is also described. All of the factors that affect the query optimizer̢۪s costing process are represented in one interface. Another very useful benefit is being able to identify an SQL statement that is currently running within an active job and then visually explain this query, all without starting a database monitor (see Figure 13). In this case, the input to Visual Explain is the query plan stored in the SQE Plan Cache.

Figure 13. Identifying a currently running SQL statement without starting a database monitor
fig13

Determining which tool and analysis method to use

The specific mechanism to use, and when to use it, depends on your requirements for analysis and the application environment. Generally, the database monitors, SQE Plan Cache, and Visual Explain are the most useful and strategic tools for gathering DB2 for i5/OS feedback. The textual information provided by PRTSQLINF and STRDBG are just not sufficient to describe the more complex query plans that the optimizer produces; especially when you consider the ever-increasing list of methods and strategies. Additionally beneficial: the optimizer can implicitly rewrite your query to gain efficiency and performance.

To compare and contrast the diverse feedback the optimizer provides, let̢۪s take a couple of queries and look at how the feedback varies in both content and presentation.

-- Simple Query
SELECT        COUNT(*) AS NO_ORDERS_JUNE_30
FROM          SALES S
WHERE         S.ORDERDATE = '2006/06/30';

With a simple query, optimizer feedback represented using either debug messages (see Figure 14), print SQL information (Figure 15), or Visual Explain (see Figure 16) is adequate to comprehend the plan. All three mechanisms describe a table scan (arrival sequence) for the table SALES. The debug messages and Visual Explain suggest you create an index (access path) to support the local selection predicate using S.ORDERDATE = '2006/06/30'. To use debug messages to advise the key columns, the second-level message text must be displayed. To use Visual Explain to advise the key columns, start the Index Advisor (see Figure 17). The Visual Explain graph is more accurate, showing the aggregate node to represent the COUNT(*), the number of rows that the table scan will process (600,122 rows), and the number of rows the query returned (1 row). Furthermore, Visual Explain and the underlying database monitor data give us insight into the environmental factors that affect the query optimizer, including memory pool size, degree of parallelism, and logic used in each node for selecting and processing the data.

Figure 14. A debug message showing optimizer feedback
fig14
Figure 15. Optimizer feedback using print SQL information
fig15
Figure 16. Optimizer feedback from Visual Explain
fig16
Figure 17. Invoking the Index Advisor
fig17

The detailed database monitor data for this simple query also contain the detailed feedback from the optimizer, but in text format (see Figure 18).

Figure 18. Detailed database monitor data
fig18

If you look at a more complex query, you can see why Visual Explain and the detailed database monitor are better when it comes to articulating the methods and strategies that the query optimizer chooses.

-- Complex Query
SELECT         C.CUSTOMER,
               S.ORDERDATE,
               SUM(S.QUANTITY) AS TOTAL_QUANTITY
FROM           SALES S,
               CUSTOMERS C
WHERE          S.CUSTKEY = C.CUSTKEY
AND            S.ORDERDATE = ‘2006/06/30’
GROUP BY       C.CUSTOMER,
               S.ORDERDATE
ORDER BY       TOTAL_QUANTITY DESC,
               C.CUSTOMER;

The debug messages (see Figure 19) and PRTSQLINF commands (see Figure 20) give the access method (using the second-level text) for each table and the join order. The debug messages do not shed any light on the grouping or ordering methods. Neither debug messages nor print SQL information adequately describe the join method (probe of a hash table). In this case, only the Visual Explain graph (see Figure 21) accurately shows the entire query plan, including an index recommendation that includes the local selection predicate and join predicate (see Figure 22). A Visual Explain graph is the incarnation of the phrase "a picture is worth a thousand words."

Figure 19. Debug messages showing the access method for each table
fig19
Figure 20. Results of the PRTSQLINF command showing access method for each table
fig20

The detailed database monitor data (see Figure 23) for this complex query also contain the detailed feedback from the optimizer, but in textual form. Because the query plan is more complex, more data is required to describe the methods and strategies.

Figure 21. A Visual Explain graph showing the query plan
fig21
Figure 22. The Index Advisor showing an index recommendation
fig22
Figure 23. Database monitor data showing optimizer feedback in textual form
fig23

More advice, and it's automatic

Given that indexes can help the query optimizer and database engine to achieve better performance, specific advice is available. As highlighted earlier, the optimization process provides index-advised information. The quantity and quality of this advice depends on the query engine (CQE or SQE) and the feedback mechanism. In general, SQE can provide better index advice than CQE. CQE provides a suggested index for the local selection predicates (that is, where COLOR = 'BLUE') when no index exists and a table scan is used. If CQE creates a temporary index as part of the query plan, the engine provides information on the key columns of the temporary index.

Messages produced from debug can include index advised information. You can find the details on which key columns to use in the second-level text of the message. Information that the command PRTSQLINF provides does not include index creation advice. The database monitor does provide index creation advice as part of its data set.

A perfect index would include any local selection predicates, join predicates, and possibly any grouping and ordering columns. While the CQE optimizer does not provide advice on creating the "perfect" index for the query, Visual Explain attempts to weave together different pieces of information to a give a better recommendation. SQE, on the other hand, tries to provide advice on the perfect index for the query.

Look at a simple example to understand the difference in index advised feedback.

SELECT         C.CUSTOMER,
               S.ORDERDATE
FROM           SALES S,
               CUSTOMERS C
WHERE          S.CUSTKEY = C.CUSTKEY          <-- join predicate
AND            S.ORDERDATE = '2006/06/30';    <-- local selection predicate

Based on this query, the perfect index for the SALES table has key columns ORDERDATE and CUSTKEY. This supports the local selection (S.ORDERDATE = '2006/06/30') and the join (S.CUSTKEY = C.CUSTKEY).

The perfect index for the CUSTOMER table has key column CUSTKEY, which supports the join (S.CUSTKEY = C.CUSTKEY).

The debug messages and CQE database monitor data advise an index for the SALES table with key column ORDERDATE; omitting the join column. No index is advised for the CUSTOMER table. On the other hand, Visual Explain and the SQE database monitor data advise an index for both the SALES table and the CUSTOMER table, specifying the local selection and join columns.

Given the lack of information on the column data and the availability of various methods for access, SQE can even recommend more than one index for a given table. For example, SQE can recommend one index for local selection predicates and another index for order by columns. Therefore, the optimizer has many choices for data access: table scan, index probe, or index scan.

If you decide to take the advice of the optimizer and create the recommended indexes, will your queries run faster? Well, as usual, it depends. By creating the recommended indexes, you provide the optimizer with more information on the actual data in the tables, and the relationship of that data to other data. This new information can result in the optimizer arriving at a different strategy, and the new strategy might not require the indexes previously recommended. At this point you might be fed up with feedback, but don't give up. These newly created indexes are beneficial.

One of the main sources for accurate statistics for the optimizer is existing indexes. Before Version 5 Release 3 of i5/OS and DB2 for i5/OS, it was impossible to determine exactly how the optimizer was using indexes. For example, if an index was used for statistics but not implementation, no feedback was available to indicate this behavior. Now this information is available using iSeries Navigator (see Figure 24).

Figure 24. Optimizer feedback shown in the iSeries Navigator
fig24

See Resources to access a white paper about creating and using indexes.

For a given database table, feedback on indexes, their attributes, and their usage is provided in the form of a report. This report can help you understand which indexes are available, as well as their usage pattern. You see information about when the index was last used for statistics and implementation, and how many times the index was used (see Figure 25). Scroll to the right of the report to see this information.

Figure 25. Report showing when index was last used and how many times it was used
fig25

Use the best tool for the job

Depending on the scope of the analysis, some feedback mechanisms are more suitable than others. For example, the database monitor is well suited to collect information over a period of time, over one or more database connections, or for a specific user. Use the database monitor "to cast a narrow or wide net." Furthermore, all of the information collected is in one place. Interrogating the SQE Plan Cache helps you quickly find a recently run query without having to start a monitor and rerun the query. Although the SQE Plan Cache does not exist across IPLs, you can save the required statements as a snapshot for future analysis. As the Plan Cache fills, older statements are removed to make room for more recent queries.

Turning on debug messages for the same period of time or set of users results in information scattered across many job logs, and this information is not as complete as the data represented in the database monitor tables. A good example of this is that the database monitors collect a user's SQL statements, while debug messages does not. Also consider that analyzing the query plan information in the form of several operating system messages can be cumbersome.

The print SQL information tool can provide some insight into which SQL statements are run in a program or package. The program or SQL package object might not contain the actual plans that will be run, which can make relying on the explain information problematic. And, the information about the query plan is textual and not robust enough to adequately explain the more sophisticated strategies.

Comparing the use of Visual Explain with the use of the database monitors or the SQE Plan Cache gives another perspective. Visual Explain represents a single query's plan, which is not very useful when analyzing a set of SQL requests or analyzing the flow of SQL requests. You can use the database monitor to analyze SQL performance over time, a set of users, or a specific SQL request. When you use these tools together, Visual Explain can represent the plan after the query is identified within the database monitor data.

In our experience the best analysis process includes strategically capturing database monitor data and aggressively using Visual Explain. Analyzing the monitor data can provide a broad view of the SQL execution environment, as well as very detailed information on individual queries.


Conclusion

One of the real values of DB2 for i5/OS is the capability for you to ask for and receive feedback on demand. This information can help you to analyze, understand, and tune SQL requests. And the best delivery mechanism for this information is iSeries Navigator. So, sit back, relax, and accept some feedback.

Resources

Learn

Get products and technologies

  • DB2 for i5/OS Web page: Download the new SQL query engine.
  • iSeries Access Web page: Install the latest iSeries Navigator version and familiarize yourself with its interface and tools, because this is the strategic interface to DB2 for i5/OS.
  • IBM trial software: Build your next development project with software for download directly from developerWorks.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, IBM i
ArticleID=173385
ArticleTitle=Reviewing DB2 for i5/OS query optimizer and database engine feedback mechanisms
publish-date=11092006