Comparing real-time cardinality to the optimizer cardinality estimates

A tool to aid in tuning queries

Evaluating real-time cardinality in IBM® DB2® Universal Database™ (DB2 UDB) against the estimated cardinality in the access plan can help DBAs make decisions that improve query performance. This article gives you the details you need to create count queries that evaluate real-time cardinalities at specific operators in an access plan.

Samir Kapoor, DB2 UDB Advanced Support Analyst, IBM

Samir Kapoor is an IBM Certified Advanced Technical Expert for DB2. Samir currently works with the DB2 UDB Advanced Support -- Down system division (DSD) team and has in-depth knowledge in the engine area.



Sudharsan Selvarajah (sids@ca.ibm.com), DB2 UDB Team Lead -- Advanced Support Analyst, IBM

Sudharsan Selvarajah (Sid) is an IBM Certified Advanced Technical Expert for DB2 UDB. Sid currently is a team lead with the DB2 UDB Advanced Support -- Down system division (DSD) team and has in-depth knowledge in the optimizer and engine areas.



Kaarel Truuvert, DB2 UDB Query Optimization Developer, IBM

Kaarel Truuvert is a senior developer in the DB2 UDB Optimizer group. He has extensive experience in the diagnosis and resolution of customer problems.



15 December 2005

This article applies to DB2 UDB on Linux®, UNIX®, and Windows®.

Introduction

It is important for a DBA to understand how to look at EXPLAIN output to detect and locate problems in the access plan. Sometimes customers report a performance problem where a specific query is not performing optimally. There could be numerous possible reasons why this is happening. From the database perspective, the statistics might be outdated, or the bottleneck could be somewhere else, such as locking, bufferpool contention, or sorting. From the operating system or hardware perspective, it could be a CPU, I/O, or network bottleneck. This article assumes that you have narrowed the problem down to the access plan itself, having exhausted all other possibilities such as the operating system, hardware, or other performance areas in DB2.

This article will show you how to collect the real counts at specific operators in the access plan (real-time cardinality) so that you can compare them to the estimated cardinality calculated by the DB2 optimizer. These differences can exist due to statistical correlation and data distribution, among other reasons. This article describes a technique that a DBA can use to identify areas where their environment is vulnerable. Working in the support area, we usually get customers to collect count queries at specific operators to understand where the bottleneck is in the access plan, whether the optimizer's estimate is accurate, and if not, how it could be improved.

The optimizer calculates its cardinality estimate -- that is, the estimated number of rows returned by the query -- based on the statistics available. It is important to keep the statistics up to date, and to collect as much information about the tables and statistics as possible.

The examples in this article show how to determine the real-time number of rows returned at certain operators, to help you focus your investigation. Note that you shouldn't restrict the scope of the investigation to plan analysis, unless it is clear that the chosen access path is sub-optimal.


Viewing the access plan

There are three methods of viewing an access plan. This article will focus on using the db2exfmt tool. Before you can use this tool, you must create the EXPLAIN tables for the database.

To better understand access plans, you need to understand their basic components:

  • Cardinality
  • <PLAN OPERATOR>
  • (PLAN OPERATOR ID)
  • CPU Cost
  • I/O Cost

Let's examine what each of these components means in the context of an access plan:

  • Cardinality -- The total number of qualifying rows estimated by the optimizer after filtering operations, such as predicate application or aggregation, are applied.
  • PLAN OPERATOR -- Descriptive label for the type of operator, for example NLJOIN, HSJOIN, TBSCAN, IXSCAN, and so on.
  • PLAN OPERATOR ID -- Unique ID for the plan operator within this query.
  • CPU Cost -- Estimated cumulative total cost (in timerons) of executing the chosen access plan up to and including this plan operator.
  • I/O Cost -- Estimated cumulative I/O cost (in data page I/Os) of executing the chosen access plan up to and including this operator.

Listing 1, an example taken from the output of db2exfmt, shows how each of these components appears:

Listing 1. How access plan components appear in db2exfmt output
16.64    <-	(a) Cardinality
IXSCAN   <-	(b) PLAN OPERATOR
(   2)   <-	(c) PLAN OPERATOR ID
0.117254 <-	(d) CPU Cost
 0.02    <-	(e) I/O Cost
   |

When examining an access plan, you may find that the estimated cardinality is not as accurate as it should be. You may want to evaluate what the real cardinality is versus the estimated cardinality in the access plan. For example:

Listing 2. Example access plan segment
		4.85057
		 FETCH
		(   9)
		15.6981
		 3.2039
	   /--------+--------\
        4.85057	   	  422
        IXSCAN	      TABLE: SKAPOOR
        (  10)	           TEST1
        0.234466
           0
	  |
          422
       INDEX:SKAPOOR
	TESTIND

The plan segment from the access plan in Listing 2 indicates that the IXSCAN operation will return an estimated number of 4.85057 (approximately 5) rows. However, if you suspect that the result should be less or more than 5 rows, you can determine and compare the "real cardinality" with the "estimated cardinality" in the access plan. In order to do this, you need to look at the details for the PLAN OPERATOR, identified by the PLAN OPERATOR ID. In the above plan segment, the OPERATOR ID for IXSCAN is (10). Go to this specific operator ID in the detailed operator section of the access plan, and isolate the predicates applied at the operator (in this case, IXSCAN). For example:

Listing 3. PLAN OPERATOR details
10) IXSCAN: (Index Scan)
		Cumulative Total Cost: 		0.234466
		Cumulative CPU Cost: 		129660
		Cumulative I/O Cost: 		0
		Cumulative Re-Total Cost:	 	0.170341
		Cumulative Re-CPU Cost:	 	94198.8
		Cumulative Re-I/O Cost: 		0
		Cumulative First Row Cost: 		0.21937
		Estimated Bufferpool Buffers: 	1

		Arguments:
		---------
		MAXPAGES: (Maximum pages for prefetch)
			ALL
		PREFETCH: (Type of Prefetch)
			NONE
		ROWLOCK : (Row Lock intent)
			NEXT KEY SHARE
		SCANDIR : (Scan Direction)
			FORWARD
		TABLOCK : (Table Lock intent)
			INTENT SHARE

		Predicates:
		----------
		30) Start Key Predicate
			Relational Operator: 	Equal (=)
			Subquery Input Required: 	No
			Filter Factor: 		0.0114943

			Predicate Text:
			--------------
			(Q2.NAME = 'PAR') 

		30) Stop Key Predicate
			Relational Operator: 	Equal (=)
			Subquery Input Required: 	No
			Filter Factor: 		0.0114943

			Predicate Text:
			--------------
			(Q2.NAME = 'PAR')

Taking the predicates applied at IXSCAN(10), you can create an SQL statement to get the real-time count for the predicates applied at this operator:

Listing 4. SQL statement to get real-time count
select count(*) from skapoor.test1 Q2 where Q2.NAME = 'PAR';

Q2 is the correlation name assigned to the table by query rewrite. (The Examples section explores this in more detail.)

If you find that the cardinality returned by the above query is significantly different from the estimated cardinality, the statistics might be out of date, or even absent, and a RUNSTATS may be necessary. The difference could also be due to other reasons, such as too few distribution statistics collected by RUNSTATS. To increase the distribution statistics, you don't necessarily need to change the database configuration parameters num_freqvalues and num_quantiles; you can specify the number of frequent values and quantiles directly on the RUNSTATS command, as sketched below.
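For example, the following command collects distribution statistics with more frequent values and quantiles than the database configuration defaults. This is only a sketch: the values 50 and 100 are illustrative, and you should verify the exact RUNSTATS syntax against the Command Reference for your DB2 release.

C:\>db2 runstats on table skapoor.test1 with distribution on all columns default num_freqvalues 50 num_quantiles 100 and detailed indexes all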


Guidelines

When gathering count(*) queries, the first step is to determine how many counts you need for a given SQL statement. Here are some guidelines that you can use when gathering counts.

  A. Gather counts for base accesses (an index scan or a table scan of a table) that are not dependent on a join. Note that just as it is useful to get separate counts for the individual local predicates, it can sometimes also be useful to get a count of just the join predicate, separate from the local predicates, because that can help identify possible skew at the join.
  B. Give lower priority to operators that do not change the cardinality of the plan. Operators whose cardinality in the plan remains the same can be skipped in the interim. However, there are cases where the estimated cardinality of two adjacent operators is the same, but the actual cardinality differs, due to a duplicate-eliminating or aggregating SORT (from a DISTINCT or GROUP BY). This can happen when the optimizer does not have the necessary information at planning time. If you run into this exception, gather counts for those operators as well.
  C. Skip operators with redundant predicates. If the same predicate appears in two different operators, you only need to count it once; it is acceptable to skip the duplicate occurrence.
  D. Generate queries using post-order traversal. That is, gather count(*) queries on the input trees of an operator from left to right, and then on the root, which is the operator itself. This is a key step: following this order lets you add predicates cumulatively to each successive count(*) query you create.

Special cases

  • You can gather counts outside the order followed in the access plan (for example, for different join order combinations, or for individual predicates by themselves). For example, when debugging problems where you suspect the join order is not optimal, it can be useful to gather counts for different join combinations. This is a way to prove whether a different join order would have filtered more rows earlier on. Because this method can result in many combinations, and hence many count(*) queries, it is better left to a deeper stage of the investigation, unless you have already determined that join order is the cause of the problem. This approach requires a minimum of three tables to be joined (the three tables can include self-joins).
  • In certain cases where the number of rows returned is very large, a "select count(*)" can fail with an arithmetic overflow. In these cases, use "select count_big(*)" instead (see the sketch after this list).
  • For queries on MDC tables, note that if you have a FETCH-IXSCAN combination in a plan where the index chosen is an MDC block index, it is common to see a higher cardinality estimate at the FETCH than at the IXSCAN, because the index scan's cardinality estimate refers to blocks of records rather than individual records.
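For example, here is the count query from Listing 4 rewritten with COUNT_BIG; COUNT_BIG returns DECIMAL(31,0), so the result cannot overflow the INTEGER range on very large tables:

select count_big(*) from skapoor.test1 Q2 where Q2.NAME = 'PAR';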

Examples

Example 1

Environment: DB2 UDB ESE V8.2 FixPak 8 on Windows, single partition.

Database and database manager configuration from the access plan:

Note: Refer to the article "Recreate optimizer access plans using db2look" (developerWorks, 2005) for help in understanding how to set the values appropriately.
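As a rough sketch (see the referenced article for the full procedure), a db2look invocation such as the following extracts the DDL (-e), the UPDATE statements that mimic the catalog statistics (-m), the tablespace and bufferpool definitions (-l), and the configuration and registry settings (-f) needed to reproduce an access plan on another system:

C:\>db2look -d sample -e -m -l -f -o db2look_mimic.sql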

Listing 5. Example 1
Database Context:
----------------
	Parallelism: 		None
	CPU Speed: 		9.446886e-007
	Comm Speed: 		0
	Buffer Pool size: 	250
	Sort Heap size: 	256
	Database Heap size: 	600
	Lock List size: 	50
	Maximum Lock List: 	22
	Average Applications: 	1
	Locks Available: 	1122

Package Context:
---------------
	SQL Type: 		Dynamic
	Optimization Level: 	5
	Blocking: 		Block All Cursors
	Isolation Level: 		Cursor Stability



		Arguments:
		---------
		BLDLEVEL: (Build level)
			DB2 v8.1.8.852 : special_13527
		ENVVAR  : (Environment Variable)
			DB2_HASH_JOIN = no    
		ENVVAR  : (Environment Variable)
			DB2_INLIST_TO_NLJN = yes
		ENVVAR  : (Environment Variable)
			DB2_LIKE_VARCHAR = 6.2
		STMTHEAP: (Statement heap size)
			2048

Note: HASH JOIN is turned OFF in this example at the DB2 registry level (DB2_HASH_JOIN=NO).

First, create the sample database, if necessary, using the db2sampl command.

Then issue the following query from a DB2 command line:

Listing 6. Select from sample database
db2 select name, salary from staff s, org o where
s.dept=o.deptnumb and s.name='Sanders' and s.salary > 10000.00 

NAME      SALARY
--------- ---------
Sanders    18357.50

  1 record(s) selected.

The result shows that the query returns one row.

Let's say that this query is performing poorly, and you are interested in examining the access plan. Generate the access plan and see what total cardinality (number of rows) the optimizer estimates the query will return.

Before you do this, you must create the EXPLAIN tables. You can create them by running EXPLAIN.DDL, which is located in the SQLLIB\MISC directory under the directory where you installed DB2:

Listing 7. Create the EXPLAIN tables
C:\>db2 connect to sample
C:\>cd <install directory>\SQLLIB\MISC
C:\<install directory>\SQLLIB\MISC>db2 -tvf EXPLAIN.DDL

Once the EXPLAIN tables are created, you are ready to run the following query in EXPLAIN mode:

Listing 8. Run the query in EXPLAIN mode
C:\>db2 set current explain mode explain
C:\>db2 select name,salary  from staff s, org o where s.dept=o.deptnumb and
s.name='Sanders' and s.salary > 10000.00 
C:\>db2 set current explain mode no

Now, gather the EXPLAIN output:

Listing 9. Format the EXPLAIN output
C:\>db2 connect reset
C:\>db2exfmt -d <dbname> -g TIC -w -1 -n % -s % -# 0 -o exfmt.txt

Examine the exfmt output:

Listing 10. exfmt output
<snippet>

Original Statement:
------------------
select name,salary 
from staff s, org o 
where s.dept=o.deptnumb and s.name='Sanders' and s.salary > 10000.00 

Optimized Statement:
-------------------
SELECT Q2.NAME AS "NAME", Q2.SALARY AS "SALARY" 
FROM SKAPOOR.ORG AS Q1, SKAPOOR.STAFF AS Q2 
WHERE (+10000.00 < Q2.SALARY) AND (Q2.NAME = 'Sanders') AND (Q2.DEPT = Q1.DEPTNUMB) 

Access Plan:
-----------
	Total Cost: 		26.0872
	Query Degree:		1

              Rows 
             RETURN 
             (   1) 
              Cost 
               I/O 
               |
             2.176  		<- Total Estimated Number of rows.
             NLJOIN 
             (   2) 
             26.0872 
                2 
          /-----+-----\
   0.906667         2.4		<- Estimated Number of rows
     TBSCAN           TBSCAN 
     (   3)           (   4) 
     13.0678          13.0194 
        1                1 
        |                |
       68               60 
 TABLE: SKAPOOR   TABLE: SKAPOOR  
      STAFF             ORG

The total number of rows, estimated by the optimizer, is 2.176. This is close, but higher than the real cardinality.

Let us gather the counts based on the guidelines we discussed earlier.

  • Based on guideline (A), TBSCAN(3) is a candidate.
  • There are no cases satisfying guideline (B).
  • Based on guideline (C), you can ignore a separate count query for TBSCAN(4) as the join predicate being resolved is the same as in NLJOIN(2).
  • Based on guideline (D), we have the following order:

Start from the leftmost, deepest operator: in this case, the OUTER of NLJOIN(2), which is TBSCAN(3), then the INNER, which is TBSCAN(4), and then the join itself, NLJOIN(2). So, look at the details of (3) first, then (4), and then (2).

Listing 11. TBSCAN (3)
    0.906667            
     TBSCAN           
     (   3)        
       1                
       |                       
      68               
 TABLE: SKAPOOR   
      STAFF

Go to the details of operator (3):

Listing 12. Details of operator (3)
	3) TBSCAN: (Table Scan)
		Cumulative Total Cost: 		13.0678
		Cumulative CPU Cost: 		230558
		Cumulative I/O Cost: 		1
		Cumulative Re-Total Cost: 		0.172845
		Cumulative Re-CPU Cost: 		182965
		Cumulative Re-I/O Cost: 		0
		Cumulative First Row Cost: 		13.0653
		Estimated Bufferpool Buffers: 	1

		Arguments:
		---------
		JN INPUT: (Join input leg)
			OUTER
		MAXPAGES: (Maximum pages for prefetch)
			ALL
		PREFETCH: (Type of Prefetch)
			NONE
		ROWLOCK : (Row Lock intent)
			NEXT KEY SHARE
		SCANDIR : (Scan Direction)
			FORWARD
		TABLOCK : (Table Lock intent)
			INTENT SHARE
		TBISOLVL: (Table access Isolation Level)
			CURSOR STABILITY

		Predicates:
		----------
		2) Sargable Predicate
			Relational Operator: 	Less Than (<)
			Subquery Input Required: 	No
			Filter Factor: 		0.333333

			Predicate Text:
			--------------
			(+10000.00 < Q2.SALARY) 

		3) Sargable Predicate
			Relational Operator: 	Equal (=)
			Subquery Input Required: 	No
			Filter Factor: 		0.04

			Predicate Text:
			--------------
			(Q2.NAME = 'Sanders')

Let's focus on the predicates here. Here are the two predicates resolved at operator (3):

Listing 13. Predicates
- (+10000.00 < Q2.SALARY)
- (Q2.NAME = 'Sanders')

You will notice the table name is Q2. This is due to the query rewrite renaming the tables. Look at the optimized statement:

Listing 14. Optimized statement
Optimized Statement:
-------------------
SELECT Q2.NAME AS "NAME", Q2.SALARY AS "SALARY" 
FROM SKAPOOR.ORG AS Q1, SKAPOOR.STAFF AS Q2 
WHERE (+10000.00 < Q2.SALARY) AND (Q2.NAME = 'Sanders') AND (Q2.DEPT = Q1.DEPTNUMB)

The two tables involved in the query have been renamed by DB2 to:

Listing 15. Renamed tables
SKAPOOR.ORG AS Q1
SKAPOOR.STAFF AS Q2

For the two local predicates, to see what the real count is, generate a count query:

Listing 16. Query count
SELECT
	COUNT(*)
FROM
	SKAPOOR.STAFF AS Q2
WHERE
	(+10000.00 < Q2.SALARY) AND
	(Q2.NAME = 'Sanders');

If you run this query, the result should show:

Listing 17. Query results
1
-----------
          1

  1 record(s) selected.

The estimated cardinality from the access plan is 0.906667, compared to an actual count of 1. If you want to further check whether the filter factor estimates at TBSCAN(3) make sense, generate a count for each predicate individually. For example:

Listing 18. Generate filter factor for each predicate
SELECT 
	COUNT(*)
FROM
	SKAPOOR.STAFF AS Q2
WHERE
	(+10000.00 < Q2.SALARY);

SELECT 
	COUNT(*)
FROM
	SKAPOOR.STAFF AS Q2
WHERE
	(Q2.NAME = 'Sanders');

This would show you the individual filtering for each predicate that is being resolved at TBSCAN(3). You can see the filter factor estimates above for each local predicate, for example:

Listing 19. Filter factor for each local predicate
Predicates:
----------
2) Sargable Predicate
	Relational Operator: 	Less Than (<)
	Subquery Input Required: 	No
	Filter Factor: 		0.333333

	Predicate Text:
	--------------
	(+10000.00 < Q2.SALARY) 

3) Sargable Predicate
	Relational Operator: 	Equal (=)
	Subquery Input Required: 	No
	Filter Factor: 		0.04

	Predicate Text:
	--------------
	(Q2.NAME = 'Sanders')

The filter factor for each predicate indicates the proportion of rows that the optimizer estimates will satisfy that predicate.
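For example, assuming the optimizer treats the two predicates as statistically independent (its default when it has no information about their correlation), the estimate at TBSCAN(3) can be reproduced from the base table cardinality and the two filter factors:

Estimated cardinality at TBSCAN(3)
   = (base table cardinality) x (filter factor of each predicate)
   = 68 x 0.333333 x 0.04
   = 0.906667 (approximately)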

Now, moving on to the INNER of NLJOIN(2):

TBSCAN(4) -- The only predicate applied here is the join predicate that is resolved at NLJOIN(2), and there are no additional local predicates, so you can ignore this operator.

Let's move on to generating the count at NLJOIN (2).

Listing 20. NLJOIN (2)
              2.176 	<- Total Estimated Number of rows.
             NLJOIN 
             (   2) 
             26.0872

Let's look at the predicates being evaluated at NLJOIN (2):

Listing 21. Predicates being evaluated at NLJOIN (2)
4) Predicate used in Join
	Relational Operator:	Equal (=)
	Subquery Input Required: 	No
	Filter Factor: 		0.125

	Predicate Text:
	--------------
	(Q2.DEPT = Q1.DEPTNUMB)

Since the OUTER of this NLJOIN, TBSCAN(3), applies local predicates, you must also add those predicates to the count query. The predicates are cumulative as you go up the graph.

Listing 22. Add the predicates to the count query
SELECT 
	COUNT(*)
FROM
	SKAPOOR.ORG AS Q1,
	SKAPOOR.STAFF AS Q2
WHERE
	(Q2.DEPT = Q1.DEPTNUMB) AND 
	(+10000.00 < Q2.SALARY) AND
	(Q2.NAME = 'Sanders');

This query returns a count of 1, while the estimate is 2.176.

This covers all the operators in this access plan. In this particular case, if you look at the cardinality estimates of the two tables in the access plan, they are 68 and 60 for the STAFF and ORG tables respectively (refer to the exfmt output in Listing 10). However, if you do a count to see how many records actually exist in each of these tables, you will observe that these estimates are not correct:

Listing 23. Record count
C:\>db2 "select count(*) from staff"

1
-----------
         35

  1 record(s) selected.


C:\>db2 "select count(*) from org"

1
-----------
          8

1 record(s) selected.

If you go to the "Objects used in Access Plan" section, at the bottom of the plan, you can see which tables and indexes were involved in the access plan and what their statistics are. For example:

Listing 24. Statistics for objects in access plan
Objects Used in Access Plan:
---------------------------

	Schema: SKAPOOR 
	Name: 	ORG
	Type: 	Table
			Time of creation: 			2005-09-24-12.53.47.951004
			Last statistics update:
			Number of columns: 		5
			Number of rows: 			60
			Width of rows: 			14
			Number of buffer pool pages: 	1
			Distinct row values: 		No
			Tablespace name: 			USERSPACE1        
			Tablespace overhead: 		12.670000
			Tablespace transfer rate: 		0.180000
			Source for statistics: 		Single Node
			Prefetch page count: 		32
			Container extent page count: 	32
			Table overflow record count: 	0
			Table Active Blocks: 		-1

In this case, the "Last statistics update" field is empty, which means RUNSTATS was never executed on this table. It is empty for the STAFF table as well. So in this case, you should collect statistics so that the optimizer has a better understanding of what the data looks like. Collect distribution statistics and then generate the access plan again.

Connect to the sample database and issue RUNSTATS against the STAFF and ORG tables as follows:

Listing 25. RUNSTATS against the STAFF and ORG tables
C:\> db2 runstats on table <schema>.org with distribution and indexes all
C:\> db2 runstats on table <schema>.staff with distribution and indexes all

Where <schema> is the schema that was used to create the sample database.
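If you want to confirm that the statistics were collected, a quick optional check is to query the catalog; STATS_TIME is NULL when RUNSTATS has never been run against the table. (The <SCHEMA> placeholder follows the same convention as above and should be the schema name in uppercase.)

SELECT TABNAME, CARD, STATS_TIME
FROM SYSCAT.TABLES
WHERE TABSCHEMA = '<SCHEMA>' AND TABNAME IN ('STAFF', 'ORG');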

Generate the access plan again for the query. It should look similar to this:

Listing 26. Generate access plan
Access Plan:
-----------
	Total Cost: 	25.9071
	Query Degree:	1

              Rows 
             RETURN 
             (   1) 
              Cost 
              I/O 
               |
            0.971429 
             NLJOIN 
             (   2) 
             25.9071 
                2 
          /-----+------\
    0.971429            1 
     TBSCAN           TBSCAN 
     (   3)           (   4) 
     12.9941          12.913 
        1                1 
        |                |
       35                8 
 TABLE: SKAPOOR   TABLE: SKAPOOR  
      STAFF             ORG

Now you can see the overall estimate is much closer (0.971429, for an actual value of 1).

Example 2

This example uses the same environment, database, and database manager configuration settings as Example 1.

Run RUNSTATS on the following three tables from a command line, as follows:

Listing 27. RUNSTATS on sales, employee, and department tables
C:\> db2 connect to sample
C:\> db2 runstats on table <schema>.sales with distribution and indexes all
C:\> db2 runstats on table <schema>.employee with distribution and indexes all
C:\> db2 runstats on table <schema>.department with distribution and indexes all

Follow the same steps as in Example 1 to generate the access plan for the query below:

Listing 28. Query
select
	a.sales_person, 
	c.deptname, 
	c.deptno,
	b.sex,
	count(*) as count
from
	sales a, 
	employee b, 
	department c
where
	a.SALES_PERSON=b.LASTNAME AND
	b.workdept = c.deptno and
	c.deptno in ('E21', 'A00') and
	b.sex='M' and
	a.sales=1
group by 
	a.sales_person, 
	c.deptname, 
	c.deptno, b.sex;

In the EXPLAIN output, you will see something similar to the following (note that DB2_HASH_JOIN is disabled in this example):

Listing 29. Explain output for Example 2
Original Statement:
------------------
select a.sales_person, c.deptname, c.deptno, b.sex, count(*) as count 
from sales a, employee b, department c 
where a.SALES_PERSON=b.LASTNAME AND b.workdept = c.deptno and c.deptno in 
        ('E21', 'A00') and b.sex='M' and a.sales=1 
group by a.sales_person, c.deptname, c.deptno, b.sex 


Optimized Statement:
-------------------
SELECT Q5.$C0 AS "SALES_PERSON", Q5.$C1 AS "DEPTNAME", Q5.$C2 AS "DEPTNO", 
        'M' AS "SEX", Q5.$C3 AS "COUNT" 
FROM 
   (SELECT Q4.$C0, Q4.$C1, Q4.$C2, COUNT(* ) 
   FROM 
      (SELECT Q3.SALES_PERSON, Q1.DEPTNAME, Q1.DEPTNO 
      FROM SKAPOOR.DEPARTMENT AS Q1, SKAPOOR.EMPLOYEE AS Q2, SKAPOOR.SALES AS Q3 
      WHERE (Q3.SALES = 1) AND (Q2.SEX = 'M') AND (Q2.WORKDEPT = Q1.DEPTNO) 
      AND (Q3.SALES_PERSON = Q2.LASTNAME) AND Q1.DEPTNO IN ('E21', 
           'A00')) AS Q4 
   GROUP BY Q4.$C2, Q4.$C1, Q4.$C0) AS Q5 

Access Plan:
-----------
	Total Cost: 		51.7706
	Query Degree:		1

                      Rows 
                     RETURN 
                     (   1) 
                      Cost 
                       I/O 
                       |
                     1.60887 
                     GRPBY  
                     (   2) 
                     51.7695 
                        4 
                       |
                     1.60887 
                     MSJOIN 
                     (   3) 
                     51.7689 
                        4 
                /-------+------\
           4.15625            0.387097 
           TBSCAN              FILTER 
           (   4)              (  14) 
           38.7748             12.9865 
              3                   1 
              |                   |
           4.15625               12 
           SORT                TBSCAN 
           (   5)              (  15) 
           38.7724             12.9865 
              3                   1 
              |                   |
           4.15625               12 
           MSJOIN              SORT   
           (   6)              (  16) 
           38.7659             12.9852 
              3                  1 
          /---+---\              |
        2         2.07813        12 
     TBSCAN       FILTER       TBSCAN 
     (   7)       (  10)       (  17) 
     12.9188      25.8441      12.9805 
        1            2            1 
        |            |            |
        2         4.15625        41 
     SORT         TBSCAN   TABLE: SKAPOOR  
     (   8)       (  11)        SALES 
     12.9176      25.8441 
        1            2 
        |            |
        2         4.15625 
     TBSCAN       SORT   
     (   9)       (  12) 
     12.915       25.8429 
        1            2 
        |            |
        9         4.15625 
 TABLE: SKAPOOR   TBSCAN 
   DEPARTMENT     (  13) 
                  25.8401 
                     2 
                     |
                    32 
              TABLE: SKAPOOR  
                 EMPLOYEE

Let's gather the counts at each operator, going through the basic guidelines discussed earlier.

  • Based on guideline (A), TBSCAN(9), TBSCAN(13), and TBSCAN(17) are all base-level accesses.
  • Based on guideline (B), give lower priority to getting counts at SORT(8), SORT(12), and SORT(16): the cardinality does not change from the base-level scan at these operators, and they are not duplicate-eliminating SORTs, so they can be ignored. TBSCAN(7), (11), and (15) can also be ignored. If you suspect that a SORT does play a role, you can revisit it later.
  • Based on guideline (C), the join predicates at FILTER(10) and FILTER(14) are the same join predicates resolved at MSJOIN(6) and MSJOIN(3), so FILTER(10) and FILTER(14) can be ignored.
  • Lastly, incorporating guideline (D), we get the following sequence:
Listing 30. Incorporating guideline D
TBSCAN (9),
TBSCAN (13), 
MSJOIN(6), 
TBSCAN(17), 
MSJOIN(3).

Start from the leftmost subtree and work your way up, following the same steps as in Example 1.

Here are the renamed tables in the optimized statement.

Listing 31. Renamed tables
SKAPOOR.DEPARTMENT AS Q1, 
SKAPOOR.EMPLOYEE AS Q2, 
SKAPOOR.SALES AS Q3

TBSCAN(9) applies the following predicate:

Listing 32. TBSCAN(9)
Predicates:
----------
10) Sargable Predicate
	Relational Operator: 	In List (IN)
	Subquery Input Required: 	No
	Filter Factor: 		0.222222

	Predicate Text:
	--------------
	Q1.DEPTNO IN ('E21', 'A00')

The count query would look like:

Listing 33. Count query
SELECT COUNT(*)
	FROM
		SKAPOOR.DEPARTMENT AS Q1
	WHERE 
		Q1.DEPTNO IN ('E21', 'A00');

At operators (8) and (7), no local or join predicates are applied. The same cardinality estimate is carried up, and no counts are needed for these two operators.

Listing 33. TBSCAN(13)
Predicates:
----------
6) Sargable Predicate
	Relational Operator: 	Equal (=)
	Subquery Input Required: 	No
	Filter Factor: 		0.59375

	Predicate Text:
	--------------
	(Q2.SEX = 'M') 

9) Sargable Predicate
	Relational Operator: 	In List (IN)
	Subquery Input Required: 	No
	Filter Factor: 		0.21875

	Predicate Text:
	--------------
	Q2.WORKDEPT IN ('E21', 'A00')

There are two local predicates. First, let's check the estimate of each predicate individually.

Listing 34. Estimate of each predicate
SELECT COUNT(*)
	FROM
		SKAPOOR.EMPLOYEE AS Q2
	WHERE
		(Q2.SEX = 'M');

Notice that the query returns a count of 19. If you do a select count(*) from employee, there are 32 records in the table. Hence, the actual filter factor of 19/32 = 0.59375 matches the filter factor estimate in the access plan for this predicate:

Listing 35. Filter factor
6) Sargable Predicate
	Relational Operator: 	Equal (=)
	Subquery Input Required: 	No
	Filter Factor: 		0.59375

	Predicate Text:
	--------------
	(Q2.SEX = 'M')

The next predicate resolved at this operator:

Listing 36. Next predicate
SELECT COUNT(*)
	FROM
		SKAPOOR.EMPLOYEE AS Q2
	WHERE 
		Q2.WORKDEPT IN ('E21', 'A00');

Both predicates combined:

Listing 37. Both predicates
SELECT COUNT(*)
	FROM
		SKAPOOR.EMPLOYEE AS Q2
	WHERE 
		Q2.WORKDEPT IN ('E21', 'A00') AND
		(Q2.SEX = 'M');

Next, operators (12) and (11) can both be skipped: they do not apply any predicates, just sort and scan the sorted data, and the SORT is not duplicate-eliminating.

The following join predicate is applied at MSJOIN(6):

Listing 38. MSJOIN(6)
Predicates:
----------
7) Predicate used in Join
	Relational Operator: 	Equal (=)
	Subquery Input Required: 	No
	Filter Factor: 		0.5

	Predicate Text:
	--------------
	(Q2.WORKDEPT = Q1.DEPTNO)

This is the same join predicate applied at FILTER(10). Now accumulate the predicates from both legs of MSJOIN(6): the outer leg, TBSCAN(9) (carried up unchanged through TBSCAN(7)), and the inner leg, FILTER(10) (which carries the predicates from TBSCAN(13)), plus the join predicate itself.

Listing 39. Accumulate inner and outer joins
SELECT COUNT(*)
	FROM
		SKAPOOR.DEPARTMENT AS Q1,
		SKAPOOR.EMPLOYEE AS Q2
	WHERE 
		Q2.WORKDEPT IN ('E21', 'A00') AND
		(Q2.SEX = 'M') AND
		(Q2.WORKDEPT = Q1.DEPTNO) AND
		Q1.DEPTNO IN ('E21', 'A00');

SORT(5) and TBSCAN(4) again apply no predicates. Moving on to TBSCAN(17):

Listing 40. TBSCAN(17)
Predicates:
----------
5) Sargable Predicate
	Relational Operator: 	Equal (=)
	Subquery Input Required: 	No
	Filter Factor: 		0.292683

	Predicate Text:
	--------------
	(Q3.SALES = 1)

And the count query would be:

Listing 41. Count query for TBSCAN(17)
SELECT COUNT(*)
	FROM
		SKAPOOR.SALES AS Q3 
	WHERE
		(Q3.SALES = 1);

Operators (16) and (15) can be ignored, as they do not apply any predicates and the SORT is not duplicate-eliminating.

Finally, MSJOIN(3).

Listing 42. MSJOIN(3)
Predicates:
----------
8) Predicate used in Join
	Relational Operator: 	Equal (=)
	Subquery Input Required: 	No
	Filter Factor: 		0.0322581

	Predicate Text:
	--------------
	(Q3.SALES_PERSON = Q2.LASTNAME)

Accumulate both the inner and outer arms of the join and you get:

Listing 43. Accumulate both joins
SELECT COUNT(*)
	FROM
		SKAPOOR.DEPARTMENT AS Q1,
		SKAPOOR.EMPLOYEE AS Q2,
		SKAPOOR.SALES AS Q3 
	WHERE
		(Q3.SALES = 1) AND
		(Q3.SALES_PERSON = Q2.LASTNAME) AND
		 Q2.WORKDEPT IN ('E21', 'A00') AND
		(Q2.SEX = 'M') AND
		(Q2.WORKDEPT = Q1.DEPTNO) AND
		 Q1.DEPTNO IN ('E21', 'A00');

Remember, these count queries assume inner joins. If the query involves outer joins, you must follow the query rewrite (the optimized statement) and preserve the outer joins in the count query. You can tell the type of join (inner, left outer, or right outer) from the details of the join operator.
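For example, if the optimized statement had contained a left outer join between DEPARTMENT and EMPLOYEE (a hypothetical variation, not the actual plan above), the count query would preserve that join instead of simply listing both tables in the FROM clause:

SELECT COUNT(*)
FROM SKAPOOR.DEPARTMENT AS Q1
     LEFT OUTER JOIN SKAPOOR.EMPLOYEE AS Q2
       ON (Q2.WORKDEPT = Q1.DEPTNO) AND (Q2.SEX = 'M')
WHERE Q1.DEPTNO IN ('E21', 'A00');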

Example 3: Partitioned table

In a multi-partition environment, you can have single-partition tables or multi-partition tables. This example illustrates a point about the latter.

If you created the sample database over two logical partitions, and ran the following query:

Listing 44. Query for partitioned table
select * from sids.emp_photo

The db2exfmt output for the query would be similar to this:

Listing 45. db2exfmt output
Access Plan:
-----------
        Total Cost:             12.9181
        Query Degree:           1

      Rows
     RETURN
     (   1)
      Cost
       I/O
       |
       12
     DTQ
     (   2)
     12.9181
        1
        |
        6
     TBSCAN
     (   3)
     12.8735
        1
        |
        6
 TABLE: SIDS
    EMP_PHOTO

Notice that the TABLE shows a cardinality estimate of 6. This is actually the per-partition cardinality estimate, not the total cardinality. The reason is that for partitioned tables, base-level scans such as a table scan (TBSCAN) or an index scan (IXSCAN) have per-partition estimates.

Listing 46. Cardinality estimate
Formula:
Table cardinality = 
Per partition table cardinality * Number of partitions table is partitioned on

So in the above example, the table cardinality is 12 (6 * 2). You can verify this by examining the db2look mimic, or by querying the CARD column of SYSSTAT.TABLES, as sketched below.
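For example, a quick catalog query (the CARD value in SYSSTAT.TABLES is the total cardinality across all partitions) would be:

SELECT CARD
FROM SYSSTAT.TABLES
WHERE TABSCHEMA = 'SIDS' AND TABNAME = 'EMP_PHOTO';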

Listing 47. db2look mimic output showing the table cardinality
UPDATE SYSSTAT.TABLES
SET CARD=12,
    NPAGES=2,
    FPAGES=2,
    OVERFLOW=0,
    ACTIVE_BLOCKS=0
WHERE TABNAME = 'EMP_PHOTO' AND TABSCHEMA = 'SIDS    ';

To be more generic, the formula would be:

Listing 48. Formula for cardinality estimate
Estimated Cardinality at Operator X on all partitions =
Estimated Cardinality at Operator X in the graph * total partitions in the output stream

This way, if you need to compare actual cardinalities at operators that run on multiple partitions, you can compare the count(*) output to this estimated cardinality at operator X on all partitions.

For example, in the db2exfmt output for the previous query, if you examine the details of the DTQ(2) operator, you will notice:

Listing 49. DTQ(2) operator
Output Streams:
--------------
        3) To Operator #1

                Estimated number of rows:       12
                Partition Map ID:               -100
                Partitioning:                   (COOR )
                                                Coordinator Partition
                Number of columns:              3
                Subquery predicate ID:          Not Applicable

This tells us that the output stream or results of the DTQ are flowing to a single coordinator partition, which implies that the cardinality at DTQ(2) does not need to be multiplied by 2. The cardinality for DTQ(2) as shown in the plan is 12 (that is, 12 rows * 1 partition). The cardinality for TBSCAN (3) is also 12 (that is, 6 rows * 2 partitions).

The purpose of this example is to show that when dealing with partitioned tables, you need to account for the number of partitions in the operator's output stream.
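If you also want to see how the actual rows are spread across partitions, you can group a count by partition number. The following is a sketch using the NODENUMBER scalar function (called DBPARTITIONNUM in newer releases) against the EMPNO column of EMP_PHOTO:

SELECT partnum, COUNT(*) AS row_count
FROM (SELECT NODENUMBER(EMPNO) AS partnum
      FROM SIDS.EMP_PHOTO) AS T
GROUP BY partnum;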

Example 4: Reverse engineering method

Let's look at an example where you can use the optimized statement, which is the version of the original SQL statement rewritten by DB2, to generate count(*) queries for determining the actual cardinality at various operators in the plan.

Here you have a three-table join involving a UNION, and from the db2exfmt output pertaining to the query, you have:

(a) Original SQL:

Listing 50. Original SQL
select S0.COL3
from sids.TABLE1 B, sids.TABLE2 S0
where B.myid in
   (select myid
   from sids.TABLE1
   where (Col1,yourid,mynum) in
      (select Col1, 1420, max(mynum) as mynum
      from sids.TABLE1
      where Col2 = '0245338201' and mynum >= 2 and mynum
              <= 18051 and yourid=1420
      group by Col1
      union
      select Col1, 1592, max(mynum) as mynum
      from sids.TABLE1
      where Col2 = '0245338201' and mynum >= 5 and mynum
              <= 21122 and yourid=1592
      group by Col1 ) ) and boolflag='N' and S0.myid =
        B.myid

(b) Optimized SQL:

Listing 51. Optimized SQL
SELECT DISTINCT Q12.COL3 AS "COL3"
FROM SIDS.TABLE1 AS Q1,
   (SELECT Q10.$C0, Q10.$C1, Q10.$C2
   FROM
      (SELECT Q4.$C0, 1592, Q4.$C1
      FROM
         (SELECT Q3.$C0, MAX(Q3.$C1)
         FROM
            (SELECT Q2.COL1, Q2.MYNUM
            FROM SIDS.TABLE1 AS Q2
            WHERE (Q2.YOURID = 1592) AND (Q2.MYNUM <= 21122) AND
                    (5 <= Q2.MYNUM) AND (Q2.COL2 =
                    '0245338201 ')) AS Q3
         GROUP BY Q3.$C0) AS Q4
      UNION ALL
      SELECT Q8.$C0, 1420, Q8.$C1
      FROM
         (SELECT Q7.$C0, MAX(Q7.$C1)
         FROM
            (SELECT Q6.COL1, Q6.MYNUM
            FROM SIDS.TABLE1 AS Q6
            WHERE (Q6.YOURID = 1420) AND (Q6.MYNUM <= 18051) AND
                    (2 <= Q6.MYNUM) AND (Q6.COL2 =
                    '0245338201 ')) AS Q7
         GROUP BY Q7.$C0) AS Q8 ) AS Q10) AS Q11, SIDS.TABLE2 AS
        Q12
WHERE (Q1.MYNUM = Q11.$C2) AND (Q1.YOURID = Q11.$C1) AND
        (Q1.COL1 = Q11.$C0) AND (Q12.MYID = Q1.MYID) AND
        (Q1.BOOLFLAG = 'N')

Corresponding graph:

Listing 52. Corresponding graph
Access Plan:
-----------
        Total Cost:             763.455
        Query Degree:           1

                                      Rows
                                     RETURN
                                     (   1)
                                      Cost
                                       I/O
                                        |
                                   2.89639e-05
                                     TBSCAN
                                     (   2)
                                     763.455
                                     30.5262
                                        |
                                   2.89639e-05
                                     SORT
                                     (   3)
                                     763.455
                                     30.5262
                                        |
                                   2.89639e-05
                                     NLJOIN
                                     (   4)
                                     763.453
                                     30.5262
                             /----------+----------\
                      3.02247e-05                 0.958284
                        NLJOIN                     IXSCAN
                        (   5)                     (  16)
                        738.421                    75.0356
                        29.5262                       3
                  /--------+--------\                 |
             7.84206              3.85418e-06    5.00908e+06
             UNION                  IXSCAN     INDEX: SIDS
             (   6)                 (  15)        A4_COVER
             150.123                75.0371
                6                      3
          /-----+-----\                |
     1.17815          6.66392     5.22714e+06
     GRPBY            GRPBY     INDEX: SIDS
     (   7)           (  11)   RELID_COLLIDSTAG
     75.0593          75.0622
        3                3
        |                |
     1.17815          6.66392
     TBSCAN           TBSCAN
     (   8)           (  12)
     75.0591          75.0614
        3                3
        |                |
     1.17815          6.66392
     SORT             SORT
     (   9)           (  13)
     75.0583          75.0601
        3                3
        |                |
     1.17815          6.66392
     IXSCAN           IXSCAN
     (  10)           (  14)
     75.057           75.057
        3                3
        |                |
   5.22714e+06      5.22714e+06
 INDEX: SIDS  INDEX: SIDS
     BASE_IX          BASE_IX

First, determine for which operators you would need to gather counts, and use the guidelines (A - D) mentioned earlier.

  1. Based on guideline (A), IXSCAN(10) and IXSCAN(14) are both base accesses. Although IXSCAN(15) is a base access, it is dependent on NLJOIN (5), and so it can be ignored in the interim.
  2. Based on guideline (B), the cardinality does not change from IXSCAN(10) to SORT(9); both estimate 1.17815 rows. So in the first set of counts, SORT(9) is a good candidate to skip. You can always gather it later if you suspect this SORT to be a problem.
  3. Guideline (C) does not apply to this example.
  4. Incorporating guideline (D), gather the following counts in this order:
Listing 52. Gather counts
IXSCAN(10)
IXSCAN(14)
UNION(6)
NLJOIN(5)
NLJOIN(4)

Note: The following has been reduced to show relevant portions of the plan.

IXSCAN(10)

When looking at the properties of IXSCAN(10), you can see that the predicates applied are referencing columns qualified by labels such as Q2.

Listing 53. IXSCAN
8) Sargable Predicate
                        Relational Operator:            Equal (=)
                        Subquery Input Required:        No
                        Filter Factor:                  0.0533979

                        Predicate Text:
                        --------------
                        (Q2.YOURID = 1592)

                        Relational Operator:            Less Than or Equal (<=)
                        Subquery Input Required:        No
                        Filter Factor:                  0.970071

                        Predicate Text:
                        --------------
                        (5 <= Q2.MYNUM)

These labels are quantifiers that qualify the column names. They can refer to base tables or to intermediate result sets.

Since this method works backward from the optimized statement (the "reverse engineering" approach) to generate counts, you can see that Q2.YOURID corresponds to the innermost parenthesized subselect of the optimized statement (the subselect over SIDS.TABLE1 AS Q2 in Listing 51 above).

When you extract this statement and then modify the select list so that it references a simple count(*) instead, the count(*) query pertaining to IXSCAN(10) is:

Listing 54. Count query for IXSCAN(10)
SELECT count(*) 
FROM SIDS.TABLE1 AS Q2 
WHERE (Q2.YOURID = 1592) AND (Q2.MYNUM <= 21122) 
AND 
(5 <= Q2.MYNUM) AND (Q2.COL2 = '0245338201 ')

IXSCAN(14) is similar in nature. View the properties at IXSCAN(14), and notice its predicates are:

Listing 55. IXSCAN(14)
  13) Sargable Predicate
          Relational Operator:            Equal (=)
          Subquery Input Required:        No
          Filter Factor:                  0.301996

          Predicate Text:
          --------------
          (Q6.YOURID = 1420)

This also can easily be traced back to the optimized statement (the subselect over SIDS.TABLE1 AS Q6 in Listing 51 above), which gives you the following count(*) query pertaining to IXSCAN(14):

Listing 55. Count for IXSCAN(14)
SELECT count(*)
FROM SIDS.TABLE1 AS Q6 
WHERE (Q6.YOURID = 1420) AND (Q6.MYNUM <= 18051) AND 
(2 <= Q6.MYNUM) AND (Q6.COL2 = '0245338201 ')

In order to get the count(*) query at UNION(6), you need to extract from the Optimized Statement the innermost parenthesized SQL statement that at minimum gives you the UNION ALL. How can you do this?

You can see that the UNION ALL is a union of two intermediate tables, Q4 and Q8, from the optimized statement in 4(b). The Q4 branch on its own is a complete SQL statement starting at "SELECT Q4.$C0" and ending at "AS Q4". Likewise, the Q8 branch is a complete SQL statement starting at "SELECT Q8.$C0" and ending at "AS Q8". The simplest way to extract the statement is to start from the innermost parentheses and build outwards.

This produces the following SQL statement, which is structurally complete but syntactically invalid:

Listing 56. Starting from innermost parentheses
SELECT Q4.$C0, 1592, Q4.$C1
      FROM
         (SELECT Q3.$C0, MAX(Q3.$C1)
         FROM
            (SELECT Q2.COL1, Q2.MYNUM
            FROM SIDS.TABLE1 AS Q2
            WHERE (Q2.YOURID = 1592) AND (Q2.MYNUM <= 21122) AND
                    (5 <= Q2.MYNUM) AND (Q2.COL2 =
                    '0245338201 ')) AS Q3
         GROUP BY Q3.$C0) AS Q4
      UNION ALL
      SELECT Q8.$C0, 1420, Q8.$C1
      FROM
         (SELECT Q7.$C0, MAX(Q7.$C1)
         FROM
            (SELECT Q6.COL1, Q6.MYNUM
            FROM SIDS.TABLE1 AS Q6
            WHERE (Q6.YOURID = 1420) AND (Q6.MYNUM <= 18051) AND
                    (2 <= Q6.MYNUM) AND (Q6.COL2 =
                    '0245338201 ')) AS Q7
         GROUP BY Q7.$C0) AS Q8

So the next step is to correct the syntax here. Two common rules apply:

  • The first rule is replacing every occurrence of "Qn.$Ci" in the SELECT list with "Qn.Ci AS Ci" where i is a number enumerating from 0 onwards for each comma-separated item in the select list, and n is just the number pertaining to the intermediate resulting table.

    For example:

    Listing 57. Corrected select list syntax
    "SELECT Q3.$C0, MAX(Q3.$C1)"
    changes to:
    "SELECT Q3.C0 AS C0, MAX(Q3.C1) AS C1"

    Essentially, this removes the "$" in the select list and enumerates each SELECT list item starting from C0, C1, .. C<i>, until you reach the ith item in the SELECT list.

  • The second rule is removing any occurrences of "$" anywhere in the query. For example:

    Listing 58. Removing the $ from the rest of the query
    "GROUP BY Q7.$C0"
    changes to:
    "GROUP BY Q7.C0"

    Applying both rules gives the following query:

    Listing 59. Resulting query
    SELECT Q4.C0, 1592, Q4.C1 
    FROM 
    (SELECT Q3.C0 AS C0, MAX(Q3.C1) AS C1 
    FROM 
    (SELECT Q2.COL1 AS C0, Q2.MYNUM AS C1 
    FROM SIDS.TABLE1 AS Q2 
    WHERE (Q2.YOURID = 1592) AND (Q2.MYNUM <= 21122) AND 
    (5 <= Q2.MYNUM) AND (Q2.COL2 = 
    '0245338201 ')) AS Q3 
    GROUP BY Q3.C0) AS Q4 
    UNION ALL 
    SELECT Q8.C0, 1420, Q8.C1 
    FROM 
    (SELECT Q7.C0 AS C0, MAX(Q7.C1) AS C1 
    FROM 
    (SELECT Q6.COL1 AS C0, Q6.MYNUM AS C1 
    FROM SIDS.TABLE1 AS Q6 
    WHERE (Q6.YOURID = 1420) AND (Q6.MYNUM <= 18051) AND 
    (2 <= Q6.MYNUM) AND (Q6.COL2 = 
    '0245338201 ')) AS Q7 
    GROUP BY Q7.C0) AS Q8

    Note: The above SQL was not changed to a select count(*) query; simply running it returns the qualifying rows, and the number of rows fetched is the actual cardinality. However, if a single count(*) is needed, you can wrap the entire statement in an outer select, as sketched below.
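Here is a minimal sketch of that wrapping pattern, using the sample database's STAFF table rather than the article's TABLE1 (the correlation name U is arbitrary):

SELECT COUNT(*)
FROM (SELECT NAME FROM STAFF WHERE SALARY > 10000.00
      UNION ALL
      SELECT NAME FROM STAFF WHERE JOB = 'Mgr') AS U;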

Applying the same principles as above, we get the following query for NLJOIN(5):

Listing 60. NLJOIN(5)
SELECT Q11.C2, Q11.C1, Q11.C0 
FROM SIDS.TABLE1 AS Q1, 
(SELECT Q10.C0 AS C0, Q10.C1 AS C1, Q10.C2 AS C2 
FROM 
(SELECT Q4.C0 AS C0, 1592 AS C1, Q4.C1 AS C2 
FROM 
(SELECT Q3.C0 AS C0, MAX(Q3.C1) AS C1 
FROM 
(SELECT Q2.COL1 AS C0, Q2.MYNUM AS C1 
FROM SIDS.TABLE1 AS Q2 
WHERE (Q2.YOURID = 1592) AND (Q2.MYNUM <= 21122) AND 
(5 <= Q2.MYNUM) AND (Q2.COL2 = 
'0245338201 ')) AS Q3 
GROUP BY Q3.C0) AS Q4 
UNION ALL 
SELECT Q8.C0 AS C0, 1420 AS C1, Q8.C1 AS C2 
FROM 
(SELECT Q7.C0 AS C0, MAX(Q7.C1) AS C1 
FROM 
(SELECT Q6.COL1 AS C0, Q6.MYNUM AS C1 
FROM SIDS.TABLE1 AS Q6 
WHERE (Q6.YOURID = 1420) AND (Q6.MYNUM <= 18051) AND 
(2 <= Q6.MYNUM) AND (Q6.COL2 = 
'0245338201 ')) AS Q7 
GROUP BY Q7.C0) AS Q8) AS Q10) AS Q11 
WHERE (Q1.MYNUM = Q11.C2) AND (Q1.YOURID = Q11.C1) AND 
(Q1.COL1 = Q11.C0) AND (Q1.BOOLFLAG = 'N');

Likewise, here's the query for NLJOIN(4):

Listing 61. NLJOIN(4)
SELECT Q1.MYID, Q11.C2, Q11.C1, Q11.C0, Q12.COL3 AS "COL3" 
FROM SIDS.TABLE1 AS Q1, 
(SELECT Q10.C0 AS C0, Q10.C1 AS C1, Q10.C2 AS C2 
FROM 
(SELECT Q4.C0 AS C0, 1592 AS C1, Q4.C1 AS C2 
FROM 
(SELECT Q3.C0 AS C0, MAX(Q3.C1) AS C1 
FROM 
(SELECT Q2.COL1 AS C0, Q2.MYNUM AS C1 
FROM SIDS.TABLE1 AS Q2 
WHERE (Q2.YOURID = 1592) AND (Q2.MYNUM <= 21122) AND 
(5 <= Q2.MYNUM) AND (Q2.COL2 = 
'0245338201 ')) AS Q3 
GROUP BY Q3.C0) AS Q4 
UNION ALL 
SELECT Q8.C0 AS C0, 1420 AS C1, Q8.C1 AS C2 
FROM 
(SELECT Q7.C0 AS C0, MAX(Q7.C1) AS C1 
FROM 
(SELECT Q6.COL1 AS C0, Q6.MYNUM AS C1 
FROM SIDS.TABLE1 AS Q6 
WHERE (Q6.YOURID = 1420) AND (Q6.MYNUM <= 18051) AND 
(2 <= Q6.MYNUM) AND (Q6.COL2 = 
'0245338201 ')) AS Q7 
GROUP BY Q7.C0) AS Q8) AS Q10) AS Q11, SIDS.TABLE2 AS Q12 
WHERE (Q1.MYNUM = Q11.C2) AND (Q1.YOURID = Q11.C1) AND 
(Q1.COL1 = Q11.C0) AND (Q12.MYID = Q1.MYID) AND (Q1.BOOLFLAG = 'N');

As an exercise, derive the count(*) queries for NLJOIN(5) and NLJOIN(4) yourself and compare them with the last two listings above.


Conclusion

Comparing real cardinalities to the optimizer's estimates can help you identify query planning problems. Due to the complexity of real-world data relationships, it is not always possible for estimates to be perfect, but there are a number of tools and techniques available to improve them. This article helps you identify which parts of the plan to focus on when applying those techniques.
