Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

Meet the Experts: Sheryl Larsen on the Power of Table Expressions

Sheryl Larsen (SherylMLarsen@cs.com)Sheryl M. Larsen, Inc.
Photo: Sheryl Larsen

Sheryl Larsen is an internationally recognized researcher, consultant and lecturer, specializing in DB2 and is known for her extensive expertise in SQL. Sheryl has over 16 years experience in DB2, has published many articles, several popular DB2 posters, and co-authored a book, DB2 Answers, published by Osborne-McGraw-Hill. She was voted Best Overall Speaker at the 1999 & 2001 IDUG conferences and the Executive Editor of the IDUG Solutions Journal magazine 1997-2000. Currently, she is President of the Midwest Database Users Group (mwdug.org), a member of IBM's DB2 Gold Consultants program, and is President of Sheryl M. Larsen, Inc.(www.smlsql.com), a firm specializing in Advanced SQL Consulting and Education. She can be contacted at SherylMLarsen@cs.com.

Summary:  SQL expert Sheryl Larsen uses examples to reveals the practical power of table expressions for flexibility and performance.

Date:  01 Apr 2002
Level:  Introductory

Activity:  3023 views
Comments:  

Introduction

I have a profound statement to make about SQL skills - on average most DB2® developers are seven years behind. The reason I can make this claim is that I have traveled all over the globe in the past 12 months (almost 100,000 miles) and have spoken to over 5,000 DB2 developers on the subject of advanced SQL. From all those contacts, there hasn't been one person that did not learn one new SQL feature, or a use of and old SQL feature, from my presentation or seminar. The one feature that I find especially underutilized is table expressions. These are called in-line views on the non-OS/390 platforms and both features have been around for seven years. These are nothing more than SELECT statements that are nested in the FROM clause and are surrounded by parentheses. If they are used correctly, they can be very powerful. I don't blame developers for the lack of knowledge on this feature. I believe it is because they have been sent to a JavaTM class or two instead of DB2 Version x Transition classes.

Although table expressions seem like a simple feature syntactically, they can perform many practical and powerful processes including:

Some of these uses of table expressions you may be familiar with, others you may not. I will discuss examples of each table expression use to show off the power.

I am on a quest to update the SQL skills worldwide and I am confident that you will learn at least one new use of table expressions that will make an impact at your site, so please read on.


Pre-filtering full outer joins

Full outer joins have been enhanced for performance in recent years with work file removal and parallelism; however, the syntax did not receive the same magic that the left and right joins received, such as automatic predicate pushdown and predicate transitive closure. The full outer join shown in Figure 1 does not produce the expected results, e.g., give me all the orders within the specified date range whether or not they have descriptions less than 40 characters, and give me all the descriptions less than 40 characters whether or not there are orders for them:


Figure 1. This statement does not produce the expected results

SELECT 	COALESCE(O.ORDER_NO, 'Order Number not Available')
		, COALESCE(D.DESC, 'Description not Available')
FROM 	ORDER O
     FULL JOIN
	DESCRIPTIONS  D
ON	O.ORDER_NO = D.ORDER_NO
WHERE 	O.ORDER_DATE BETWEEN '1999-01-01' AND CURRENT DATE
   AND	LENGTH(D.DESC < 40)

Instead, the results will look like an inner join because the syntax used in Figure 1 forces the local filtering to being applied in stage 2 -- in other words, after the join. This means that many rows will be participating in the join only to be eliminated in a later stage. To force the local filtering to be applied, before the join (stage 1), you can use table expressions, as shown in Figure 2.


Figure 2. Forcing filtering before the join to obtain the correct results

SELECT  COALESCE(O.ORDER_NO, 'Order Number not Available')
	, COALESCE(D.DESC, 'Description not Available')
FROM 	(SELECT 	O.ORDER
	  FROM		ORDER O
	  WHERE 	O.ORDER_DATE BETWEEN '1999-01-01' AND CURRENT DATE) AS O
     FULL JOIN
	(SELECT	D.DESC, D.ORDER_NO
	 FROM		DESCRIPTIONS  D
	 WHERE	LENGTH(D.DESC < 40) AS D
ON	O.ORDER_NO = D.ORDER_NO

The 'O' table expression explicitly pre-filters the outer table, and the 'D' table expression explicitly pre-filters the inner table. This syntax gives me all the orders within the specified date range whether or not they have descriptions less than 40 characters, and give me all the descriptions less than 40 characters whether or not there are orders for them. The before join conditions ensure that only final result rows will be processed in the join.


Pre-filtering null-supplying tables of left and right joins

For left and right joins, if the null-supplying table is filtered in the WHERE clause, then it's really an inner join, which probably isn't what you wanted when you coded the join. If you really want a left or right join, then you have two choices:

  • Place the local filtering of the null-supplying table in the ON clause.

This will move the filtering to a during-join condition; however, it will not eliminate final result rows. During-join conditions on null-supplying tables only serve to "null out" (the row remains but is filled with nulls), the final result rows, including values from the protected table, that are predetermined by the before join conditions. See Terry Purcell's previous Meet the Expert column for excellent examples of during-join conditions.

  • Place the local filtering of the null-supplying table in a table expression.

This will only "null out" rows from null-supplying table. The query in Figure 3 demonstrates the use of table expressions, and most likely, what developers need:


Figure 3. Using a table expression to force local filtering before the join
SELECT  O.ORDER_NO
	, COALESCE(D.DESC, 'Short Description not Available')
FROM 	ORDER O
       LEFT JOIN
	(SELECT 	D.DESC, D.ORDER_NO
	 FROM		DESCRIPTIONS  D
	 WHERE	LENGTH(D.DESC) < 40) AS D
       ON  O.ORDER_NO = D.ORDER_NO
WHERE O.ORDER_DATE BETWEEN '1999-01-01' AND CURRENT DATE

The query in Figure 3 returns all the orders within the specified date range whether or not they have descriptions less than 40 characters.


Separating GROUP BY work

If you develop on DB2 on platforms other than 0S/390 and z/0S, you can skip this section because your optimizer will either automatically rewrite the GROUP BY statement to separate the GROUP BY work and eliminate the sort, or you may have constructed automatic summary tables, ASTs, that can eliminate the sort. For all mainframers, you need to think about rewriting your own queries.

A typical report quite often has to supply detail information, as well as summary information. GROUP BY is a very picky clause in DB2:

  • The first rule of GROUP BY is that you must SELECT what you are grouping by.
  • The second rule is that you must aggregate everything else with a built-in column function.

This combination of rules usually forces you to join tables so that the detail and summary information can be merged in one result row as in the following query:

SELECT C.CUST_ID, MIN(C.CUST_NAME) AS CUST_NAME,
MIN(C.CUST_PHONE)AS CUST_PHONE, SUM(S.SALES)AS TOTAL_SALES
FROM	CUSTOMER C, SALES  S
WHERE	C.CUST_ID = S.CUST_ID
  AND	S.SALES_DATE BETWEEN :date-lo AND :date-hi
GROUP BY 	C.CUST_ID

This usually means the detail rows that qualify from each table are joined prior to the GROUP BY processing. Let say in this example there were 1000 CUSTOMER rows joined to 200,000 SALES rows and then the GROUP BY was processed. Instead, you can use table expressions to force the optimizer to process the aggregations on a table-by-table basis. The query shown in Figure 4, explicitly influences the optimizer toward earlier GROUP BY processing:


Figure 4. Moving GROUP BY inside a table expression to reduce the number of rows joined

SELECT C.CUST_NAME, C.CUST_PHONE, S.TOTAL_SALES
FROM	CUSTOMER C,
 , 	(SELECT 	S.CUST_ID, SUM(S.SALES) AS TOTAL_SALES
	 FROM		SALES  S
	 WHERE	S.SALES_DATE BETWEEN :date-lo AND :date-hi
	 GROUP BY 	S.CUST_ID)  AS  S
WHERE	C.CUST_ID = S.CUST_ID

The table expression 'S' in this example is explicitly separating the GROUP BY work. Quite often the aggregation is only needed on one or two tables and the detail comes from different tables. This syntax allows the optimizer to search for index support on the CUST_ID column of the SALES table to perform the GROUP BY. If available, the sort will be avoided. This greatly reduces the number of rows involved in the join regardless if a sort is performed. Instead of a 1000 CUSTOMER rows joined to 200,000 SALES rows in the previous example, it is now 1000 CUSTOMER rows joined to only 10,000 sales summary rows. The GROUP BY processing compresses the table expression result to summary rows.


Generating data from existing sources

By far the most powerful use of table expressions can be explained using a quote from a client, "When we need data not available from DB2, we use table expressions to manufacture it." The query shown in Figure 5 contains a table expression that calculates sales for groupings that are generated in the table expression.


Figure 5. Generating subcategory data in the table expression

SELECT  	C.SUBCATEGORY, SUM(S.SALES) AS S.TOTAL_SALES
FROM	   	SALES S ,
   (SELECT C.CUST_ID, SUBSTR(C.COLX, :hvstart, :hvlngth) AS SUBCATEGORY
    FROM   CUSTOMER  C
    WHERE  C.CUST_REGION BETWEEN :reg-lo AND :reg-high)  AS  C
WHERE   	C.CUST_ID = S.CUST_ID
GROUP BY	C.SUBCATEGORY

This query is effective for periodic reporting even if a sort is required. It can also be good for finding out if there is hidden meaning in column values. However, if end users are frequently requesting a common starting position and length, i.e. SUBSTR(C.COLX, 2, 4), that portion of the column should be added to the table as its own column. This would allow an index to be provided on that new column to avoid the sort.

Figure 6 is an example of using a table expression to generate the correct quarters for a quarterly report.


Figure 6. Generating subcategory data for a quarterly report

SELECT  C.SUBCATEGORY, SUM(S.SALES) AS S.TOTAL_SALES
FROM	   SALES S ,
(SELECT C.CUST_ID,
(CASE 	WHEN MONTH(SALE_DT) BETWEEN 2 AND 4 THEN 'Q1'
		WHEN MONTH(SALE_DT) BETWEEN 5 AND 7 THEN 'Q2'
		WHEN MONTH(SALE_DT) BETWEEN 8 AND 10 THEN 'Q3'
 		ELSE 'Q4' END) AS SUBCATEGORY
FROM	   CUSTOMER  C
WHERE  C.CUST_REGION BETWEEN :reg-lo AND :reg-high)	 AS  C
WHERE  C.CUST_ID = S.CUST_ID
GROUP BY    C.SUBCATEGORY

You are allowed to refer to the generated data anywhere outside the table expressions as long as the data you are generating is named using the AS identifier inside the table expression. This means that you can also order by that identifier, include it in calculations, and put it anywhere an expression can be placed. The example in Figure 6 can be extended to handle more options by using host variables, as shown in Figure 7.


Figure 7. Using host variables for extended flexibility

SELECT  C.SUBCATEGORY, SUM(S.SALES) AS S.TOTAL_SALES
FROM	   SALES S ,
(SELECT C.CUST_ID,
(CASE 	WHEN MONTH(SALE_DT) BETWEEN :hv1a AND :hv1b THEN :hv1
		WHEN MONTH(SALE_DT) BETWEEN :hv2a AND :hv2b THEN :hv2
		WHEN MONTH(SALE_DT) BETWEEN :hv3a AND :hv3b THEN :hv3
 		WHEN MONTH(SALE_DT) BETWEEN :hv4a AND :hv4b THEN :hv4
		WHEN MONTH(SALE_DT) BETWEEN :hv5a AND :hv5b THEN :hv5
 		WHEN MONTH(SALE_DT) BETWEEN :hv6a AND :hv6b THEN :hv6
		WHEN MONTH(SALE_DT) BETWEEN :hv7a AND :hv7b THEN :hv7
 		WHEN MONTH(SALE_DT) BETWEEN :hv8a AND :hv8b THEN :hv8
		WHEN MONTH(SALE_DT) BETWEEN :hv9a AND :hv9b THEN :hv9
 		WHEN MONTH(SALE_DT) BETWEEN :hv10a AND :hv10b THEN :hv10
		WHEN MONTH(SALE_DT) BETWEEN :hv11a AND :hv11b THEN :hv11
 		ELSE :hv12 END) AS SUBCATEGORY
FROM	   CUSTOMER  C
WHERE  C.CUST_REGION BETWEEN :reg-lo AND :reg-high)	 AS  C
WHERE  C.CUST_ID = S.CUST_ID
GROUP BY    C.SUBCATEGORY

Now the query can handle endless requests for monthly groupings - What do you want to GROUP BY today? How about the first 4 months in one period, the next two in another, months 6 + 12 together and the remaining as the fourth grouping? Keep this in mind when building applications. It would take hundreds of views to emulate the same options. This is why at most of my clients, views have been moved out of production applications and allowed only in end user environments.

For even greater power and flexibility, add a user-defined function, as shown in Figure 8.


Figure 8. Using a UDF to enable access to nonrelational data

SELECT  C.SUBCATEGORY, SUM(S.SALES) AS S.TOTAL_SALES
FROM	   SALES S ,
(SELECT C.CUST_ID, UDFUNC2(:hvparm1, :hvparm2) AS SUBCATEGORY
FROM	   CUSTOMER  C
WHERE   C.CUST_REGION BETWEEN :reg-lo AND :reg-high)	 AS  C
WHERE   C.CUST_ID = S.CUST_ID
GROUP BY    C.SUBCATEGORY

Now the query can access data from any source available from a mainframe platform. For example, UDFUNC2 can access data in a flat file stored in distributed environment, pull it in to table expression 'C', call it SUBCATEGORY, and use GROUP BY on it. What do you really want to GROUP BY today? Think of using table expressions any time you need to pull in or create data that is not living and breathing natively in your DB2 environment but which needs to be included in a relational operation (join, union, subquery, ORDER BY, GROUP BY, etc.) and only for one execution of the query. For multiple executions or iterations, I would recommend global temporary tables (a topic for another day).


Using table expressions to help tune queries

Query tuning is a very large subject so I will narrow it down by just discussing the role that table expressions can play in tuning. First of all, moving views out of SYSIBM.SYSVIEWS and placing them in-line in the FROM clause helps query tuners explicitly see more of what the query is trying to accomplish. This is important if you are the one who is beeped at 0200 hours. The indirect benefit of the move is allowing for more extensibility with the use of host variables.

To demonstrate proactive tuning with table expressions, I will show you a query and walk you through the performance problems before I show the solution. This first query is the following 5-table join:

SELECT Columns …..
FROM TABX, TABY, TABZ, TAB1, TAB2
WHERE join conditions

The problem with the previous query is the table join sequence. The optimizer analyzes the numbers of rows participating in the join from each table, costs out different sequencing and join methods, and keeps starting with the TABX, TABY, and TABZ combination. The desired join sequence starts with TAB1 and TAB2. There are many ways to influence join sequencing for static queries; however, dynamic queries are not so lucky. Table expressions to the rescue! The table expression works by forcing the optimizer to pre-join tables through the use of the DISTINCT keyword, which also requests a sort, as follows:

SELECT Columns…
FROM TABX, TABY, TABZ
	,(SELECT DISTINCT COL1, COL2 …..
	  FROM TAB1, TAB2
      WHERE join conditions) AS PREJOIN
      WHERE remaining join conditions

This query now:

  1. Joins TAB1 and TAB2.
  2. Sorts to remove possible duplicates.
  3. Processes the other tables.

The only drawback to this technique is the added cost of the sort and creation and scanning of a logical work file. Most of the time, this drawback is compensated for with the optimal join sequence.

This next scenario is more likely on a non-OS/390 platform, but the solution can work for all platforms. In the example below, the optimizer chooses to materialize the entire table expression T2 and then joins to T1 using a merge scan join technique:

SELECT 	Columns from either set of data
FROM  	table1 T1,
        	(SELECT  T2.STUDENT_YEAR, SUM(C8) AS SUM8, MAX(C9) AS MAX9
		FROM 	table2 T2
		GROUP BY T2.STUDENT_YEAR) AS T2
        WHERE T1.MAJOR_ID = T2.MAJOR_ID

The solution requires that you insert the keyword TABLE just before the table expression. This enables the table expression to be correlated to the outer table. The correlation influences the optimizer toward nested loop joins and can greatly reduce the materialization. In the following example, a table expression was made correlated by moving the join condition inside the table expression:

SELECT 	Columns from either set of data
FROM  	table1 T1,
        	TABLE(SELECT  T2.STUDENT_YEAR, SUM(C8) AS SUM8, MAX(C9) AS MAX9
            FROM    table2 T2
            WHERE T1.MAJOR_ID = T2.MAJOR_ID
			GROUP BY T2.STUDENT_YEAR) AS T2

The query now processes one T1.MAJOR_ID at a time. Only the rows that qualify for that MAJOR_ID are materialized in a logical work file. The smaller the optimizer thinks the work file will be, the more likely nested loop join will be chosen. Be alert if you try this on DB2 for OS/390 or z/OSTM, because this solution does not always improve performance. Check EXPLAIN output, use estimation tools, or run benchmarks to verify the success of this tuning effort.


Conclusion

If your goal is reduced program code and increased processing throughput, the powerful but practical table expressions can really help. It takes much less program code when a table expression can be used to generate new data and stay in the DB2 engine to perform joining, grouping, ordering and calculations, instead of declaring multiple cursors. Increasing a query's extensibility is easy when host variables are added to table expressions. Tuning queries, separating the GROUP BY work, and pre-filtering tables for outer joins, all help to increase processing throughput. Take your pick. The technology has been there for a while but it is up to you to take advantage of it. I hope you will assist me on my quest to update SQL skills worldwide by spreading your new discoveries throughout your site.


About the author

Photo: Sheryl Larsen

Sheryl Larsen is an internationally recognized researcher, consultant and lecturer, specializing in DB2 and is known for her extensive expertise in SQL. Sheryl has over 16 years experience in DB2, has published many articles, several popular DB2 posters, and co-authored a book, DB2 Answers, published by Osborne-McGraw-Hill. She was voted Best Overall Speaker at the 1999 & 2001 IDUG conferences and the Executive Editor of the IDUG Solutions Journal magazine 1997-2000. Currently, she is President of the Midwest Database Users Group (mwdug.org), a member of IBM's DB2 Gold Consultants program, and is President of Sheryl M. Larsen, Inc.(www.smlsql.com), a firm specializing in Advanced SQL Consulting and Education. She can be contacted at SherylMLarsen@cs.com.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=13121
ArticleTitle=Meet the Experts: Sheryl Larsen on the Power of Table Expressions
publish-date=04012002
author1-email=SherylMLarsen@cs.com
author1-email-cc=

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers