Skip to main content

Tune SQL procedures

Tips and tricks for fast SQL procedures

Serge Rielau (srielau@ca.ibm.com), Senior Software Developer, IBM, Software Group
Serge Rielau is a Senior Software Developer, working on DB2 UDB SQL compiler development at the IBM Toronto lab. He has worked in DB2’s SQL Compiler development for 7 years and is an expert in the SQL Language. Serge spends a good deal of his time helping customers migrate from other DBMS to DB2 and he is an active participant in the comp.databases.ibm-db2 newsgroup.

Summary:  Do you want to learn some tricks for tuning SQL PL? If so, this article describes common examples for tuning SQL procedures in IBM® DB2® Universal Database™ (UDB) V8.2 for Linux, UNIX®, and Windows®, with a special focus on procedures ported from other database management systems.

Date:  27 Jan 2005
Level:  Introductory
Activity:  1020 views

Motivation

The SQL procedural language (SQL PL) is a popular tool in many DB2 developers' toolboxes. This is true even more so when migrating or porting an application from Oracle, Sybase, or Microsoft® SQL Server to DB2. The ready availability of SQL PL, however, brings with it some problems:

  • First, it is so much easier in many developers’ minds to develop procedural logic than it is to write SQL. Therefore, DB2 as a proven relational engine with a cutting-edge optimizer is insufficiently exploited.
  • Secondly, automated tools such as the Migration Tool Kit (MTK) produce SQL PL emulations of logic originated in T-SQL or PL/SQL. It is no secret that mere emulations will always underperform the original.

In previous articles, I have described how to find performance using the SQL PL Profiler and logic problems using SQL Procedure Tracing. In this article, I use the SQL PL Profiler to illustrate a set of common examples of slow SQL PL and how either DB2 V8.2 optimizes them on its own, or how you can tune the SQL PL yourself.


The trivial SQL Procedure

If you have a Microsoft SQL Server or Sybase background, you know these procedures very well. The paradigm is to not have any SQL in the application other than a CALL statement. The perceived benefit of this paradigm is twofold:

  • First, there is encapsulation. For example, if a query is complex, it should not be repeated multiple times. Instead it should be stored in one place.
  • Second, there is caching. The code in a stored procedure is pre-compiled. The code in the application often is dynamic code.

To address the first issue, DB2 supports the concept of inline SQL PL. DB2 allows simple logic or queries to be encapsulated into SQL functions. When the SQL function is executed from the caller, its body is macro expanded into the caller.

To address the second issue, DB2 uses its package cache. This cache remembers not only recently executed procedures. The cache also remembers previously executed statements, so that after the first compilation of the SQL statement. Subsequent invocations will simply keep executing the same execution plan. Let’s use an example:

1 CREATE PROCEDURE get_dept_emps(dept_id dept_id_t)
2 BEGIN 
3   DECLARE cur CURSOR WITH RETURN FOR 
4     SELECT emp_name, emp_id FROM emp 
5      WHERE emp_dept_id = dept_id;
6   OPEN cur;
7 END

Note that you can download all examples with the full DDL from the download section. The SQL PL Profiler used throughout this article can be downloaded from developerWorks. The screenshot below shows 100 executions of the trivial SQL Procedure above.

Trivial procedure untuned

So where does the time go? DB2 needs to process the CALL statement. DB2 then needs to initialize the SQL procedure to get ready for execution assuming the procedure has been previously cached. Then DB2 opens the cursor returns from the procedure and processes the locator allocation to the result set. Finally, DB2 can actually fetch the rows back. That was a lot of work, just to open a cursor!

Now let's use inline SQL PL to achieve the same effect:

1 CREATE FUNCTION get_dept_emps_f(dept_id dept_id_t)
2 RETURNS TABLE(emp_name name_t,
3                 emp_id   emp_id_t)
4 RETURN SELECT emp_name, emp_id
5          FROM emp
6          WHERE emp_dept_id = emp_dept_id;

We now replace the CALL and all the locator code in the driver procedure with the DECLARE cursor and its respective OPEN statement and test the result:

Trivial procedure tuned

Impressive! The monitored code ran three times faster just by using inline SQL PL. Even simple SQL Procedures, which do not return result sets, can be replaced with SQL Functions. For procedures that return more than one argument, you can use a SQL table function, which returns a table of one row - one column per output argument. For procedures with a single output, you can simply use a scalar SQL function. Note that DB2 V8.2 also supports UPDATE, DELETE, INSERT, and MERGE in SQL Table Functions. This means you can even encapsulate database modifications using inline SQL PL.


Using Temporary Tables

DB2 supports Declared Global Temporary Table (DGTT). In SQL lingo, DECLARE means that the object being defined will not have an entry in the catalog. Therefore, it is by definition private. By contrast, Created Global Temporary Tables as supported by DB2 UDB for zOS® are defined in the catalogs. While their content is, of course, private, their definition is not.

The flexibility of having a declared object (no DBA cares what the app does as long as user temporary table space is readily available) comes with a downside: For a statement to be compiled, the object must exist. If the connection is lost or the table is dropped, any statement referring to the DGTT needs to be recompiled every time the table is re-declared.

Again let’s work through an example:

Temporary table untuned

The code above is, of course, nonsense, but its structure is common, so it shall serve to illustrate the point. In this case, the procedure is called frequently and the DGTT is used only locally. So, for reasons of encapsulation, the procedure declares it’s on temporary table and then drops it when the temp is no longer needed.

What is wrong with this picture? The insert statement between the DECLARE and the DROP needs to be recompiled every time because DB2 cannot know whether the DGTT will have the same property next time around. In reality there will be any number of statements between the DECLARE and the DROP massaging the result set of this temp until it finally has served its purpose.

To avoid this compilation frenzy, it is much more appropriate to move the declaration of the temp into a separate procedure, which is executed once when the workload is started.

Temporary table tuned

As you can see above, the result is astounding. There is more to see here, though. Observe the DELETE statement that has replaced the DROP statement. It will always fail with a user-provoked error, which is then thrown away by a continue handler. What’s going on here?

Temporary tables are optimized for speed, so DB2 does not bother finding free space within a temp when inserting rows. Instead the table behaves as APPEND ONLY. While a regular DELETE would delete the rows, it would not actually cause DB2 to recycle the space. The application will keep on consuming more and more user temporary table space.

Ironically, another quirk comes to our help here. Virtually all DGTT are declared NOT LOGGED. After all, the content of the temp can be reconstructed easily enough. If a NOT LOGGED table encounters an execution error during a data changing statement there is only one choice for DB2: to truncate the table. And that is what the DELETE statement is doing.


Condition handlers

There are two common ways computer languages handle errors.

The first method requires the program to check for error conditions after every non-trivial action. Developers who have written C-UDF or C Stored Procedures in DB2 have learned to check the SQLCA after each EXEC SQL statement in their code.

The second method is to have dedicated handlers that "catch" any or specific error conditions thrown" by individual statements. Java™ and C++ programmers are familiar with this concept.

The SQL/PSM standard defines this second method of using handlers for the SQL Procedural Language. Often, however, when porting TSQL Procedures from Sybase or Microsoft SQL Server or when simply applying C-skills to SQL PL, the more modern method of handlers is avoided in favor of the more traditional method of error checking.

What happens in these cases is that a generic "catch-all" handler is defined, which saves all error information in local variables. Explicit error-handling code in the body of the procedure is then used to do the error processing. Below is a simple scenario of this type.

Condition handler untuned

In this scenario, a SELECT INTO statement is processed. Prior to the processing, the local error-variables are being reset, which is inexpensive, but still costs some CPU. The point, however, is that the VALUES INTO statement that saves away the error-variables is nearly as expensive as the SELECT INTO statement.

Ironically, in many cases the program may not care to even check the outcome of the SELECT INTO and the condition handler’s work was in vain. A couple of things should be done in this case.

First of all, the checking for a NOT FOUND warning can be done explicitly, if so desired, without having to save away the SQLCODE and SQLSTATE into local variables. This is the case because warnings are available to the very next statement in an SQL Procedure.

Secondly, all errors or unexpected warnings really should be handled by the exception handler. This avoids the copying and resetting of the local variables in addition to using the language in the way as it was intended by its inventors.

Condition handler tuned

Predictably, in the above tuned version, the entire cost of the handler has been removed and performance doubled.


Tuning existential predicates

I cannot fathom why developers write existential checks in the way I show below, but since they do, I will discuss it.

Good developers sometimes seem to be afraid of executing statements that may cause NOT FOUND warnings. Below is an example of of a statement that models a MERGE.

Existential predicate untuned

Why are these two existential checks bad?

First of all, DB2 is asked to give answers to questions that are irrelevant. Forcing DB2 to count the number of rows in a table simply to check whether there are any rows is a very expensive option.

Secondly, unless a table uses statement-level triggers, it is as expensive to check whether a row exists with an explicit query as it is to perform an update statement that does not find a row to update. As long as the row is not found, this makes no difference. But if the row was found by the EXIST predicate, then the UPDATE has to find it all over again.

So, without further comment, here is better way to handle explicit way to handle conditional updates and do existential checks in general.

Existential predicate tuned

Of course you can always use explicit EXISTS predicates and the MERGE statement with a single row, and both will have similar performance.


CALL avoidance

There are many cases when procedures are called in tight loops. An example may be a batch process working with a queue or a staging table. Each object in the queue may trigger special processing if the object fulfills specific conditions. Often, these conditions are rarely true for any given object. So the procedure processing the logic has an outermost check that will decide whether the rest of the body will execute or the procedure returns with no action taken.

CALL avoidance untuned

Just as in our first example, the problem here is that the cost of invoking the procedure without actually doing significant work can dominate the cost of the overall job.

To avoid this, the main condition for executing the body of the procedure can be pulled out to the caller, or to preserve encapsulation and ensure integrity of the logic, it can simply be repeated. In the case above, the condition can get encapsulated as such:

1 CREATE FUNCTION call_precond(a INTEGER)
2 RETURNS INTEGER
3 RETURN CASE WHEN a < 0 AND a < -10 AND (a/2) * 2 = a 
4        THEN 1 END

Find the result below:

CALL avoidance tuned

Note that the tuned procedure does not fail if the precondition is not checked outside, since the check is repeated in the body of the procedure.

This example concludes the list of tuning tips in this article. I will finish up the article with a short list of situations that are tuned by the new SQL PL Virtual Machine (PVM) in DB2 V8.2.


SQL PL tuning in the PVM

Gustavo Arocena, the architect of SQL Procedure, published SQL Procedures Performance: Hints and Tips in 2003. Since then, much has changed. Instead of cross-compiling SQL Procedures to C with embedded SQL, the DB2 V8.2 natively compiles and executes SQL Procedures using its PVM. In the process, some of the hints Arocena describes have found their way into the fledgling PVM optimizer. As you may have seen in the screenshots of the previous examples, some statements have no costs assigned to them. This is not because they are free, but because the statements either get combined with other statements in the PVM optimizer, or because the PVM can execute the SQL natively. In DB2 V8.2 PVM can, among other things, do the following:

  • Perform most straight variable assignments
  • Do very basic math, such as loop increments
  • Combine chains of SET statements into a single VALUES INTO statement
  • Rewrite CASE and IF THEN ELSE statements with embedded SET statements into CASE expressions
  • Evaluate basic conditions such as “a<b”
  • Pre-compute CASE condition and IF THEN ELSE conditions and jump into the right branch right away

The purpose of these optimizations is to avoid invocation of the regular SQL runtime interpreter wherever possible to shorten the code path. PVM is still young, but it in general outperforms SQL Procedures compiled prior to V8.2 primarily because of the improvements shown above.


Conclusion

In this article, I have provided tuning tips for common SQL procedure performance problems that I have seen in the field. Often DBAs and developers look towards database configuration, hardware, and the schema design to solve performance bottlenecks. While all of these are good and valid approaches, the perfectly tuned database still cannot perform if the logic does not allow for it.

With techniques such as the ones described above, improvements of up to 30 times have been achieved in real customer scenarios, which can make the difference between a failed project and a proud success.


Acknowledgements

Thanks to Lee Johnson, and Gustavo Arocena, the brains behind PVM.

Also thanks to the countless developers, both the IBM employees and customers, who share their SQL challenges with me. I learn a lot from you.



Download

DescriptionNameSizeDownload method
source code described in articletuning.zip5 KB FTP | HTTP

Information about download methods


Resources

About the author

Serge Rielau

Serge Rielau is a Senior Software Developer, working on DB2 UDB SQL compiler development at the IBM Toronto lab. He has worked in DB2’s SQL Compiler development for 7 years and is an expert in the SQL Language. Serge spends a good deal of his time helping customers migrate from other DBMS to DB2 and he is an active participant in the comp.databases.ibm-db2 newsgroup.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=50923
ArticleTitle=Tune SQL procedures
publish-date=01272005
author1-email=srielau@ca.ibm.com
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers