Skip to main content

High-performance SQL

A SQL tour through the DB2 TPC-C benchmark implementation

Serge Rielau (srielau@ca.ibm.com), Senior Software Developer, IBM, Software Group
Serge Rielau is a Senior Software Developer, working on DB2 UDB SQL compiler development at the IBM Toronto lab. He has worked in DB2’s SQL compiler development team for 7 years and is an expert in the SQL Language. Among other projects he has developed inline SQL PL and SQL Functions. He also designed what is commonly known as "SELECT FROM INSERT, UPDATE and DELETE". Nowadays Serge spends a good deal of his time helping customers migrate from other DBMS to DB2 and he is an active participant in the comp.databases.ibm-db2 newsgroup.

Summary:  This article introduces advanced SQL features which are being used in IBM DB2® Universal Database™ (UDB) for Linux™, UNIX®, and Windows® (DB2 UDB for LUW) TPC-C benchmarks. Find out how to use these features in everyday customer environments.

Date:  18 Nov 2004
Level:  Intermediate
Activity:  759 views
Comments:  

Introduction

The past year and a half have seen a lot of DB2 for LUW activity on the standard TPC-C benchmark, the benchmark commonly used to measure performance in the on line transaction processing environment. The results range from the very small numbers on very reasonable machines to very, big numbers on fantastic setups that you are unlikely to see at home or at work.

While the number of rows populating the database schema may have been adjusted to scale, one area remains the same: SQL. In this article, you walk through the logic of the TPC-C benchmark, gain an understanding of the SQL technology, and discover how it can be used in real customer environments.

To accomplish this, author Serge Rielau, has organized the article as follows: First by introducing the benchmark’s DB schema and its transactions. Next, he will analyze each transaction and explain the SQL features of each. Overall, at the end of this article, you should have a better understanding of the TPC-C benchmark as well as DB2’s advanced SQL features, including the philosophy behind them.


TPC-C: A quick overview of the benchmark

The TPC-C benchmark simulates a distribution enterprise with a product sales cycle using a small set of transactions and tables. The external schema and the semantics of each transaction are formally specified. For the purpose of getting a quick overview and to provide a base schema to work on, the following is a quick summary. Tables are marked in bold and transactions are shown in italics:

The database used for the benchmark consists primarily of warehouses, districts, items and customers. 100,000 ITEMs are in STOCK at each WAREHOUSE. Each warehouse serves 10 DISTRICTs. Each district serves 3000 CUSTOMERs which ORDER new items. Each ORDER can be composed of up to 15 distinct items called ORDER_LINE. Before an order is delivered, it is queued as a NEW_ORDER, and the stock for each item is adjusted for the warehouse.. Upon DELIVERY, the customer is charged. Upon PAYMENT, the transaction is archived in the HISTORY, and the revenue is booked.

In addition, customers can inquire on their ORDER STATUS, and warehouse managers can inquire the STOCK LEVEL of items on order.

To make things a bit more realistic, customers tend to forget their customer ids, so you need to be able to look them up by their last names, of which there may be duplicates. Also, not all orders can be satisfied from the local warehouse, requiring items to be shipped in from remote warehouses.

The number of new orders that can be handled by a warehouse is constrained to between 9 and 12.86 transactions per minute. This means that to achieve a result of over 3.2 Million TpmC (NEW ORDER Transactions per minute TPC-C), the number of warehouses needs to be dialed up to more than 256,000. This implies 7.7 Billion registered customers. So, don’t try this unless you have plenty of disks and a big fuse!

Before you dive into the five transactions, lhere is the DDL for the tables. It is pretty self-explanatory.


Listing 1. DDL to create the TPC-C database tables
			
  1 CREATE TABLE WAREHOUSE
  2      (
  3        W_NAME      CHAR(10)   NOT NULL,
  4        W_STREET_1  CHAR(20)   NOT NULL,
  5        W_STREET_2  CHAR(20)   NOT NULL,
  6        W_CITY      CHAR(20)   NOT NULL,
  7        W_STATE     CHAR(2)    NOT NULL,
  8        W_ZIP       CHAR(9)    NOT NULL,
  9        W_TAX       INTEGER    NOT NULL,
 10        W_YTD       BIGINT     NOT NULL,
 11        W_ID        INTEGER    NOT NULL,
 12        PRIMARY KEY (W_ID)
 13      ); 
 14
 15 CREATE TABLE DISTRICT
 16      (
 17        D_NEXT_O_ID INTEGER         NOT NULL,
 18        D_TAX       INTEGER         NOT NULL,
 19        D_YTD       BIGINT          NOT NULL,
 20        D_NAME      CHAR(10)        NOT NULL,
 21        D_STREET_1  CHAR(20)        NOT NULL,
 22        D_STREET_2  CHAR(20)        NOT NULL,
 23        D_CITY      CHAR(20)        NOT NULL,
 24        D_STATE     CHAR(2)         NOT NULL,
 25        D_ZIP       CHAR(9)         NOT NULL,
 26        D_ID        SMALLINT        NOT NULL,
 27        D_W_ID      INTEGER         NOT NULL,
 28        PRIMARY KEY (D_ID, D_W_ID)
 29      );
 30
 31 CREATE TABLE ITEM
 32      (
 33        I_NAME          CHAR(24)    NOT NULL,
 34        I_PRICE         INTEGER     NOT NULL,
 35        I_DATA          VARCHAR(50) NOT NULL,
 36        I_IM_ID         INTEGER     NOT NULL,
 37        I_ID            INTEGER     NOT NULL,
 38        PRIMARY KEY (I_ID)
 39      );
 40
 41 CREATE TABLE STOCK
 42      (
 43        S_REMOTE_CNT    INTEGER     NOT NULL,
 44        S_QUANTITY      INTEGER     NOT NULL,
 45        S_ORDER_CNT     INTEGER     NOT NULL,
 46        S_YTD           INTEGER     NOT NULL,
 47        S_DATA          VARCHAR(50) NOT NULL,
 48        S_DIST_01       CHAR(24)    NOT NULL,
 49        S_DIST_02       CHAR(24)    NOT NULL,
 50        S_DIST_03       CHAR(24)    NOT NULL,
 51        S_DIST_04       CHAR(24)    NOT NULL,
 52        S_DIST_05       CHAR(24)    NOT NULL,
 53        S_DIST_06       CHAR(24)    NOT NULL,
 54        S_DIST_07       CHAR(24)    NOT NULL,
 55        S_DIST_08       CHAR(24)    NOT NULL,
 56        S_DIST_09       CHAR(24)    NOT NULL,
 57        S_DIST_10       CHAR(24)    NOT NULL,
 58        S_I_ID          INTEGER     NOT NULL,
 59        S_W_ID          INTEGER     NOT NULL,
 60        PRIMARY KEY (S_I_ID, S_W_ID)
 61      );
 62
 63 CREATE TABLE CUSTOMER
 64      (
 65        C_ID            INTEGER       NOT NULL,
 66        C_STATE         CHAR(2)       NOT NULL,
 67        C_ZIP           CHAR(9)       NOT NULL,
 68        C_PHONE         CHAR(16)      NOT NULL,
 69        C_SINCE         BIGINT        NOT NULL,
 70        C_CREDIT_LIM    BIGINT        NOT NULL,
 71        C_MIDDLE        CHAR(2)       NOT NULL,
 72        C_CREDIT        CHAR(2)       NOT NULL,
 73        C_DISCOUNT      INTEGER       NOT NULL,
 74        C_DATA          VARCHAR(500)  NOT NULL,
 75        C_LAST          VARCHAR(16)   NOT NULL,
 76        C_FIRST         VARCHAR(16)   NOT NULL,
 77        C_STREET_1      VARCHAR(20)   NOT NULL,
 78        C_STREET_2      VARCHAR(20)   NOT NULL,
 79        C_CITY          VARCHAR(20)   NOT NULL,
 80        C_D_ID          SMALLINT      NOT NULL,
 81        C_W_ID          INTEGER       NOT NULL,
 82        C_DELIVERY_CNT  INTEGER       NOT NULL,
 83        C_BALANCE       BIGINT        NOT NULL,
 84        C_YTD_PAYMENT   BIGINT        NOT NULL,
 85        C_PAYMENT_CNT   INTEGER       NOT NULL,
 86        PRIMARY KEY (C_ID, C_D_ID, C_W_ID)
 87      );
 88 CREATE INDEX CUST_IDXB 
 89   ON CUSTOMER (C_LAST, C_W_ID, C_D_ID, C_FIRST, C_ID);
 90
 91 CREATE TABLE HISTORY
 92      (
 93        H_C_ID          INTEGER     NOT NULL,
 94        H_C_D_ID        SMALLINT    NOT NULL,
 95        H_C_W_ID        INTEGER     NOT NULL,
 96        H_D_ID          SMALLINT    NOT NULL,
 97        H_W_ID          INTEGER     NOT NULL,
 98        H_DATE          BIGINT      NOT NULL,
 99        H_AMOUNT        INTEGER     NOT NULL,
100        H_DATA          CHAR(24)    NOT NULL
101      );
102
103 CREATE TABLE ORDERS
104      (
105        O_C_ID          INTEGER     NOT NULL,
106        O_ENTRY_D       BIGINT      NOT NULL,
107        O_CARRIER_ID    SMALLINT    NOT NULL,
108        O_OL_CNT        SMALLINT    NOT NULL,
109        O_ALL_LOCAL     SMALLINT    NOT NULL,
110        O_ID            INTEGER     NOT NULL,
111        O_W_ID          INTEGER     NOT NULL,
112        O_D_ID          SMALLINT    NOT NULL,
113        PRIMARY KEY (O_ID, O_W_ID, O_D_ID)
114      );
115 CREATE INDEX ORDR_IDXB 
116      ON ORDERS (O_C_ID, O_W_ID, O_D_ID, O_ID DESC);
117
118 CREATE TABLE ORDER_LINE
119      (
120        OL_DELIVERY_D    BIGINT     NOT NULL,
121        OL_AMOUNT        INTEGER    NOT NULL,
122        OL_I_ID          INTEGER    NOT NULL,
123        OL_SUPPLY_W_ID   INTEGER    NOT NULL,
124        OL_QUANTITY      SMALLINT   NOT NULL,
125        OL_DIST_INFO     CHAR(24)   NOT NULL,
126        OL_O_ID          INTEGER    NOT NULL,
127        OL_D_ID          SMALLINT   NOT NULL,
128        OL_W_ID          INTEGER    NOT NULL,
129        OL_NUMBER        SMALLINT   NOT NULL,
130        PRIMARY KEY (OL_O_ID, OL_W_ID, OL_D_ID, OL_NUMBER)
131      );
132
133 CREATE TABLE NEW_ORDER
134      (
135        NO_O_ID         INTEGER     NOT NULL,
136        NO_D_ID         SMALLINT    NOT NULL,
137        NO_W_ID         INTEGER     NOT NULL,
138        PRIMARY KEY (NO_W_ID, NO_D_ID, NO_O_ID)
139      );             


NEW ORDER transaction (local)

The NEW ORDER transaction processes a new order from a customer. Various things need to be done here:

  1. Retrieve the next order id and the district tax rate for the customers district.
  2. Increment the next order id for the customers district for subsequent transactions.
  3. For each item the customer orders:
    1. Retrieve the item's name, price and description from the ITEM table.
    2. Retrieve some district information and the remaining stock level for the item from STOCK.
    3. Decrease the stock level of the item by the ordered quantity. If the stock falls below a threshold, order in supplies (by simply incrementing the value).
    4. Insert the relevant retrieved data including the overall price into ORDER_LINE.
  4. Insert the order into the ORDERS and NEW_ORDER tables.
  5. Retrieve the customer name, discount, credit information from CUSTOMER.
  6. Retrieve sales tax information from WAREHOUSE.
  7. Compute the total price considering the discount and the taxes.

That’s quite a bit of work. However DB2 manages to get it all done using 3 SQL statements. This is how:

First, DB2 deals with the DISTRICT table. Data needs to be returned and an update needs to be performed. Conventional wisdom states that this requires 2 SQL statements, and that the UPDATE ought to be done prior to the SELECT; otherwise deadlocks may occur as concurrency increases.

DB2 however supports a new SQL feature which is in the process of being standardized. This feature allows access to what is known as transition tables in triggers. The OLD TABLE transition table holds the original state of the affected rows before they are processed by the UPDATE or DELETE statement. The NEW TABLE transition table holds the affected rows immediately after an INSERT or UPDATE was processed. That is the state prior to when AFTER triggers fire. Users with a Microsoft or Sybase background may know these tables by the names DELETED and INSERTED.

All that DB2 does is to allow an UPDATE, DELETE, and INSERT to appear in the FROM clause of a select and allow the user to chose which transition table shall be selected from:


Listing 2. Using transition tables
			
1  SELECT D_TAX, D_NEXT_O_ID  
2     INTO :dist_tax , :next_o_id
3     FROM OLD TABLE ( UPDATE DISTRICT
4                       SET  D_NEXT_O_ID = D_NEXT_O_ID + 1
5                       WHERE D_W_ID = :w_id 
6                         AND D_ID = :d_id 
7                     ) AS OT

The advantage of this logic can be easily seen by taking a look the optimizer plan, shown in Listing 3:


Listing 3. Access plan using transition tables
			
                    Rows 
                   RETURN 
                   (   1) 
                    Cost 
                     I/O 
                     |
                      1 
                   UPDATE 
                   (   2) 
                   25.7261 
                      2 
                  /---+--\
                1          26 
             FETCH   TABLE: SRIELAU  
             (   3)     DISTRICT 
             12.872 
                1 
           /----+---\
         1            26 
      IXSCAN    TABLE: SRIELAU  
      (   4)       DISTRICT 
     0.0175755 
         0 
        |
        26 
  INDEX: SYSIBM   
 SQL0410231029415

The structure of the combined plan is nearly identical to that of the UPDATE statement alone. The TPC-C specification mandates the storage of the next order id in the DISTRICT table. In a customer environment one could easily use a SEQUENCE instead to avoid locking altogether.

Now take a look at the second of the three SQL Statements:


Listing 4. Second SQL statement
			

 1  WITH DATA AS (  SELECT  O_ID , D_ID , W_ID , OL_NUMBER , I_ID
 2                        , W_ID AS I_SUPPLY_W_ID
 3                        , 0 AS OL_DELIVERY_D
 4                        , I_QTY
 5                        , ( I_PRICE * I_QTY ) AS TOTAL_PRICE
 6                        , OL_DIST_INFO , I_PRICE, I_NAME, I_DATA, S_DATA
 7                        , S_QUANTITY
 8                     FROM ( SELECT   :next_o_id as O_ID
 9                                   , :w_id AS W_ID
10                                   , :d_id as D_ID
11                                   , OL_NUMBER  , I_ID , I_QTY
12                              FROM  TABLE( VALUES
13                                                (  1 , :id0  , :ol_quantity0  )
14                                              , (  2 , :id1  , :ol_quantity1  )
15                                              , (  3 , :id2  , :ol_quantity2  )
16                                              , (  4 , :id3  , :ol_quantity3  )
17                                              , (  5 , :id4  , :ol_quantity4  )
18                                              , (  6 , :id5  , :ol_quantity5  )
19                                              , (  7 , :id6  , :ol_quantity6  )
20                                              , (  8 , :id7  , :ol_quantity7  )
21                                              , (  9 , :id8  , :ol_quantity8  )
22                                              , ( 10 , :id9  , :ol_quantity9  )
23                                              , ( 11 , :id10 , :ol_quantity10 )
24                                              , ( 12 , :id11 , :ol_quantity11 )
25                                              , ( 13 , :id12 , :ol_quantity12 )
26                                              , ( 14 , :id13 , :ol_quantity13 )
27                                              , ( 15 , :id14 , :ol_quantity14 )
28                                         ) AS X ( OL_NUMBER , I_ID  , I_QTY )
29                          ) AS ITEMLIST
30                        , TABLE( NEW_OL_LOCAL( I_ID , I_QTY , W_ID
31                                             , O_ID , D_ID  , SMALLINT(OL_NUMBER)
32                                             )
33                               ) AS NEW_OL_LOCAL
34                     WHERE NEW_OL_LOCAL.I_PRICE IS NOT NULL
33               )
36  SELECT I_PRICE , I_NAME , I_DATA , OL_DIST_INFO , S_DATA , S_QUANTITY
37    FROM NEW TABLE ( INSERT INTO ORDER_LINE
38                       (   OL_O_ID       , OL_D_ID     , OL_W_ID      
39                         , OL_NUMBER     , OL_I_ID     , OL_SUPPLY_W_ID 
40                         , OL_DELIVERY_D , OL_QUANTITY , OL_AMOUNT 
41                         , OL_DIST_INFO
42                       )
43                       INCLUDE (   I_PRICE    INTEGER
44                                 , I_NAME     CHAR(24)
45                                 , I_DATA     VARCHAR(50)
46                                 , S_DATA     VARCHAR(50)
47                                 , S_QUANTITY SMALLINT )
48                        SELECT   O_ID          , D_ID    , W_ID           
49                               , OL_NUMBER     , I_ID    , I_SUPPLY_W_ID 
60                               , OL_DELIVERY_D , I_QTY   , TOTAL_PRICE 
61                               , OL_DIST_INFO  , I_PRICE , I_NAME
62                               , I_DATA        , S_DATA  , S_QUANTITY
63                        FROM DATA
64                    ) AS INS

Now, that is a hefty SQL statement. Next analyze the query step by step:

  1. The ITEMLIST is composed from the input arguments. It consists of items and quantities for each item colected in a VALUES clause. The order id, district and warehouse are constant for all items in the list.
  2. Using a correlated join, the NEW_OL_LOCAL SQL Table Function processes each item in the list separately. As you will see shortly, the function returns the item price, name, some meta data, and stock information.
  3. Items that could not be found (their price is NULL) can be filtered out. Call the resulting table DATA.
  4. Insert each item from DATA into ORDER_LINE.
  5. Return the item price, name, meta data and stock information for each item to the user.

There are a couple of interesting things happening here.

First, the NEW_OL _LOCAL table function really returns only one row at a time. With the correlation it operates more like a user-defined relational operator.

Second, the NEW_OL _LOCAL table function, as you will see later, actually writes to a table. What you see here is a join in which the inner performs database modifications. To allow this, and keep the database in a consistent state, DB2 needs to either convince itself that the outer leg of the join and its inner cannot collide; or DB2 needs to throw in a temporary table, called a dam, to ensure the outer leg of the join is completely derived before the table function can start processing. Since the SQL Table Function is written in inline SQL PL, DB2's macro language for SQL PL, DB2 can see through the function and conclude that the transaction is well behaved, and no dam is necessary in this case.

Having mentioned possible collisions between the outer and the inner legs of a join, it should also be mentioned that a similar problem could occur between the function and the INSERT operation. To avoid too much complexity, DB2 imposes a simple rule: An UPDATE, DELETE or INSERT (data-change-operation) shall never happen in a join unless embedded in a correlated function. Further a data-change-operation shall never occur in a nested query.

Instead data-change-operations shall be confined to the top level SELECT or the topmost SELECT of a common table expression (CTE, also known as the "WITH clause"). This is what is happening in this case: DATA performs a data change operation and has been banned as a CTE into the WITH clause. It is now allowed to feed the INSERT operation.

What is the reason for these rules? Keeping data change operations in the WITH clause imposes a very natural order which can be used to solve conflicts.

Finally, there is one more novel feature related to data-change-operations exposed in this transaction: the INCLUDE-clause which INSERT, like any other data-change-operation, can use. This clause allows you to extend the OLD TABLE or NEW TABLE transition tables by appending extra columns. These columns are simply hitching a ride along the INSERT operation and are accessible to the SELECT above. In this case the item price, name, data and the stock-related data are not part of the ORDER_LINE table. These columns are simply passed through to be returned to the user.

Why all this commotion? Why not select twice from DATA - once for the INSERT, once for the final SELECT? The answer is simple: Just like SORT, TEMP is a "four-letter-word".

Your tour through the NEW ORDER transaction would not be complete without a quick glance of the NEW_OL_LOCAL table function:


Listing 5. NEW_OL_LOCAL table function
			
 1 CREATE FUNCTION NEW_OL_LOCAL(   I_ID      INTEGER
 2                               , I_QTY     SMALLINT
 3                               , W_ID      INTEGER
 4                               , O_ID      INTEGER
 5                               , D_ID      SMALLINT
 6                               , OL_NUMBER SMALLINT
 7                             )
 8 RETURNS TABLE(   I_PRICE       INTEGER
 9                , I_NAME        CHAR(24)
 0                , I_DATA        VARCHAR(50)
11                , OL_DIST_INFO  CHAR(24)
12                , S_DATA        VARCHAR(50)
13                , S_QUANTITY    SMALLINT
14              )
15 SPECIFIC NEW_OL_LOCAL
16 MODIFIES SQL DATA DETERMINISTIC NO EXTERNAL ACTION LANGUAGE SQL
17 
18 VAR: BEGIN ATOMIC
19   DECLARE I_PRICE       INTEGER      ;
20   DECLARE I_NAME        CHAR(24)     ;
21   DECLARE I_DATA        VARCHAR(50)  ;
22   DECLARE OL_DIST_INFO  CHAR(24)     ;
23   DECLARE S_DATA        VARCHAR(50)  ;
24   DECLARE S_QUANTITY    SMALLINT     ;
25
26   SET  ( I_PRICE , I_NAME , I_DATA )
27     =  ( SELECT 
28                   I_PRICE
29                 , I_NAME
30                 , I_DATA
31 
32            FROM ITEM
33           WHERE ITEM.I_ID = NEW_OL_LOCAL.I_ID
34       )
35     ;
36     SET ( OL_DIST_INFO , S_DATA , S_QUANTITY )
37       = ( SELECT   OL_DIST_INFO 
38                  , S_DATA 
39                 , S_QUANTITY
40             FROM NEW TABLE ( UPDATE STOCK
41                                INCLUDE (  OL_DIST_INFO CHAR( 24 )  ) 
42                                SET S_QUANTITY = CASE WHEN S_QUANTITY - NEW_OL_LOCAL.I_QTY >= 10
43                                                      THEN S_QUANTITY - NEW_OL_LOCAL.I_QTY
44                                                      ELSE S_QUANTITY - NEW_OL_LOCAL.I_QTY + 91
45                                                 END
46                                  , S_ORDER_CNT  = S_ORDER_CNT + 1
47                                  , S_YTD        = S_YTD + NEW_OL_LOCAL.I_QTY
48                                  , OL_DIST_INFO = CASE D_ID WHEN  1 THEN S_DIST_01
49                                                             WHEN  2 THEN S_DIST_02
50                                                             WHEN  3 THEN S_DIST_03
51                                                             WHEN  4 THEN S_DIST_04
52                                                             WHEN  5 THEN S_DIST_05
53                                                             WHEN  6 THEN S_DIST_06
54                                                             WHEN  7 THEN S_DIST_07
55                                                             WHEN  8 THEN S_DIST_08
56                                                             WHEN  9 THEN S_DIST_09
57                                                             WHEN 10 THEN S_DIST_10
58                                                   END
59                                 WHERE S_I_ID = NEW_OL_LOCAL.I_ID
60                                   AND S_W_ID = NEW_OL_LOCAL.W_ID  
61                           ) AS U
62        )
63    ;
64    RETURN VALUES(   VAR.I_PRICE
65                   , VAR.I_NAME
66                   , VAR.I_DATA
67                   , VAR.OL_DIST_INFO
68                   , VAR.S_DATA
69                   , VAR.S_QUANTITY
70                 )
71    ;
72 END
			

This function implements step 2 of the steps listed above. The function retrieves the item information for an order line and performs the necessary stock updates. Note that the function is defined as MODIFIES SQL DATA. This clause allows an SQL Table function to contain UPDATE, DELETE, INSERT and MERGE statements. Also note that, again, the INCLUDE clause has been used, this time to pass the OL_DIST_INFO column up. This column is derived from the specific district information of the stocked item. Finally, note how the single row VALUES clause is used in the RETURN statement to return the end result as a single-row table

In order to realize performance gains from this complicated SQL statement DB2 selects a very sophisticated query plan:


Listing 6. Access plan for second SQL statement
			
                          Rows 
                         RETURN 
                         (   1) 
                          Cost 
                           I/O 
                           |
                          14.4 
                         TBSCAN 
                         (   2) 
                         26.2997 
                         2.02765 
                           |
                          14.4 
                         SORT   
                         (   3) 
                         26.2978 
                         2.02765 
                           |
                          14.4 
                         INSERT 
                         (   4) 
                         26.2922 
                         2.02765 
                        /---+---\
                    14.4          44 
                   FILTER   TABLE: SRIELAU  
                   (   5)     ORDER_LINE 
                   13.4359 
                   1.02765 
                     |
                     15 
                   NLJOIN 
                   (   6) 
                   13.4334 
                   1.02765 
           /----------+---------\
        15                         1 
      TBSCAN                    NLJOIN 
      (   7)                    (   8) 
    0.000201927                 13.2461 
         0                      1.02765 
        |               /----------+----------\
        15            1                          1 
 TABFNC: SYSIBM    NLJOIN                     TBSCAN 
      GENROW       (   9)                     (  18) 
                   13.246                  4.48727e-005 
                   1.02765                       0 
           /----------+---------\               |
         1                         1             1 
      TBSCAN                    UNION    TABFNC: SYSIBM   
      (  10)                    (  11)        GENROW 
   4.48727e-005                 13.2457 
         0                      1.02765 
        |               /----------+----------\
         1          0.96                         1 
 TABFNC: SYSIBM    FETCH                      UPDATE 
      GENROW       (  13)                     (  15) 
                   12.8727                   0.371624 
                      1                      0.027648 
                 /----+---\                  /---+---\
               1            36         0.013824         9 
            IXSCAN    TABLE: SRIELAU    FETCH    TABLE: SRIELAU  
            (  14)         ITEM         (  16)        STOCK 
           0.0182935                   0.193765 
               0                       0.013824 
              |                       /----+---\
              36                0.013824          9 
        INDEX: SYSIBM            IXSCAN    TABLE: SRIELAU  
       SQL0410231029418          (  17)         STOCK 
                                0.0157303 
                                    0 
                                   |
                                    9 
                             INDEX: SYSIBM   
                            SQL0410231029421 

Nice plan - but what does it have to do with the query above? Do a who-is-who in this plan:

  • FETCH(13) represents the first SET statement in the function, selecting from the ITEM table.
  • UPDATE(15) obviously is the UPDATE of the STOCK table incuding the SET statement driving it.
  • UNION(11) is not entirely what it claims to be. In the context of inline SQL PL UNIONs are used to drive statements in a sequential fashion. So this UNION is best equated with the function body BEGIN ATOMIC...END.
  • TBSCAN(10) initializes the local variables in the function. This is the chain of DECLARE statements. FETCH(13) and UPDATE(15) actually read and assign values to the outer of NLJOIN(9) overriding the defaults for the local variables
  • TBSCAN(18) represents the RETURN statement in the SQL Table function.
  • Looking at NLJOIN(8), which represents the entire function you see a three way ordered join:
    1. Initialize local variables
    2. Execute the body of the function
    3. Return the result table
  • TBSCAN(7) as the outer of the join provides the list of items to the function.
  • FILTER(5) throws out the bad items with I_PRICE IS NULL
  • INSERT(4) is obviously the INSERT into ORDER_LINE, you should be done, but....
  • SORT(3) is special. Since this query is a cursor, DB2 is facing a dilemma. The query changes the state of the database by updating both STOCK and ORDER_LINE. However there is no guarantee that the user will actually fetch all the rows of the table and hence drive these changes through. So DB2 needs to dam the result set to ensure that the whole query is finished when the cursor is opened. For performance reasons, a SORT without sort keys is chosen for this dam over a regular temp table. These do-at-open semantics actually have a very nice side-effect. Using a cursor with hold, one can perform complex database operations and commit them right after the open. Then the result set can be read and analyzed without holding any locks.

For those truly interested in the internal workings of inline SQL PL, more details can be found in the patents referenced in the resources section of this article.

The third and final statement in the NEW_ORDER transaction is:


Listing 7. Third SQL statement
			
 1 SELECT W_TAX, C_DISCOUNT, C_LAST, C_CREDIT 
 2   INTO :ware_tax, :c_discount, :c_last, :c_credit
 3   FROM TABLE ( NEW_WH (   :next_o_id
 4                         , :w_id 
 5                         , :d_id 
 6                         , :c_id 
 7                         , :o_entry_d 
 8                         , :inputItemCount 
 9                         , :allLocal 
10                       )
11               ) AS NEW_WH_TABLE 


Listing 8. NEW_WH function
			
 1 CREATE FUNCTION NEW_WH (   O_ID        INTEGER
 2                          , W_ID        INTEGER
 3                          , D_ID        SMALLINT
 4                          , C_ID        INTEGER
 5                          , O_ENTRY_D   BIGINT
 6                          , O_OL_CNT    SMALLINT
 7                          , O_ALL_LOCAL SMALLINT
 8                        )
 9 RETURNS TABLE (   W_TAX       INTEGER
10                 , C_DISCOUNT  INTEGER
11                 , C_LAST      VARCHAR(16)
12                 , C_CREDIT    CHAR(2)
13               )
14 SPECIFIC NEW_WH
15 MODIFIES SQL DATA DETERMINISTIC NO EXTERNAL ACTION LANGUAGE SQL
16 VAR: BEGIN ATOMIC
17   DECLARE C_DISCOUNT  INTEGER ;
18   DECLARE C_LAST      VARCHAR(16) ;
19   DECLARE C_CREDIT    CHAR(2) ;
20   DECLARE W_TAX       INTEGER ;
21
22   INSERT 
23     INTO NEW_ORDER ( NO_O_ID, NO_D_ID, NO_W_ID )
24             VALUES (   O_ID
25                      , D_ID
26                      , W_ID
27                    ) 
28   ;
29   INSERT 
30     INTO ORDERS (   O_C_ID      , O_ENTRY_D , O_CARRIER_ID , O_OL_CNT
31                   , O_ALL_LOCAL , O_ID      , O_W_ID       , O_D_ID   )
32          VALUES (   C_ID        , O_ENTRY_D , 0    , O_OL_CNT
33                   , O_ALL_LOCAL , O_ID      , W_ID , D_ID             ) 
34   ;
35   SET ( C_DISCOUNT, C_LAST, C_CREDIT ) 
36    =  ( SELECT  C_DISCOUNT, C_LAST, C_CREDIT 
37           FROM  CUSTOMER 
38          WHERE  C_ID = NEW_WH.C_ID 
39            AND  C_W_ID = W_ID
40            AND  C_D_ID = D_ID
41       )
42   ;
43   SET W_TAX 
44     =    ( SELECT W_TAX
45              FROM WAREHOUSE
46             WHERE W_ID = NEW_WH.W_ID 
47          ) 
48   ;
49   RETURN VALUES ( W_TAX , C_DISCOUNT , C_LAST , C_CREDIT ) ;
50 END
			

As you can see the statements inside the NEW_WH function are independent of each other. So why bother encapsulating them into a function? The reason to merge these four statements together is to minimize the codepath associated with the invocation of an SQL statement. Within a regular customer environment the ability to have encapsulation without harming performance is goodness. DB2 proves here that good coding style and benchmarking don't have to be at odds with each other.

The corresponding plan is as follows:


Listing 9. Access plan for third SQL statement
access plan for 3rd SQL statement

You can see here how the UNION is driving the four statements. Also note that there is no SORT at the top because the statement is a SELECT INTO rather than a cursor.

The version of the NEW_ORDER transaction which deals with remote warehouses will not be covered. It has a very similar structure, so there is nothing new to learn.


DELIVERY transaction

DELIVERY is a straightforward transaction:

  1. The carrier finds the oldest order id on the NEW_ORDER queue.
  2. The order is deleted from the queue.
  3. The customer id is retrieved from the order stored in ORDER.
  4. The amount owing is computed from the ORDER_LINEs. Note that in truly normalized fashion the grand total was never stored in the order itself.
  5. Each order line is marked as delivered by setting the delivery day.
  6. The order is marked as delivered by setting the delivery day.
  7. The customer's balance and the delivery count are updated.
  8. The order id is returned.

If the queue is empty, nothing happens and NULL is returned.


Listing 10. Delivery table function
			
 1 CREATE FUNCTION DEL(   W_ID           INTEGER 
 2                      , D_ID           SMALLINT
 3                      , CARRIER_ID     SMALLINT
 4                      , DELIVERY_D     BIGINT
 5                    )
 6 RETURNS TABLE ( O_ID INTEGER )
 7 SPECIFIC DELIVERY
 8 MODIFIES SQL DATA DETERMINISTIC NO EXTERNAL ACTION LANGUAGE SQL
 9 VAR: BEGIN ATOMIC
10   DECLARE O_ID   INTEGER ;
11   DECLARE C_ID   INTEGER ;
12   DECLARE AMOUNT INTEGER ;
13
14   /* Delete the order from new order table */
15   SET VAR.O_ID = ( SELECT NO_O_ID
16                      FROM OLD TABLE ( DELETE
17                                         FROM ( SELECT NO_O_ID
18                                                  FROM NEW_ORDER
19                                                 WHERE NO_W_ID = DEL.W_ID
20                                                   AND NO_D_ID = DEL.D_ID
21                                                 ORDER BY NO_O_ID ASC
22                                                 FETCH FIRST 1 ROW ONLY
23                                              ) AS NEW_ORDER
24                                     ) AS D
25                  )
26   ;
27   /* Update the order as delivered and retrieve the customer id */
28   SET VAR.C_ID = ( SELECT O_C_ID
29                      FROM OLD TABLE ( UPDATE ORDERS
30                                          SET O_CARRIER_ID = DEL.CARRIER_ID
31                                        WHERE O_W_ID = DEL.W_ID
32                                          AND O_D_ID = DEL.D_ID
33                                          AND O_ID   = VAR.O_ID
34                                     ) AS U
35                  ) 
36   ;
37   SET VAR.AMOUNT = ( SELECT SUM( OL_AMOUNT )
38                        FROM OLD TABLE ( UPDATE ORDER_LINE
39                                            SET OL_DELIVERY_D = DEL.DELIVERY_D
40                                          WHERE OL_W_ID = DEL.W_ID
41                                            AND OL_D_ID = DEL.D_ID
42                                            AND OL_O_ID = VAR.O_ID
43                                       ) AS U
44                    ) 
45   ;
46   /* Charge the customer */
47   UPDATE CUSTOMER
48      SET   C_BALANCE      = C_BALANCE + VAR.AMOUNT
49          , C_DELIVERY_CNT = C_DELIVERY_CNT + 1
50    WHERE C_W_ID = DEL.W_ID
51      AND C_D_ID = DEL.D_ID
52      AND C_ID   = VAR.C_ID  
53   ;
54   /* Return the order id to the caller (or NULL) */
55   RETURN VALUES VAR.O_ID ;
56 END
			

As in the NEW_ORDER transaction, all steps have been collected into one SQL table function to save code path. Instead of the eight steps, only five substatements are present. What happened?

  1. The traditional way to process an element on a queue in SQL is to fetch the element in a first step. Then, in a second step, the element is deleted from the table. The main problem with this approach is that, unless a cursor-for-update is opened, a straight select into will not lock the row sufficiently to prevent another carrier from attempting to deliver the same order. Also in either case, cursor or not, two statements need to be executed. Again the ability to push data-change-operations (in this case DELETE) into the FROM clause comes in handy. However, more than this is needed. In order to find the "oldest" order you must execute a MIN() function. This function cannot appear in a DELETE statement's WHERE clause. It is only allowed in queries. The fact that DB2's SQL language is highly orthogonal helps solve the problem in an elegant way. First, DB2 allows DELETE against queries with the same semantics as it allows DELETE against a view (which really are queries). Second, DB2 allows DELETE against an ordered query. Knowing that "ORDER BY O_ID FETCH FIRST ROW ONLY" will serve up the oldest order id results in the rather elegant DELETE operation above.
  2. Retrieving the customer id and updating the order has been folded into one statement.
  3. Computing the total value of the order from the individual order lines has been folded together with the update of the delivery column in ORDER_LINES. Again this works only because of the orthoginality of DB2's choice for the SQL language. Because the UPDATE is in the FROM clause it is easy to use the standard SUM() to aggregate the end result.

The actual statement to execute the delivery is rather trivial:


Listing 11. SQL statement to execute delivery
			
1 SELECT O_ID 
2   INTO :no_o_id :no_o_id_indicator
3   FROM TABLE (  DEL( :w_id , :d_id , :o_carrier_id , :ol_delivery_d ) ) AS T

Again the plan below shows how dense this transaction has become thanks to the efficient usage of SQL:


Listing 12. Delivery access plan
access plan

Again, take a quick tour through this plan's key features:

  • DELETE(6) is the POP from the queue. DB2 recognized that all it needs is an index fetch. Of course the SQL would have worked with a SORT as well, but that would be a bad plan.
  • GRPBY(11) computes the SUM(OL_AMOUNT) right on top of the UPDATE(12).

By now it is assumed that all the other operators are well known to the attentive reader.


PAYMENT transaction

There are two variants of the PAYMENT transaction. The first variant is used for customers who provide their customer id. The second variant is used for customers who don't remember their customer ID and who provide their last name instead. Only the second variant is described here, since it provides a challenge where the first does not.

In the payment transaction (by last name) the following steps must occur:

  1. Retrieve the name and address of the district.
  2. Find the customer id of the customer by last name. If there is more than one customer with the same last name in this district, the right customer is the "middle" customer with respect to the customer's first name.
  3. Retrieve the customer's personal information.
  4. Increment the revenue to date for the district.
  5. Increment the revenue to date for the warehouse.
  6. Increment the payments made by the customer including some extra data if the customer has bad credit.
  7. Log the payment in the history.

As in the previous transactions, most of the logic is encapsulated into a table function named PAY_C_LAST().


Listing 13. Table function PAY_C_LAST
			
  1 CREATE FUNCTION PAY_C_LAST(  W_ID INTEGER
  2                            , D_ID SMALLINT
  3                            , C_W_ID INTEGER
  4                            , C_D_ID SMALLINT
  5                            , C_LAST VARCHAR(16)
  6                            , H_DATE BIGINT
  7                            , H_AMOUNT BIGINT  
  8                            , BAD_CREDIT_PREFIX VARCHAR(34) 
  9                           )
 10 RETURNS TABLE(   W_STREET_1 CHAR(20)
 11                , W_STREET_2 CHAR(20)
 12                , W_CITY CHAR(20)
 13                , W_STATE CHAR(2)
 14                , W_ZIP CHAR(9)
 15                , D_STREET_1 CHAR(20)
 16                , D_STREET_2 CHAR(20)
 17                , D_CITY CHAR(20)
 11                , D_STATE CHAR(2)
 19                , D_ZIP CHAR(9)
 20                , C_ID INTEGER
 21                , C_FIRST VARCHAR(16)
 22                , C_MIDDLE CHAR(2)
 23                , C_STREET_1 VARCHAR(20)
 24                , C_STREET_2 VARCHAR(20)
 25                , C_CITY VARCHAR(20)
 26                , C_STATE CHAR(2)
 27                , C_ZIP CHAR(9)
 28                , C_PHONE CHAR(16)
 29                , C_SINCE BIGINT
 30                , C_CREDIT CHAR(2)
 31                , C_CREDIT_LIM BIGINT
 32                , C_DISCOUNT INTEGER
 33                , C_BALANCE BIGINT
 34                , C_DATA CHAR(200)   
 35              )
 36 SPECIFIC PAY_C_ID INHERIT ISOLATION LEVEL WITH LOCK REQUEST
 37 MODIFIES SQL DATA DETERMINISTIC NO EXTERNAL ACTION LANGUAGE SQL
 38 VAR: BEGIN ATOMIC
 39   DECLARE  W_NAME CHAR(10) ;
 40   DECLARE  D_NAME CHAR(10) ;
 41   DECLARE  W_STREET_1 CHAR(20) ;
 42   DECLARE  W_STREET_2 CHAR(20) ;
 43   DECLARE  W_CITY CHAR(20) ;
 44   DECLARE  W_STATE CHAR(2) ;
 45   DECLARE  W_ZIP CHAR(9) ;
 46   DECLARE  D_STREET_1 CHAR(20) ;
 47   DECLARE  D_STREET_2 CHAR(20) ;
 48   DECLARE  D_CITY CHAR(20) ;
 49   DECLARE  D_STATE CHAR(2) ;
 50   DECLARE  D_ZIP CHAR(9) ;
 51   DECLARE  C_ID INTEGER ;
 52   DECLARE  C_FIRST VARCHAR(16) ;
 53   DECLARE  C_MIDDLE CHAR(2) ;
 54   DECLARE  C_STREET_1 VARCHAR(20) ;
 55   DECLARE  C_STREET_2 VARCHAR(20) ;
 56   DECLARE  C_CITY VARCHAR(20) ;
 57   DECLARE  C_STATE CHAR(2) ;
 58   DECLARE  C_ZIP CHAR(9) ;
 59   DECLARE  C_PHONE CHAR(16) ;
 60   DECLARE  C_SINCE BIGINT ;
 61   DECLARE  C_CREDIT CHAR(2) ;
 62   DECLARE  C_CREDIT_LIM BIGINT ;
 63   DECLARE  C_DISCOUNT INTEGER ;
 64   DECLARE  C_BALANCE BIGINT ;
 65   DECLARE  C_DATA CHAR(200) ;
 66
 67   /* Update District and retrieve its data */
 68   SET ( D_NAME, D_STREET_1, D_STREET_2, D_CITY, D_STATE, D_ZIP )
 69     =  ( SELECT   D_NAME, D_STREET_1, D_STREET_2, D_CITY, D_STATE, D_ZIP
 70            FROM OLD TABLE ( UPDATE DISTRICT
 71                                SET D_YTD = D_YTD + PAY_C_ID.H_AMOUNT
 72                              WHERE D_W_ID = PAY_C_ID.W_ID
 73                                AND D_ID   = PAY_C_ID.D_ID
 74                           ) AS U
 75        )
 76   ;
 77   /* Determine the C_ID */
 78   SET ( C_ID )
 79     = ( SELECT C_ID
 80           FROM ( SELECT   C_ID 
 81                         , COUNT(*) OVER() AS COUNT
 82                         , ROWNUMBER() OVER (ORDER BY C_FIRST) AS NUM
 83                    FROM CUSTOMER
 84                    WHERE C_LAST = PAY_C_LAST.C_LAST
 85                      AND C_W_ID = PAY_C_LAST.C_W_ID
 86                      AND C_D_ID = PAY_C_LAST.C_D_ID 
 87                ) AS T
 88          WHERE NUM = (COUNT + 1) / 2
 89       )    
 90   ;
 91   /* Update the customer */
 92   SET (   C_FIRST, C_MIDDLE, C_STREET_1, C_STREET_2
 93         , C_CITY, C_STATE, C_ZIP, C_PHONE, C_SINCE, C_CREDIT, C_CREDIT_LIM
 94         , C_DISCOUNT, C_BALANCE, C_DATA )
 95     = ( SELECT   C_FIRST, C_MIDDLE, C_STREET_1, C_STREET_2
 96                , C_CITY, C_STATE, C_ZIP, C_PHONE, C_SINCE, C_CREDIT
 97                , C_CREDIT_LIM , C_DISCOUNT, C_BALANCE 
 98                , CASE WHEN C_CREDIT = 'BC' 
 99                       THEN SUBSTR(C_DATA, 1, 200) END AS C_DATA
100           FROM NEW TABLE ( UPDATE CUSTOMER
101                               SET   C_BALANCE     = C_BALANCE - PAY_C_ID.H_AMOUNT
102                                   , C_YTD_PAYMENT = C_YTD_PAYMENT + PAY_C_ID.H_AMOUNT
103                                   , C_PAYMENT_CNT = C_PAYMENT_CNT + 1
104                                   , C_DATA = CASE WHEN C_CREDIT = 'BC'
105                                                   THEN    BAD_CREDIT_PREFIX
106                                                        || SUBSTR( C_DATA, 1, 466 )
107                                                   ELSE C_DATA
108                                              END
109                             WHERE C_W_ID = PAY_C_ID.C_W_ID
110                               AND C_D_ID = PAY_C_ID.C_D_ID
111                               AND C_ID   = PAY_C_ID.C_ID
112                          ) AS U
113       )
114   ;
115   /* Update the warehouse */
116   SET ( W_NAME, W_STREET_1, W_STREET_2, W_CITY, W_STATE,  W_ZIP )
117     = ( SELECT W_NAME, W_STREET_1, W_STREET_2, W_CITY, W_STATE,  W_ZIP
118           FROM OLD TABLE ( UPDATE WAREHOUSE
119                               SET W_YTD = W_YTD + PAY_C_ID.H_AMOUNT
120                             WHERE W_ID = PAY_C_ID.W_ID
121                          ) AS U
122       )
123   ;
124   /* Finally insert into the history */
125   INSERT 
126     INTO HISTORY ( H_C_ID, H_C_D_ID, H_C_W_ID, H_D_ID
127                  , H_W_ID, H_DATA, H_DATE, H_AMOUNT )
128     VALUES (   PAY_C_ID.C_ID 
129              , PAY_C_ID.C_D_ID 
130              , PAY_C_ID.C_W_ID
131              , PAY_C_ID.D_ID
132              , PAY_C_ID.W_ID
133              , VAR.W_NAME || CHAR( '    ', 4 ) || VAR.D_NAME
134              , PAY_C_ID.H_DATE
135              , PAY_C_ID.H_AMOUNT
136            )
137   ;
138   /* Done - return the collected data */
139   RETURN VALUES (   W_STREET_1, W_STREET_2, W_CITY, W_STATE,  W_ZIP
140                   , D_STREET_1, D_STREET_2, D_CITY, D_STATE, D_ZIP
141                   , C_ID , C_FIRST, C_MIDDLE, C_STREET_1, C_STREET_2
142                   , C_CITY, C_STATE, C_ZIP, C_PHONE, C_SINCE, C_CREDIT
143                   , C_CREDIT_LIM , C_DISCOUNT, C_BALANCE, C_DATA 
144                 ) 
145   ;
146 END
			


Listing 14. SQL statement for payment transaction
			
 1 SELECT   W_STREET_1, W_STREET_2, W_CITY, W_STATE,  W_ZIP
 2        , D_STREET_1, D_STREET_2, D_CITY, D_STATE, D_ZIP
 3        , C_ID, C_FIRST, C_MIDDLE, C_STREET_1, C_STREET_2
 4        , C_CITY, C_STATE, C_ZIP, C_PHONE, C_SINCE, C_CREDIT, C_CREDIT_LIM
 5        , C_DISCOUNT, C_BALANCE, C_DATA  
 6   INTO   :w_street_1 , :w_street_2 , :w_city , :w_state , :w_zip 
 7        , :d_street_1 , :d_street_2 , :d_city , :d_state , :d_zip
 8        , :c_id , :c_first , :c_middle ,   :c_street_1 , :c_street_2 , :c_city , :c_state
 9        , :c_zip , :c_phone , :c_since , :c_credit , :c_credit_lim 
10        , :c_discount , :c_balance, :c_data :c_data_indicator
11   FROM TABLE ( PAY_C_LAST(   :w_id 
12                            , :d_id 
13                            , :c_w_id 
14                            , :c_d_id 
15                            , :c_last_input 
16                            , :h_date 
17                            , :h_amount 
18                            , :c_data_prefix_c_last
19                          )
20               ) AS PAY_C_LAST
21  WITH RR USE AND KEEP UPDATE LOCKS
			

On top of the usual optimizations, there are two new techniques to take note of:

  • To determine the correct customer, the CUSTOMER table needs to be read. Only after this is done can the update to the CUSTOMER table be performed. By default, this means that all customer rows with the last name in question will acquire a share lock. To perform the update, the lock needs to be transformed into an update lock. There is a small risk that the same customer wants to do another payment for a different order at the same time. If this happens between the fetch and the update, then a dead lock will occur, since neither transaction will be able to acquire the update lock given that the second transaction holds a share lock. To avoid situtations like this, DB2 V8.2 supports the so called lock-request-clause. In this case WITH RR USE AND KEEP UPDATE LOCKS will cause DB2 to collect at least update locks instead of share locks throughout the statement. For semantic cleanliness and future extensibility, the SQL function uses a matching clause INHERIT ISOLATION LEVEL WITH LOCK REQUEST.
  • To find the median, the use of ROW_NUMBER() was chosen. This OLAP function numbers all customers of the same last name by their first names. It was further decided to not use a separate query to get the total COUNT. Instead, again OLAP has been used. The trade-off is one between memory consumption to buffer all matching customers - since the total COUNT has to be tagged onto each customer but it is not known until the end - and doing two separate index scans from the customer table. For small numbers of rows and a small row-width, (C_ID, COUNT, NUM) using COUNT(*) OVER() is in fact somewhat better.

Listing 15 shows the plan for the payment transaction.


Listing 15. Access plan for payment transaction
Plan for payment transaction

ORDER STATUS query

The purpose of the ORDER STATUS query is to allow customers to retrieve information about their orders. As in payment, some customers remember their customer ids, while others don't. Here is the more complex version given a customer who calls in by last name only:

  1. The customer ID needs to be determined using the same algorithm as in the PAYMENT transaction.
  2. The customer's full name and account balance is retrieved.
  3. The latest order of the customer is determined by selecting the highest order id for that customer.
  4. The ID of the delivery carrier is determined if the order was delivered as well as the date when the order was submitted.
  5. The delivery date, quantity, total price and delivering warehouse are retrieved for each order line.

Listing 16. Order status function
			
 1 CREATE FUNCTION ORD_C_LAST(   W_ID INTEGER
 2                             , D_ID SMALLINT
 3                             , C_LAST VARCHAR(16)
 4                           )
 5 RETURNS TABLE(   O_ID         INTEGER
 6                , O_CARRIER_ID SMALLINT
 7                , O_ENTRY_D    BIGINT
 8                , C_BALANCE    BIGINT
 9                , C_FIRST      VARCHAR(16)
10                , C_MIDDLE     CHAR(2)
11                , C_ID         INTEGER 
12              )
13 SPECIFIC ORD_C_LAST
14 READS SQL DATA NO EXTERNAL ACTION DETERMINISTIC LANGUAGE SQL
15 VAR: BEGIN ATOMIC
16   DECLARE C_BALANCE    BIGINT ;
17   DECLARE C_FIRST      VARCHAR(16) ;
18   DECLARE C_MIDDLE     CHAR(2) ;
19   DECLARE C_ID         INTEGER ;
20   DECLARE O_ID         INTEGER;
21   DECLARE O_CARRIER_ID SMALLINT;
22   DECLARE O_ENTRY_D    BIGINT;
23
24   /* Retrieve the Customer information */
25   SET ( C_BALANCE, C_FIRST, C_MIDDLE, C_ID )
26     = ( SELECT  C_BALANCE, C_FIRST, C_MIDDLE , C_ID
27           FROM ( SELECT   C_ID 
28                         , C_BALANCE
29                         , C_FIRST
30                         , C_MIDDLE
31                         , COUNT(*) OVER() AS COUNT
32                         , ROWNUMBER() OVER (ORDER BY C_FIRST) AS NUM
33                    FROM CUSTOMER
34                   WHERE C_W_ID = ORD_C_LAST.W_ID
35                     AND C_D_ID = ORD_C_LAST.D_ID
36                     AND C_LAST = ORD_C_LAST.C_LAST
37                ) AS V1
38          WHERE NUM = (COUNT + 1) / 2
39       )
40   ;
41   /* Take advantage of the index to fetch the first row (and hence max(o_id) ) */
42   SET ( O_ID , O_CARRIER_ID , O_ENTRY_D )
43     =    ( SELECT   O_ID
44                   , O_CARRIER_ID
45                   , O_ENTRY_D
46              FROM ORDERS 
47             WHERE O_W_ID = ORD_C_LAST.W_ID 
48               AND O_D_ID = ORD_C_LAST.D_ID 
49               AND O_C_ID = VAR.C_ID  
50             ORDER BY O_ID DESC
51             FETCH FIRST 1 ROW ONLY 
52          ) 
53   ;
54   RETURN VALUES (   VAR.O_ID
55                   , VAR.O_CARRIER_ID
56                   , VAR.O_ENTRY_D
57                   , VAR.C_BALANCE
58                   , VAR.C_FIRST
59                   , VAR.C_MIDDLE
60                   , VAR.C_ID
61                 ) 
62   ;
63 END
			


Listing 17. Calling the function
			
1 SELECT O_ID, O_CARRIER_ID, O_ENTRY_D, C_BALANCE, C_FIRST, C_MIDDLE, C_ID
2   INTO :o_id, :o_carrier_id , :o_entry_d , :c_balance, :c_first, :c_middle, :c_id
3   FROM TABLE ( ORD_C_LAST(   :w_id
4                            , :d_id 
5                            , :c_last_input 
6                          )
7              ) AS ORD_C_LAST


Listing 18. SQL statement for the order status query
			
1 SELECT OL_I_ID, OL_SUPPLY_W_ID, OL_QUANTITY, OL_AMOUNT, OL_DELIVERY_D
2   FROM ORDER_LINE
3   WHERE OL_W_ID = :w_id
4     AND OL_D_ID = :d_id
5     AND OL_O_ID = :o_id
6   FOR FETCH ONLY ;

A lot of the usual techniques to improve performance were applied again. For example, all the steps not involving the order lines have been unified into one SQL table function. Also the "middle customer" is retrieved using OLAP. However, one last time there is something interesting to glean from this query:

Naively one would first determine the maximum order id for the customer and then use this ID to retrieve the carrier and the entry date. However it is denser to unify those two queries into one by exploiting the fact that the maximum order ID for a customer is also the first order ID fetched in an index on the customer and the order id, if the order id is sorted descending. Given a matching index, a single index fetch will pin down the row to be retrieved. This is the same technique used in the delivery transaction, this time without the DELETE and a MAXIMUM.

Note that the order lines are retrieved through a separate cursor. It is more efficient to execute two statements than to return a Cartesian product of the two queries which would send the customer and order information repeated for each order line.

Listing 19 lists the plan showing the efficient use (FETCH(8)) of ORDER BY and FETCH FIRST 1 ROW ONLY discussed above.


Listing 19. Order status query plan
			
                                Rows 
                               RETURN 
                               (   1) 
                                Cost 
                                 I/O 
                                 |
                                  1 
                               NLJOIN 
                               (   2) 
                               12.928 
                                2.008 
                       /----------+---------\
                     1                         1 
                  NLJOIN                    TBSCAN 
                  (   3)                    (  10) 
                  12.9279                4.48727e-005 
                   2.008                       0 
           /---------+--------\               |
         1                       1             1 
      TBSCAN                  UNION    TABFNC: SYSIBM   
      (   4)                  (   5)        GENROW 
   4.48727e-005              0.108135 
         0                   0.013056 
        |                /-------+------\
         1             1                   1 
 TABFNC: SYSIBM     FILTER              FETCH  
      GENROW        (   6)              (   8) 
                   0.0176324          0.0905021 
                       0              0.005056 
                      |               /----+---\
                      0.2       0.005056         79 
                    IXSCAN       IXSCAN    TABLE: SRIELAU  
                    (   7)       (   9)        ORDERS 
                   0.0158334     0.0251716 
                       0            0 
                      |            |
                       5           79 
                 INDEX: SRIELAU INDEX: SRIELAU  
                   CUST_IDXB      ORDR_IDXB 


STOCK LEVEL query

Last but not least, the STOCK LEVEL query exercises a three-way join to determine the number of items in the past 20 orders for a given district for which the stock levels are below a specified threshhold. There is not much to be said about this query other than the fact that it is the only query that can run with a cursor stability isolation level. DB2 has the capability of specifying the isolation level on a query-by-query basis which has been exploited here.


Listing 20. Stock level query
			
 1 SELECT COUNT( S_I_ID ) INTO :low_stock
 2   FROM ( SELECT DISTINCT S_I_ID
 3            FROM ORDER_LINE , STOCK , DISTRICT
 4            WHERE D_W_ID = :w_id
 5              AND D_ID = :d_id       
 6              AND OL_O_ID < d_next_o_id
 7              AND OL_O_ID >= ( d_next_o_id - 20 )
 8              AND OL_W_ID = D_W_ID
 9              AND OL_D_ID = D_ID
10              AND S_I_ID = OL_I_ID
11              AND S_W_ID = OL_W_ID
12              AND S_QUANTITY < :threshold 
13         ) AS OLS
14   WITH CS


Listing 21. Stock level query access plan
			
                                                   Rows 
                                                  RETURN 
                                                  (   1) 
                                                   Cost 
                                                    I/O 
                                                    |
                                                     1 
                                                  GRPBY  
                                                  (   2) 
                                                  13.204 
                                                  1.02222 
                                                    |
                                               3.75467e-005 
                                                  TBSCAN 
                                                  (   3) 
                                                  13.2039 
                                                  1.02222 
                                                    |
                                               3.75467e-005 
                                                  SORT   
                                                  (   4) 
                                                  13.2033 
                                                  1.02222 
                                                    |
                                               3.75467e-005 
                                                  NLJOIN 
                                                  (   5) 
                                                  13.2023 
                                                  1.02222 
                                /--------------------+--------------------\
                         0.00782222                                       0.0048 
                           NLJOIN                                         FETCH  
                           (   6)                                         (  11) 
                           13.0011                                       0.201169 
                           1.00782                                        0.0144 
                  /-----------+-----------\                             /----+---\
                1                       0.00782222                 0.0144           9 
             FETCH                        FETCH                    IXSCAN    TABLE: SRIELAU  
             (   7)                       (   9)                   (  12)         STOCK 
             12.872                      0.129119                 0.0157274 
                1                       0.00782222                    0 
           /----+---\                   /----+---\                   |
         1            26         0.00782222        44                 9 
      IXSCAN    TABLE: SRIELAU     IXSCAN    TABLE: SRIELAU    INDEX: SYSIBM   
      (   8)       DISTRICT        (  10)      ORDER_LINE     SQL0410231029421 
     0.0175755                    0.0282312 
         0                            0 
        |                            |
        26                           44 
  INDEX: SYSIBM                INDEX: SYSIBM   
 SQL0410231029415             SQL0410231030088 


Conclusion

In this article, Rielau has given a brief introduction into the schema of the TPC-C benchmark and its transactions. Much more is needed to execute this benchmark at the extreme levels on which DB2 has performed, but the concise usage of SQL stands at the beginning of it all. Efficient SQL yields efficient query plans, which in turn will mean that only codepath that is necessary will be executed. Only rows that are needed will be read. Rielau considers the SQL used in the TPC-C benchmark by DB2 very near optimal. There is very little fat to cut, if any.

In summary, belw are the interesting SQL features exposed by this implementation of TPC-C:

  • The usage of SQL Table Functions allows you to push procedural logic into the FROM clause of a query. Through correlation, SQL table functions allow a more efficient way of implementing iterations, otherwise done with cursors.
  • MODIFIES SQL DATA in SQL table functions allows you to push even INSERT, UPDATE, DELETE and MERGE (data-change-operations) into the inner leg of correlated joins.
  • Usage of data-change-operations in the FROM clause allows retrieval of generated columns such as IDs, as well as the retrieval of data to be deleted or updated.
  • Using queries as the target of a data change operation allows deletion or updates of rows determined by complex SQL including ORDER BY. A common exploitation of this feature is the implementation of POP semantics for queues.
  • ORDER BY in combination with FETCH FIRST can be used efficiently to select the maximum or minumum row, including retrieval of columns not part of the aggregate function itself.
  • OLAP functions are an option to be considered when selecting rows by conditions other than maximum or minimum.
  • Common table expressions (the WITH clause) allow efficient pipelining of data-change-operations .

While the TPC-C benchmark is very simple, it is nonetheless quite efficient in touching many aspects of OLTP processing.

Note that, while DB2 is not the first product to introduce some means of returning data from INSERT, UPDATE and DELETE, it is the first to integrate the concept into SQL itself, allowing immediate relational processing of the result without going through a temporary table and procedural language constructs.


Acknowledgements

I may not have needed much help to write this article, but it is based on the accomplishments of a whole organization.

I want to thank first and foremost Bobbie Cochrane, Hamid Pirahesh and Richard Sidle from the IBM Almaden Research Center. Without their work in Starburst, inline SQL PL and data-change-operations in the FROM clause could not exist.

I want to thank my team for delivering nested data-change-operations - one of the most exciting SQL extensions of the last years.

Finally I want to thank everyone involved in making the TPC-C results of the past 18 months a reality.

There is only one place to go when IBM wants to deliver the fastest result on their fastest machine: Across the hall!


Resources

  • TPC.org homepage. Everything you want to know about the Transaction Processing Council (TPC): Benchmark rules, latest results, members, and so on.

  • TPC-C Specification (PDF). The TPC-C specification is the official definition on how a TPC-C benchmark is to be perfromed. My explanation here is truly only a summary.

  • Lots of Starburst resources. Starburst was the follow-on research project to System R. Its purpose was to develop a highly extensible RDBMS. The Starburst SQL compiler has been used in DB2 UDB for Linux, UNIX and Windows since DB2 V2 for Common Servers.

  • "Returning Modified Rows" (VLDB 2004 Toronto). DB2 development's introduction of "data-change-operations in the FROM clause" to the research community.

  • Patent on inline SQL PL design. This patent describes the details on how inline SQL PL has been implemented into Starburst.

  • Patent on SQL Function design. Here the concept of how SQL Functions can get efficiently expanded into a query graph are explained.

  • "Profiling SQL Procedures". This article introduces a GUI tool to tune SQL Procedures. It is complementary to this talk and for those reader in interested in SQL PL beyond SQL functions.

  • "Tracing SQL Procedures". Again an article which deals with SQL Procedures and how to debug them.

  • DB2 UDB for LUW Information Center. The Information Center is the online documentation for DB2.

About the author

Serge Rielau

Serge Rielau is a Senior Software Developer, working on DB2 UDB SQL compiler development at the IBM Toronto lab. He has worked in DB2’s SQL compiler development team for 7 years and is an expert in the SQL Language. Among other projects he has developed inline SQL PL and SQL Functions. He also designed what is commonly known as "SELECT FROM INSERT, UPDATE and DELETE". Nowadays Serge spends a good deal of his time helping customers migrate from other DBMS to DB2 and he is an active participant in the comp.databases.ibm-db2 newsgroup.

Comments



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=31895
ArticleTitle=High-performance SQL
publish-date=11182004
author1-email=srielau@ca.ibm.com
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers