The past year and a half have seen a lot of DB2 for LUW activity on the standard TPC-C benchmark, the benchmark commonly used to measure performance in the on line transaction processing environment. The results range from the very small numbers on very reasonable machines to very, big numbers on fantastic setups that you are unlikely to see at home or at work.
While the number of rows populating the database schema may have been adjusted to scale, one area remains the same: SQL. In this article, you walk through the logic of the TPC-C benchmark, gain an understanding of the SQL technology, and discover how it can be used in real customer environments.
To accomplish this, author Serge Rielau, has organized the article as follows: First by introducing the benchmarkâs DB schema and its transactions. Next, he will analyze each transaction and explain the SQL features of each. Overall, at the end of this article, you should have a better understanding of the TPC-C benchmark as well as DB2âs advanced SQL features, including the philosophy behind them.
TPC-C: A quick overview of the benchmark
The TPC-C benchmark simulates a distribution enterprise with a product sales cycle using a small set of transactions and tables. The external schema and the semantics of each transaction are formally specified. For the purpose of getting a quick overview and to provide a base schema to work on, the following is a quick summary. Tables are marked in bold and transactions are shown in italics:
The database used for the benchmark consists primarily of warehouses, districts, items and customers. 100,000 ITEMs are in STOCK at each WAREHOUSE. Each warehouse serves 10 DISTRICTs. Each district serves 3000 CUSTOMERs which ORDER new items. Each ORDER can be composed of up to 15 distinct items called ORDER_LINE. Before an order is delivered, it is queued as a NEW_ORDER, and the stock for each item is adjusted for the warehouse.. Upon DELIVERY, the customer is charged. Upon PAYMENT, the transaction is archived in the HISTORY, and the revenue is booked.
In addition, customers can inquire on their ORDER STATUS, and warehouse managers can inquire the STOCK LEVEL of items on order.
To make things a bit more realistic, customers tend to forget their customer ids, so you need to be able to look them up by their last names, of which there may be duplicates. Also, not all orders can be satisfied from the local warehouse, requiring items to be shipped in from remote warehouses.
The number of new orders that can be handled by a warehouse is constrained to between 9 and 12.86 transactions per minute. This means that to achieve a result of over 3.2 Million TpmC (NEW ORDER Transactions per minute TPC-C), the number of warehouses needs to be dialed up to more than 256,000. This implies 7.7 Billion registered customers. So, donât try this unless you have plenty of disks and a big fuse!
Before you dive into the five transactions, lhere is the DDL for the tables. It is pretty self-explanatory.
Listing 1. DDL to create the TPC-C database tables
1 CREATE TABLE WAREHOUSE 2 ( 3 W_NAME CHAR(10) NOT NULL, 4 W_STREET_1 CHAR(20) NOT NULL, 5 W_STREET_2 CHAR(20) NOT NULL, 6 W_CITY CHAR(20) NOT NULL, 7 W_STATE CHAR(2) NOT NULL, 8 W_ZIP CHAR(9) NOT NULL, 9 W_TAX INTEGER NOT NULL, 10 W_YTD BIGINT NOT NULL, 11 W_ID INTEGER NOT NULL, 12 PRIMARY KEY (W_ID) 13 ); 14 15 CREATE TABLE DISTRICT 16 ( 17 D_NEXT_O_ID INTEGER NOT NULL, 18 D_TAX INTEGER NOT NULL, 19 D_YTD BIGINT NOT NULL, 20 D_NAME CHAR(10) NOT NULL, 21 D_STREET_1 CHAR(20) NOT NULL, 22 D_STREET_2 CHAR(20) NOT NULL, 23 D_CITY CHAR(20) NOT NULL, 24 D_STATE CHAR(2) NOT NULL, 25 D_ZIP CHAR(9) NOT NULL, 26 D_ID SMALLINT NOT NULL, 27 D_W_ID INTEGER NOT NULL, 28 PRIMARY KEY (D_ID, D_W_ID) 29 ); 30 31 CREATE TABLE ITEM 32 ( 33 I_NAME CHAR(24) NOT NULL, 34 I_PRICE INTEGER NOT NULL, 35 I_DATA VARCHAR(50) NOT NULL, 36 I_IM_ID INTEGER NOT NULL, 37 I_ID INTEGER NOT NULL, 38 PRIMARY KEY (I_ID) 39 ); 40 41 CREATE TABLE STOCK 42 ( 43 S_REMOTE_CNT INTEGER NOT NULL, 44 S_QUANTITY INTEGER NOT NULL, 45 S_ORDER_CNT INTEGER NOT NULL, 46 S_YTD INTEGER NOT NULL, 47 S_DATA VARCHAR(50) NOT NULL, 48 S_DIST_01 CHAR(24) NOT NULL, 49 S_DIST_02 CHAR(24) NOT NULL, 50 S_DIST_03 CHAR(24) NOT NULL, 51 S_DIST_04 CHAR(24) NOT NULL, 52 S_DIST_05 CHAR(24) NOT NULL, 53 S_DIST_06 CHAR(24) NOT NULL, 54 S_DIST_07 CHAR(24) NOT NULL, 55 S_DIST_08 CHAR(24) NOT NULL, 56 S_DIST_09 CHAR(24) NOT NULL, 57 S_DIST_10 CHAR(24) NOT NULL, 58 S_I_ID INTEGER NOT NULL, 59 S_W_ID INTEGER NOT NULL, 60 PRIMARY KEY (S_I_ID, S_W_ID) 61 ); 62 63 CREATE TABLE CUSTOMER 64 ( 65 C_ID INTEGER NOT NULL, 66 C_STATE CHAR(2) NOT NULL, 67 C_ZIP CHAR(9) NOT NULL, 68 C_PHONE CHAR(16) NOT NULL, 69 C_SINCE BIGINT NOT NULL, 70 C_CREDIT_LIM BIGINT NOT NULL, 71 C_MIDDLE CHAR(2) NOT NULL, 72 C_CREDIT CHAR(2) NOT NULL, 73 C_DISCOUNT INTEGER NOT NULL, 74 C_DATA VARCHAR(500) NOT NULL, 75 C_LAST VARCHAR(16) NOT NULL, 76 C_FIRST VARCHAR(16) NOT NULL, 77 C_STREET_1 VARCHAR(20) NOT NULL, 78 C_STREET_2 VARCHAR(20) NOT NULL, 79 C_CITY VARCHAR(20) NOT NULL, 80 C_D_ID SMALLINT NOT NULL, 81 C_W_ID INTEGER NOT NULL, 82 C_DELIVERY_CNT INTEGER NOT NULL, 83 C_BALANCE BIGINT NOT NULL, 84 C_YTD_PAYMENT BIGINT NOT NULL, 85 C_PAYMENT_CNT INTEGER NOT NULL, 86 PRIMARY KEY (C_ID, C_D_ID, C_W_ID) 87 ); 88 CREATE INDEX CUST_IDXB 89 ON CUSTOMER (C_LAST, C_W_ID, C_D_ID, C_FIRST, C_ID); 90 91 CREATE TABLE HISTORY 92 ( 93 H_C_ID INTEGER NOT NULL, 94 H_C_D_ID SMALLINT NOT NULL, 95 H_C_W_ID INTEGER NOT NULL, 96 H_D_ID SMALLINT NOT NULL, 97 H_W_ID INTEGER NOT NULL, 98 H_DATE BIGINT NOT NULL, 99 H_AMOUNT INTEGER NOT NULL, 100 H_DATA CHAR(24) NOT NULL 101 ); 102 103 CREATE TABLE ORDERS 104 ( 105 O_C_ID INTEGER NOT NULL, 106 O_ENTRY_D BIGINT NOT NULL, 107 O_CARRIER_ID SMALLINT NOT NULL, 108 O_OL_CNT SMALLINT NOT NULL, 109 O_ALL_LOCAL SMALLINT NOT NULL, 110 O_ID INTEGER NOT NULL, 111 O_W_ID INTEGER NOT NULL, 112 O_D_ID SMALLINT NOT NULL, 113 PRIMARY KEY (O_ID, O_W_ID, O_D_ID) 114 ); 115 CREATE INDEX ORDR_IDXB 116 ON ORDERS (O_C_ID, O_W_ID, O_D_ID, O_ID DESC); 117 118 CREATE TABLE ORDER_LINE 119 ( 120 OL_DELIVERY_D BIGINT NOT NULL, 121 OL_AMOUNT INTEGER NOT NULL, 122 OL_I_ID INTEGER NOT NULL, 123 OL_SUPPLY_W_ID INTEGER NOT NULL, 124 OL_QUANTITY SMALLINT NOT NULL, 125 OL_DIST_INFO CHAR(24) NOT NULL, 126 OL_O_ID INTEGER NOT NULL, 127 OL_D_ID SMALLINT NOT NULL, 128 OL_W_ID INTEGER NOT NULL, 129 OL_NUMBER SMALLINT NOT NULL, 130 PRIMARY KEY (OL_O_ID, OL_W_ID, OL_D_ID, OL_NUMBER) 131 ); 132 133 CREATE TABLE NEW_ORDER 134 ( 135 NO_O_ID INTEGER NOT NULL, 136 NO_D_ID SMALLINT NOT NULL, 137 NO_W_ID INTEGER NOT NULL, 138 PRIMARY KEY (NO_W_ID, NO_D_ID, NO_O_ID) 139 ); |
The NEW ORDER transaction processes a new order from a customer. Various things need to be done here:
- Retrieve the next order id and the district tax rate for the customers district.
- Increment the next order id for the customers district for subsequent transactions.
- For each item the customer orders:
- Retrieve the item's name, price and description from the ITEM table.
- Retrieve some district information and the remaining stock level for the item from STOCK.
- Decrease the stock level of the item by the ordered quantity. If the stock falls below a threshold, order in supplies (by simply incrementing the value).
- Insert the relevant retrieved data including the overall price into ORDER_LINE.
- Insert the order into the ORDERS and NEW_ORDER tables.
- Retrieve the customer name, discount, credit information from CUSTOMER.
- Retrieve sales tax information from WAREHOUSE.
- Compute the total price considering the discount and the taxes.
Thatâs quite a bit of work. However DB2 manages to get it all done using 3 SQL statements. This is how:
First, DB2 deals with the DISTRICT table. Data needs to be returned and an update needs to be performed. Conventional wisdom states that this requires 2 SQL statements, and that the UPDATE ought to be done prior to the SELECT; otherwise deadlocks may occur as concurrency increases.
DB2 however supports a new SQL feature which is in the process of being standardized. This feature allows access to what is known as transition tables in triggers. The OLD TABLE transition table holds the original state of the affected rows before they are processed by the UPDATE or DELETE statement. The NEW TABLE transition table holds the affected rows immediately after an INSERT or UPDATE was processed. That is the state prior to when AFTER triggers fire. Users with a Microsoft or Sybase background may know these tables by the names DELETED and INSERTED.
All that DB2 does is to allow an UPDATE, DELETE, and INSERT to appear in the FROM clause of a select and allow the user to chose which transition table shall be selected from:
Listing 2. Using transition tables
1 SELECT D_TAX, D_NEXT_O_ID 2 INTO :dist_tax , :next_o_id 3 FROM OLD TABLE ( UPDATE DISTRICT 4 SET D_NEXT_O_ID = D_NEXT_O_ID + 1 5 WHERE D_W_ID = :w_id 6 AND D_ID = :d_id 7 ) AS OT |
The advantage of this logic can be easily seen by taking a look the optimizer plan, shown in Listing 3:
Listing 3. Access plan using transition tables
Rows
RETURN
( 1)
Cost
I/O
|
1
UPDATE
( 2)
25.7261
2
/---+--\
1 26
FETCH TABLE: SRIELAU
( 3) DISTRICT
12.872
1
/----+---\
1 26
IXSCAN TABLE: SRIELAU
( 4) DISTRICT
0.0175755
0
|
26
INDEX: SYSIBM
SQL0410231029415
|
The structure of the combined plan is nearly identical to that of the UPDATE statement alone. The TPC-C specification mandates the storage of the next order id in the DISTRICT table. In a customer environment one could easily use a SEQUENCE instead to avoid locking altogether.
Now take a look at the second of the three SQL Statements:
Listing 4. Second SQL statement
1 WITH DATA AS ( SELECT O_ID , D_ID , W_ID , OL_NUMBER , I_ID 2 , W_ID AS I_SUPPLY_W_ID 3 , 0 AS OL_DELIVERY_D 4 , I_QTY 5 , ( I_PRICE * I_QTY ) AS TOTAL_PRICE 6 , OL_DIST_INFO , I_PRICE, I_NAME, I_DATA, S_DATA 7 , S_QUANTITY 8 FROM ( SELECT :next_o_id as O_ID 9 , :w_id AS W_ID 10 , :d_id as D_ID 11 , OL_NUMBER , I_ID , I_QTY 12 FROM TABLE( VALUES 13 ( 1 , :id0 , :ol_quantity0 ) 14 , ( 2 , :id1 , :ol_quantity1 ) 15 , ( 3 , :id2 , :ol_quantity2 ) 16 , ( 4 , :id3 , :ol_quantity3 ) 17 , ( 5 , :id4 , :ol_quantity4 ) 18 , ( 6 , :id5 , :ol_quantity5 ) 19 , ( 7 , :id6 , :ol_quantity6 ) 20 , ( 8 , :id7 , :ol_quantity7 ) 21 , ( 9 , :id8 , :ol_quantity8 ) 22 , ( 10 , :id9 , :ol_quantity9 ) 23 , ( 11 , :id10 , :ol_quantity10 ) 24 , ( 12 , :id11 , :ol_quantity11 ) 25 , ( 13 , :id12 , :ol_quantity12 ) 26 , ( 14 , :id13 , :ol_quantity13 ) 27 , ( 15 , :id14 , :ol_quantity14 ) 28 ) AS X ( OL_NUMBER , I_ID , I_QTY ) 29 ) AS ITEMLIST 30 , TABLE( NEW_OL_LOCAL( I_ID , I_QTY , W_ID 31 , O_ID , D_ID , SMALLINT(OL_NUMBER) 32 ) 33 ) AS NEW_OL_LOCAL 34 WHERE NEW_OL_LOCAL.I_PRICE IS NOT NULL 33 ) 36 SELECT I_PRICE , I_NAME , I_DATA , OL_DIST_INFO , S_DATA , S_QUANTITY 37 FROM NEW TABLE ( INSERT INTO ORDER_LINE 38 ( OL_O_ID , OL_D_ID , OL_W_ID 39 , OL_NUMBER , OL_I_ID , OL_SUPPLY_W_ID 40 , OL_DELIVERY_D , OL_QUANTITY , OL_AMOUNT 41 , OL_DIST_INFO 42 ) 43 INCLUDE ( I_PRICE INTEGER 44 , I_NAME CHAR(24) 45 , I_DATA VARCHAR(50) 46 , S_DATA VARCHAR(50) 47 , S_QUANTITY SMALLINT ) 48 SELECT O_ID , D_ID , W_ID 49 , OL_NUMBER , I_ID , I_SUPPLY_W_ID 60 , OL_DELIVERY_D , I_QTY , TOTAL_PRICE 61 , OL_DIST_INFO , I_PRICE , I_NAME 62 , I_DATA , S_DATA , S_QUANTITY 63 FROM DATA 64 ) AS INS |
Now, that is a hefty SQL statement. Next analyze the query step by step:
- The ITEMLIST is composed from the input arguments. It consists of items and quantities for each item colected in a VALUES clause. The order id, district and warehouse are constant for all items in the list.
- Using a correlated join, the NEW_OL_LOCAL SQL Table Function processes each item in the list separately. As you will see shortly, the function returns the item price, name, some meta data, and stock information.
- Items that could not be found (their price is NULL) can be filtered out. Call the resulting table DATA.
- Insert each item from DATA into ORDER_LINE.
- Return the item price, name, meta data and stock information for each item to the user.
There are a couple of interesting things happening here.
First, the NEW_OL _LOCAL table function really returns only one row at a time. With the correlation it operates more like a user-defined relational operator.
Second, the NEW_OL _LOCAL table function, as you will see later, actually writes to a table. What you see here is a join in which the inner performs database modifications. To allow this, and keep the database in a consistent state, DB2 needs to either convince itself that the outer leg of the join and its inner cannot collide; or DB2 needs to throw in a temporary table, called a dam, to ensure the outer leg of the join is completely derived before the table function can start processing. Since the SQL Table Function is written in inline SQL PL, DB2's macro language for SQL PL, DB2 can see through the function and conclude that the transaction is well behaved, and no dam is necessary in this case.
Having mentioned possible collisions between the outer and the inner legs of a join, it should also be mentioned that a similar problem could occur between the function and the INSERT operation. To avoid too much complexity, DB2 imposes a simple rule: An UPDATE, DELETE or INSERT (data-change-operation) shall never happen in a join unless embedded in a correlated function. Further a data-change-operation shall never occur in a nested query.
Instead data-change-operations shall be confined to the top level SELECT or the topmost SELECT of a common table expression (CTE, also known as the "WITH clause"). This is what is happening in this case: DATA performs a data change operation and has been banned as a CTE into the WITH clause. It is now allowed to feed the INSERT operation.
What is the reason for these rules? Keeping data change operations in the WITH clause imposes a very natural order which can be used to solve conflicts.
Finally, there is one more novel feature related to data-change-operations exposed in this transaction: the INCLUDE-clause which INSERT, like any other data-change-operation, can use. This clause allows you to extend the OLD TABLE or NEW TABLE transition tables by appending extra columns. These columns are simply hitching a ride along the INSERT operation and are accessible to the SELECT above. In this case the item price, name, data and the stock-related data are not part of the ORDER_LINE table. These columns are simply passed through to be returned to the user.
Why all this commotion? Why not select twice from DATA - once for the INSERT, once for the final SELECT? The answer is simple: Just like SORT, TEMP is a "four-letter-word".
Your tour through the NEW ORDER transaction would not be complete without a quick glance of the NEW_OL_LOCAL table function:
Listing 5. NEW_OL_LOCAL table function
1 CREATE FUNCTION NEW_OL_LOCAL( I_ID INTEGER 2 , I_QTY SMALLINT 3 , W_ID INTEGER 4 , O_ID INTEGER 5 , D_ID SMALLINT 6 , OL_NUMBER SMALLINT 7 ) 8 RETURNS TABLE( I_PRICE INTEGER 9 , I_NAME CHAR(24) 0 , I_DATA VARCHAR(50) 11 , OL_DIST_INFO CHAR(24) 12 , S_DATA VARCHAR(50) 13 , S_QUANTITY SMALLINT 14 ) 15 SPECIFIC NEW_OL_LOCAL 16 MODIFIES SQL DATA DETERMINISTIC NO EXTERNAL ACTION LANGUAGE SQL 17 18 VAR: BEGIN ATOMIC 19 DECLARE I_PRICE INTEGER ; 20 DECLARE I_NAME CHAR(24) ; 21 DECLARE I_DATA VARCHAR(50) ; 22 DECLARE OL_DIST_INFO CHAR(24) ; 23 DECLARE S_DATA VARCHAR(50) ; 24 DECLARE S_QUANTITY SMALLINT ; 25 26 SET ( I_PRICE , I_NAME , I_DATA ) 27 = ( SELECT 28 I_PRICE 29 , I_NAME 30 , I_DATA 31 32 FROM ITEM 33 WHERE ITEM.I_ID = NEW_OL_LOCAL.I_ID 34 ) 35 ; 36 SET ( OL_DIST_INFO , S_DATA , S_QUANTITY ) 37 = ( SELECT OL_DIST_INFO 38 , S_DATA 39 , S_QUANTITY 40 FROM NEW TABLE ( UPDATE STOCK 41 INCLUDE ( OL_DIST_INFO CHAR( 24 ) ) 42 SET S_QUANTITY = CASE WHEN S_QUANTITY - NEW_OL_LOCAL.I_QTY >= 10 43 THEN S_QUANTITY - NEW_OL_LOCAL.I_QTY 44 ELSE S_QUANTITY - NEW_OL_LOCAL.I_QTY + 91 45 END 46 , S_ORDER_CNT = S_ORDER_CNT + 1 47 , S_YTD = S_YTD + NEW_OL_LOCAL.I_QTY 48 , OL_DIST_INFO = CASE D_ID WHEN 1 THEN S_DIST_01 49 WHEN 2 THEN S_DIST_02 50 WHEN 3 THEN S_DIST_03 51 WHEN 4 THEN S_DIST_04 52 WHEN 5 THEN S_DIST_05 53 WHEN 6 THEN S_DIST_06 54 WHEN 7 THEN S_DIST_07 55 WHEN 8 THEN S_DIST_08 56 WHEN 9 THEN S_DIST_09 57 WHEN 10 THEN S_DIST_10 58 END 59 WHERE S_I_ID = NEW_OL_LOCAL.I_ID 60 AND S_W_ID = NEW_OL_LOCAL.W_ID 61 ) AS U 62 ) 63 ; 64 RETURN VALUES( VAR.I_PRICE 65 , VAR.I_NAME 66 , VAR.I_DATA 67 , VAR.OL_DIST_INFO 68 , VAR.S_DATA 69 , VAR.S_QUANTITY 70 ) 71 ; 72 END |
This function implements step 2 of the steps listed above. The function retrieves the item information for an order line and performs the necessary stock updates. Note that the function is defined as MODIFIES SQL DATA. This clause allows an SQL Table function to contain UPDATE, DELETE, INSERT and MERGE statements. Also note that, again, the INCLUDE clause has been used, this time to pass the OL_DIST_INFO column up. This column is derived from the specific district information of the stocked item. Finally, note how the single row VALUES clause is used in the RETURN statement to return the end result as a single-row table
In order to realize performance gains from this complicated SQL statement DB2 selects a very sophisticated query plan:
Listing 6. Access plan for second SQL statement
Rows
RETURN
( 1)
Cost
I/O
|
14.4
TBSCAN
( 2)
26.2997
2.02765
|
14.4
SORT
( 3)
26.2978
2.02765
|
14.4
INSERT
( 4)
26.2922
2.02765
/---+---\
14.4 44
FILTER TABLE: SRIELAU
( 5) ORDER_LINE
13.4359
1.02765
|
15
NLJOIN
( 6)
13.4334
1.02765
/----------+---------\
15 1
TBSCAN NLJOIN
( 7) ( 8)
0.000201927 13.2461
0 1.02765
| /----------+----------\
15 1 1
TABFNC: SYSIBM NLJOIN TBSCAN
GENROW ( 9) ( 18)
13.246 4.48727e-005
1.02765 0
/----------+---------\ |
1 1 1
TBSCAN UNION TABFNC: SYSIBM
( 10) ( 11) GENROW
4.48727e-005 13.2457
0 1.02765
| /----------+----------\
1 0.96 1
TABFNC: SYSIBM FETCH UPDATE
GENROW ( 13) ( 15)
12.8727 0.371624
1 0.027648
/----+---\ /---+---\
1 36 0.013824 9
IXSCAN TABLE: SRIELAU FETCH TABLE: SRIELAU
( 14) ITEM ( 16) STOCK
0.0182935 0.193765
0 0.013824
| /----+---\
36 0.013824 9
INDEX: SYSIBM IXSCAN TABLE: SRIELAU
SQL0410231029418 ( 17) STOCK
0.0157303
0
|
9
INDEX: SYSIBM
SQL0410231029421
|
Nice plan - but what does it have to do with the query above? Do a who-is-who in this plan:
- FETCH(13) represents the first SET statement in the function, selecting from the ITEM table.
- UPDATE(15) obviously is the UPDATE of the STOCK table incuding the SET statement driving it.
- UNION(11) is not entirely what it claims to be. In the context of inline SQL PL UNIONs are used to drive statements in a sequential fashion. So this UNION is best equated with the function body BEGIN ATOMIC...END.
- TBSCAN(10) initializes the local variables in the function. This is the chain of DECLARE statements. FETCH(13) and UPDATE(15) actually read and assign values to the outer of NLJOIN(9) overriding the defaults for the local variables
- TBSCAN(18) represents the RETURN statement in the SQL Table function.
- Looking at NLJOIN(8), which represents the entire function you see a three way ordered join:
- Initialize local variables
- Execute the body of the function
- Return the result table
- TBSCAN(7) as the outer of the join provides the list of items to the function.
- FILTER(5) throws out the bad items with I_PRICE IS NULL
- INSERT(4) is obviously the INSERT into ORDER_LINE, you should be done, but....
- SORT(3) is special. Since this query is a cursor, DB2 is facing a dilemma. The query changes the state of the database by updating both STOCK and ORDER_LINE. However there is no guarantee that the user will actually fetch all the rows of the table and hence drive these changes through. So DB2 needs to dam the result set to ensure that the whole query is finished when the cursor is opened. For performance reasons, a SORT without sort keys is chosen for this dam over a regular temp table. These do-at-open semantics actually have a very nice side-effect. Using a cursor with hold, one can perform complex database operations and commit them right after the open. Then the result set can be read and analyzed without holding any locks.
For those truly interested in the internal workings of inline SQL PL, more details can be found in the patents referenced in the resources section of this article.
The third and final statement in the NEW_ORDER transaction is:
Listing 7. Third SQL statement
1 SELECT W_TAX, C_DISCOUNT, C_LAST, C_CREDIT 2 INTO :ware_tax, :c_discount, :c_last, :c_credit 3 FROM TABLE ( NEW_WH ( :next_o_id 4 , :w_id 5 , :d_id 6 , :c_id 7 , :o_entry_d 8 , :inputItemCount 9 , :allLocal 10 ) 11 ) AS NEW_WH_TABLE |
Listing 8. NEW_WH function
1 CREATE FUNCTION NEW_WH ( O_ID INTEGER 2 , W_ID INTEGER 3 , D_ID SMALLINT 4 , C_ID INTEGER 5 , O_ENTRY_D BIGINT 6 , O_OL_CNT SMALLINT 7 , O_ALL_LOCAL SMALLINT 8 ) 9 RETURNS TABLE ( W_TAX INTEGER 10 , C_DISCOUNT INTEGER 11 , C_LAST VARCHAR(16) 12 , C_CREDIT CHAR(2) 13 ) 14 SPECIFIC NEW_WH 15 MODIFIES SQL DATA DETERMINISTIC NO EXTERNAL ACTION LANGUAGE SQL 16 VAR: BEGIN ATOMIC 17 DECLARE C_DISCOUNT INTEGER ; 18 DECLARE C_LAST VARCHAR(16) ; 19 DECLARE C_CREDIT CHAR(2) ; 20 DECLARE W_TAX INTEGER ; 21 22 INSERT 23 INTO NEW_ORDER ( NO_O_ID, NO_D_ID, NO_W_ID ) 24 VALUES ( O_ID 25 , D_ID 26 , W_ID 27 ) 28 ; 29 INSERT 30 INTO ORDERS ( O_C_ID , O_ENTRY_D , O_CARRIER_ID , O_OL_CNT 31 , O_ALL_LOCAL , O_ID , O_W_ID , O_D_ID ) 32 VALUES ( C_ID , O_ENTRY_D , 0 , O_OL_CNT 33 , O_ALL_LOCAL , O_ID , W_ID , D_ID ) 34 ; 35 SET ( C_DISCOUNT, C_LAST, C_CREDIT ) 36 = ( SELECT C_DISCOUNT, C_LAST, C_CREDIT 37 FROM CUSTOMER 38 WHERE C_ID = NEW_WH.C_ID 39 AND C_W_ID = W_ID 40 AND C_D_ID = D_ID 41 ) 42 ; 43 SET W_TAX 44 = ( SELECT W_TAX 45 FROM WAREHOUSE 46 WHERE W_ID = NEW_WH.W_ID 47 ) 48 ; 49 RETURN VALUES ( W_TAX , C_DISCOUNT , C_LAST , C_CREDIT ) ; 50 END |
As you can see the statements inside the NEW_WH function are independent of each other. So why bother encapsulating them into a function? The reason to merge these four statements together is to minimize the codepath associated with the invocation of an SQL statement. Within a regular customer environment the ability to have encapsulation without harming performance is goodness. DB2 proves here that good coding style and benchmarking don't have to be at odds with each other.
The corresponding plan is as follows:
Listing 9. Access plan for third SQL statement
You can see here how the UNION is driving the four statements. Also note that there is no SORT at the top because the statement is a SELECT INTO rather than a cursor.
The version of the NEW_ORDER transaction which deals with remote warehouses will not be covered. It has a very similar structure, so there is nothing new to learn.
DELIVERY is a straightforward transaction:
- The carrier finds the oldest order id on the NEW_ORDER queue.
- The order is deleted from the queue.
- The customer id is retrieved from the order stored in ORDER.
- The amount owing is computed from the ORDER_LINEs. Note that in truly normalized fashion the grand total was never stored in the order itself.
- Each order line is marked as delivered by setting the delivery day.
- The order is marked as delivered by setting the delivery day.
- The customer's balance and the delivery count are updated.
- The order id is returned.
If the queue is empty, nothing happens and NULL is returned.
Listing 10. Delivery table function
1 CREATE FUNCTION DEL( W_ID INTEGER 2 , D_ID SMALLINT 3 , CARRIER_ID SMALLINT 4 , DELIVERY_D BIGINT 5 ) 6 RETURNS TABLE ( O_ID INTEGER ) 7 SPECIFIC DELIVERY 8 MODIFIES SQL DATA DETERMINISTIC NO EXTERNAL ACTION LANGUAGE SQL 9 VAR: BEGIN ATOMIC 10 DECLARE O_ID INTEGER ; 11 DECLARE C_ID INTEGER ; 12 DECLARE AMOUNT INTEGER ; 13 14 /* Delete the order from new order table */ 15 SET VAR.O_ID = ( SELECT NO_O_ID 16 FROM OLD TABLE ( DELETE 17 FROM ( SELECT NO_O_ID 18 FROM NEW_ORDER 19 WHERE NO_W_ID = DEL.W_ID 20 AND NO_D_ID = DEL.D_ID 21 ORDER BY NO_O_ID ASC 22 FETCH FIRST 1 ROW ONLY 23 ) AS NEW_ORDER 24 ) AS D 25 ) 26 ; 27 /* Update the order as delivered and retrieve the customer id */ 28 SET VAR.C_ID = ( SELECT O_C_ID 29 FROM OLD TABLE ( UPDATE ORDERS 30 SET O_CARRIER_ID = DEL.CARRIER_ID 31 WHERE O_W_ID = DEL.W_ID 32 AND O_D_ID = DEL.D_ID 33 AND O_ID = VAR.O_ID 34 ) AS U 35 ) 36 ; 37 SET VAR.AMOUNT = ( SELECT SUM( OL_AMOUNT ) 38 FROM OLD TABLE ( UPDATE ORDER_LINE 39 SET OL_DELIVERY_D = DEL.DELIVERY_D 40 WHERE OL_W_ID = DEL.W_ID 41 AND OL_D_ID = DEL.D_ID 42 AND OL_O_ID = VAR.O_ID 43 ) AS U 44 ) 45 ; 46 /* Charge the customer */ 47 UPDATE CUSTOMER 48 SET C_BALANCE = C_BALANCE + VAR.AMOUNT 49 , C_DELIVERY_CNT = C_DELIVERY_CNT + 1 50 WHERE C_W_ID = DEL.W_ID 51 AND C_D_ID = DEL.D_ID 52 AND C_ID = VAR.C_ID 53 ; 54 /* Return the order id to the caller (or NULL) */ 55 RETURN VALUES VAR.O_ID ; 56 END |
As in the NEW_ORDER transaction, all steps have been collected into one SQL table function to save code path. Instead of the eight steps, only five substatements are present. What happened?
- The traditional way to process an element on a queue in SQL is to fetch the element in a first step. Then, in a second step, the element is deleted from the table. The main problem with this approach is that, unless a cursor-for-update is opened, a straight select into will not lock the row sufficiently to prevent another carrier from attempting to deliver the same order. Also in either case, cursor or not, two statements need to be executed. Again the ability to push data-change-operations (in this case DELETE) into the FROM clause comes in handy. However, more than this is needed. In order to find the "oldest" order you must execute a MIN() function. This function cannot appear in a DELETE statement's WHERE clause. It is only allowed in queries. The fact that DB2's SQL language is highly orthogonal helps solve the problem in an elegant way. First, DB2 allows DELETE against queries with the same semantics as it allows DELETE against a view (which really are queries). Second, DB2 allows DELETE against an ordered query. Knowing that "ORDER BY O_ID FETCH FIRST ROW ONLY" will serve up the oldest order id results in the rather elegant DELETE operation above.
- Retrieving the customer id and updating the order has been folded into one statement.
- Computing the total value of the order from the individual order lines has been folded together with the update of the delivery column in ORDER_LINES. Again this works only because of the orthoginality of DB2's choice for the SQL language. Because the UPDATE is in the FROM clause it is easy to use the standard SUM() to aggregate the end result.
The actual statement to execute the delivery is rather trivial:
Listing 11. SQL statement to execute delivery
1 SELECT O_ID 2 INTO :no_o_id :no_o_id_indicator 3 FROM TABLE ( DEL( :w_id , :d_id , :o_carrier_id , :ol_delivery_d ) ) AS T |
Again the plan below shows how dense this transaction has become thanks to the efficient usage of SQL:
Listing 12. Delivery access plan
Again, take a quick tour through this plan's key features:
- DELETE(6) is the POP from the queue. DB2 recognized that all it needs is an index fetch. Of course the SQL would have worked with a SORT as well, but that would be a bad plan.
- GRPBY(11) computes the SUM(OL_AMOUNT) right on top of the UPDATE(12).
By now it is assumed that all the other operators are well known to the attentive reader.
There are two variants of the PAYMENT transaction. The first variant is used for customers who provide their customer id. The second variant is used for customers who don't remember their customer ID and who provide their last name instead. Only the second variant is described here, since it provides a challenge where the first does not.
In the payment transaction (by last name) the following steps must occur:
- Retrieve the name and address of the district.
- Find the customer id of the customer by last name. If there is more than one customer with the same last name in this district, the right customer is the "middle" customer with respect to the customer's first name.
- Retrieve the customer's personal information.
- Increment the revenue to date for the district.
- Increment the revenue to date for the warehouse.
- Increment the payments made by the customer including some extra data if the customer has bad credit.
- Log the payment in the history.
As in the previous transactions, most of the logic is encapsulated into a table function named PAY_C_LAST().
Listing 13. Table function PAY_C_LAST
1 CREATE FUNCTION PAY_C_LAST( W_ID INTEGER 2 , D_ID SMALLINT 3 , C_W_ID INTEGER 4 , C_D_ID SMALLINT 5 , C_LAST VARCHAR(16) 6 , H_DATE BIGINT 7 , H_AMOUNT BIGINT 8 , BAD_CREDIT_PREFIX VARCHAR(34) 9 ) 10 RETURNS TABLE( W_STREET_1 CHAR(20) 11 , W_STREET_2 CHAR(20) 12 , W_CITY CHAR(20) 13 , W_STATE CHAR(2) 14 , W_ZIP CHAR(9) 15 , D_STREET_1 CHAR(20) 16 , D_STREET_2 CHAR(20) 17 , D_CITY CHAR(20) 11 , D_STATE CHAR(2) 19 , D_ZIP CHAR(9) 20 , C_ID INTEGER 21 , C_FIRST VARCHAR(16) 22 , C_MIDDLE CHAR(2) 23 , C_STREET_1 VARCHAR(20) 24 , C_STREET_2 VARCHAR(20) 25 , C_CITY VARCHAR(20) 26 , C_STATE CHAR(2) 27 , C_ZIP CHAR(9) 28 , C_PHONE CHAR(16) 29 , C_SINCE BIGINT 30 , C_CREDIT CHAR(2) 31 , C_CREDIT_LIM BIGINT 32 , C_DISCOUNT INTEGER 33 , C_BALANCE BIGINT 34 , C_DATA CHAR(200) 35 ) 36 SPECIFIC PAY_C_ID INHERIT ISOLATION LEVEL WITH LOCK REQUEST 37 MODIFIES SQL DATA DETERMINISTIC NO EXTERNAL ACTION LANGUAGE SQL 38 VAR: BEGIN ATOMIC 39 DECLARE W_NAME CHAR(10) ; 40 DECLARE D_NAME CHAR(10) ; 41 DECLARE W_STREET_1 CHAR(20) ; 42 DECLARE W_STREET_2 CHAR(20) ; 43 DECLARE W_CITY CHAR(20) ; 44 DECLARE W_STATE CHAR(2) ; 45 DECLARE W_ZIP CHAR(9) ; 46 DECLARE D_STREET_1 CHAR(20) ; 47 DECLARE D_STREET_2 CHAR(20) ; 48 DECLARE D_CITY CHAR(20) ; 49 DECLARE D_STATE CHAR(2) ; 50 DECLARE D_ZIP CHAR(9) ; 51 DECLARE C_ID INTEGER ; 52 DECLARE C_FIRST VARCHAR(16) ; 53 DECLARE C_MIDDLE CHAR(2) ; 54 DECLARE C_STREET_1 VARCHAR(20) ; 55 DECLARE C_STREET_2 VARCHAR(20) ; 56 DECLARE C_CITY VARCHAR(20) ; 57 DECLARE C_STATE CHAR(2) ; 58 DECLARE C_ZIP CHAR(9) ; 59 DECLARE C_PHONE CHAR(16) ; 60 DECLARE C_SINCE BIGINT ; 61 DECLARE C_CREDIT CHAR(2) ; 62 DECLARE C_CREDIT_LIM BIGINT ; 63 DECLARE C_DISCOUNT INTEGER ; 64 DECLARE C_BALANCE BIGINT ; 65 DECLARE C_DATA CHAR(200) ; 66 67 /* Update District and retrieve its data */ 68 SET ( D_NAME, D_STREET_1, D_STREET_2, D_CITY, D_STATE, D_ZIP ) 69 = ( SELECT D_NAME, D_STREET_1, D_STREET_2, D_CITY, D_STATE, D_ZIP 70 FROM OLD TABLE ( UPDATE DISTRICT 71 SET D_YTD = D_YTD + PAY_C_ID.H_AMOUNT 72 WHERE D_W_ID = PAY_C_ID.W_ID 73 AND D_ID = PAY_C_ID.D_ID 74 ) AS U 75 ) 76 ; 77 /* Determine the C_ID */ 78 SET ( C_ID ) 79 = ( SELECT C_ID 80 FROM ( SELECT C_ID 81 , COUNT(*) OVER() AS COUNT 82 , ROWNUMBER() OVER (ORDER BY C_FIRST) AS NUM 83 FROM CUSTOMER 84 WHERE C_LAST = PAY_C_LAST.C_LAST 85 AND C_W_ID = PAY_C_LAST.C_W_ID 86 AND C_D_ID = PAY_C_LAST.C_D_ID 87 ) AS T 88 WHERE NUM = (COUNT + 1) / 2 89 ) 90 ; 91 /* Update the customer */ 92 SET ( C_FIRST, C_MIDDLE, C_STREET_1, C_STREET_2 93 , C_CITY, C_STATE, C_ZIP, C_PHONE, C_SINCE, C_CREDIT, C_CREDIT_LIM 94 , C_DISCOUNT, C_BALANCE, C_DATA ) 95 = ( SELECT C_FIRST, C_MIDDLE, C_STREET_1, C_STREET_2 96 , C_CITY, C_STATE, C_ZIP, C_PHONE, C_SINCE, C_CREDIT 97 , C_CREDIT_LIM , C_DISCOUNT, C_BALANCE 98 , CASE WHEN C_CREDIT = 'BC' 99 THEN SUBSTR(C_DATA, 1, 200) END AS C_DATA 100 FROM NEW TABLE ( UPDATE CUSTOMER 101 SET C_BALANCE = C_BALANCE - PAY_C_ID.H_AMOUNT 102 , C_YTD_PAYMENT = C_YTD_PAYMENT + PAY_C_ID.H_AMOUNT 103 , C_PAYMENT_CNT = C_PAYMENT_CNT + 1 104 , C_DATA = CASE WHEN C_CREDIT = 'BC' 105 THEN BAD_CREDIT_PREFIX 106 || SUBSTR( C_DATA, 1, 466 ) 107 ELSE C_DATA 108 END 109 WHERE C_W_ID = PAY_C_ID.C_W_ID 110 AND C_D_ID = PAY_C_ID.C_D_ID 111 AND C_ID = PAY_C_ID.C_ID 112 ) AS U 113 ) 114 ; 115 /* Update the warehouse */ 116 SET ( W_NAME, W_STREET_1, W_STREET_2, W_CITY, W_STATE, W_ZIP ) 117 = ( SELECT W_NAME, W_STREET_1, W_STREET_2, W_CITY, W_STATE, W_ZIP 118 FROM OLD TABLE ( UPDATE WAREHOUSE 119 SET W_YTD = W_YTD + PAY_C_ID.H_AMOUNT 120 WHERE W_ID = PAY_C_ID.W_ID 121 ) AS U 122 ) 123 ; 124 /* Finally insert into the history */ 125 INSERT 126 INTO HISTORY ( H_C_ID, H_C_D_ID, H_C_W_ID, H_D_ID 127 , H_W_ID, H_DATA, H_DATE, H_AMOUNT ) 128 VALUES ( PAY_C_ID.C_ID 129 , PAY_C_ID.C_D_ID 130 , PAY_C_ID.C_W_ID 131 , PAY_C_ID.D_ID 132 , PAY_C_ID.W_ID 133 , VAR.W_NAME || CHAR( ' ', 4 ) || VAR.D_NAME 134 , PAY_C_ID.H_DATE 135 , PAY_C_ID.H_AMOUNT 136 ) 137 ; 138 /* Done - return the collected data */ 139 RETURN VALUES ( W_STREET_1, W_STREET_2, W_CITY, W_STATE, W_ZIP 140 , D_STREET_1, D_STREET_2, D_CITY, D_STATE, D_ZIP 141 , C_ID , C_FIRST, C_MIDDLE, C_STREET_1, C_STREET_2 142 , C_CITY, C_STATE, C_ZIP, C_PHONE, C_SINCE, C_CREDIT 143 , C_CREDIT_LIM , C_DISCOUNT, C_BALANCE, C_DATA 144 ) 145 ; 146 END |
Listing 14. SQL statement for payment transaction
1 SELECT W_STREET_1, W_STREET_2, W_CITY, W_STATE, W_ZIP 2 , D_STREET_1, D_STREET_2, D_CITY, D_STATE, D_ZIP 3 , C_ID, C_FIRST, C_MIDDLE, C_STREET_1, C_STREET_2 4 , C_CITY, C_STATE, C_ZIP, C_PHONE, C_SINCE, C_CREDIT, C_CREDIT_LIM 5 , C_DISCOUNT, C_BALANCE, C_DATA 6 INTO :w_street_1 , :w_street_2 , :w_city , :w_state , :w_zip 7 , :d_street_1 , :d_street_2 , :d_city , :d_state , :d_zip 8 , :c_id , :c_first , :c_middle , :c_street_1 , :c_street_2 , :c_city , :c_state 9 , :c_zip , :c_phone , :c_since , :c_credit , :c_credit_lim 10 , :c_discount , :c_balance, :c_data :c_data_indicator 11 FROM TABLE ( PAY_C_LAST( :w_id 12 , :d_id 13 , :c_w_id 14 , :c_d_id 15 , :c_last_input 16 , :h_date 17 , :h_amount 18 , :c_data_prefix_c_last 19 ) 20 ) AS PAY_C_LAST 21 WITH RR USE AND KEEP UPDATE LOCKS |
On top of the usual optimizations, there are two new techniques to take note of:
- To determine the correct customer, the CUSTOMER table needs to be read. Only after this is done can the update to the CUSTOMER table be performed. By default, this means that all customer rows with the last name in question will acquire a share lock. To perform the update, the lock needs to be transformed into an update lock. There is a small risk that the same customer wants to do another payment for a different order at the same time. If this happens between the fetch and the update, then a dead lock will occur, since neither transaction will be able to acquire the update lock given that the second transaction holds a share lock. To avoid situtations like this, DB2 V8.2 supports the so called lock-request-clause. In this case WITH RR USE AND KEEP UPDATE LOCKS will cause DB2 to collect at least update locks instead of share locks throughout the statement. For semantic cleanliness and future extensibility, the SQL function uses a matching clause INHERIT ISOLATION LEVEL WITH LOCK REQUEST.
- To find the median, the use of ROW_NUMBER() was chosen. This OLAP function numbers all customers of the same last name by their first names. It was further decided to not use a separate query to get the total COUNT. Instead, again OLAP has been used. The trade-off is one between memory consumption to buffer all matching customers - since the total COUNT has to be tagged onto each customer but it is not known until the end - and doing two separate index scans from the customer table. For small numbers of rows and a small row-width, (C_ID, COUNT, NUM) using COUNT(*) OVER() is in fact somewhat better.
Listing 15 shows the plan for the payment transaction.
Listing 15. Access plan for payment transaction
The purpose of the ORDER STATUS query is to allow customers to retrieve information about their orders. As in payment, some customers remember their customer ids, while others don't. Here is the more complex version given a customer who calls in by last name only:
- The customer ID needs to be determined using the same algorithm as in the PAYMENT transaction.
- The customer's full name and account balance is retrieved.
- The latest order of the customer is determined by selecting the highest order id for that customer.
- The ID of the delivery carrier is determined if the order was delivered as well as the date when the order was submitted.
- The delivery date, quantity, total price and delivering warehouse are retrieved for each order line.
Listing 16. Order status function
1 CREATE FUNCTION ORD_C_LAST( W_ID INTEGER 2 , D_ID SMALLINT 3 , C_LAST VARCHAR(16) 4 ) 5 RETURNS TABLE( O_ID INTEGER 6 , O_CARRIER_ID SMALLINT 7 , O_ENTRY_D BIGINT 8 , C_BALANCE BIGINT 9 , C_FIRST VARCHAR(16) 10 , C_MIDDLE CHAR(2) 11 , C_ID INTEGER 12 ) 13 SPECIFIC ORD_C_LAST 14 READS SQL DATA NO EXTERNAL ACTION DETERMINISTIC LANGUAGE SQL 15 VAR: BEGIN ATOMIC 16 DECLARE C_BALANCE BIGINT ; 17 DECLARE C_FIRST VARCHAR(16) ; 18 DECLARE C_MIDDLE CHAR(2) ; 19 DECLARE C_ID INTEGER ; 20 DECLARE O_ID INTEGER; 21 DECLARE O_CARRIER_ID SMALLINT; 22 DECLARE O_ENTRY_D BIGINT; 23 24 /* Retrieve the Customer information */ 25 SET ( C_BALANCE, C_FIRST, C_MIDDLE, C_ID ) 26 = ( SELECT C_BALANCE, C_FIRST, C_MIDDLE , C_ID 27 FROM ( SELECT C_ID 28 , C_BALANCE 29 , C_FIRST 30 , C_MIDDLE 31 , COUNT(*) OVER() AS COUNT 32 , ROWNUMBER() OVER (ORDER BY C_FIRST) AS NUM 33 FROM CUSTOMER 34 WHERE C_W_ID = ORD_C_LAST.W_ID 35 AND C_D_ID = ORD_C_LAST.D_ID 36 AND C_LAST = ORD_C_LAST.C_LAST 37 ) AS V1 38 WHERE NUM = (COUNT + 1) / 2 39 ) 40 ; 41 /* Take advantage of the index to fetch the first row (and hence max(o_id) ) */ 42 SET ( O_ID , O_CARRIER_ID , O_ENTRY_D ) 43 = ( SELECT O_ID 44 , O_CARRIER_ID 45 , O_ENTRY_D 46 FROM ORDERS 47 WHERE O_W_ID = ORD_C_LAST.W_ID 48 AND O_D_ID = ORD_C_LAST.D_ID 49 AND O_C_ID = VAR.C_ID 50 ORDER BY O_ID DESC 51 FETCH FIRST 1 ROW ONLY 52 ) 53 ; 54 RETURN VALUES ( VAR.O_ID 55 , VAR.O_CARRIER_ID 56 , VAR.O_ENTRY_D 57 , VAR.C_BALANCE 58 , VAR.C_FIRST 59 , VAR.C_MIDDLE 60 , VAR.C_ID 61 ) 62 ; 63 END |
Listing 17. Calling the function
1 SELECT O_ID, O_CARRIER_ID, O_ENTRY_D, C_BALANCE, C_FIRST, C_MIDDLE, C_ID 2 INTO :o_id, :o_carrier_id , :o_entry_d , :c_balance, :c_first, :c_middle, :c_id 3 FROM TABLE ( ORD_C_LAST( :w_id 4 , :d_id 5 , :c_last_input 6 ) 7 ) AS ORD_C_LAST |
Listing 18. SQL statement for the order status query
1 SELECT OL_I_ID, OL_SUPPLY_W_ID, OL_QUANTITY, OL_AMOUNT, OL_DELIVERY_D 2 FROM ORDER_LINE 3 WHERE OL_W_ID = :w_id 4 AND OL_D_ID = :d_id 5 AND OL_O_ID = :o_id 6 FOR FETCH ONLY ; |
A lot of the usual techniques to improve performance were applied again. For example, all the steps not involving the order lines have been unified into one SQL table function. Also the "middle customer" is retrieved using OLAP. However, one last time there is something interesting to glean from this query:
Naively one would first determine the maximum order id for the customer and then use this ID to retrieve the carrier and the entry date. However it is denser to unify those two queries into one by exploiting the fact that the maximum order ID for a customer is also the first order ID fetched in an index on the customer and the order id, if the order id is sorted descending. Given a matching index, a single index fetch will pin down the row to be retrieved. This is the same technique used in the delivery transaction, this time without the DELETE and a MAXIMUM.
Note that the order lines are retrieved through a separate cursor. It is more efficient to execute two statements than to return a Cartesian product of the two queries which would send the customer and order information repeated for each order line.
Listing 19 lists the plan showing the efficient use (FETCH(8)) of ORDER BY and FETCH FIRST 1 ROW ONLY discussed above.
Listing 19. Order status query plan
Rows
RETURN
( 1)
Cost
I/O
|
1
NLJOIN
( 2)
12.928
2.008
/----------+---------\
1 1
NLJOIN TBSCAN
( 3) ( 10)
12.9279 4.48727e-005
2.008 0
/---------+--------\ |
1 1 1
TBSCAN UNION TABFNC: SYSIBM
( 4) ( 5) GENROW
4.48727e-005 0.108135
0 0.013056
| /-------+------\
1 1 1
TABFNC: SYSIBM FILTER FETCH
GENROW ( 6) ( 8)
0.0176324 0.0905021
0 0.005056
| /----+---\
0.2 0.005056 79
IXSCAN IXSCAN TABLE: SRIELAU
( 7) ( 9) ORDERS
0.0158334 0.0251716
0 0
| |
5 79
INDEX: SRIELAU INDEX: SRIELAU
CUST_IDXB ORDR_IDXB
|
Last but not least, the STOCK LEVEL query exercises a three-way join to determine the number of items in the past 20 orders for a given district for which the stock levels are below a specified threshhold. There is not much to be said about this query other than the fact that it is the only query that can run with a cursor stability isolation level. DB2 has the capability of specifying the isolation level on a query-by-query basis which has been exploited here.
Listing 20. Stock level query
1 SELECT COUNT( S_I_ID ) INTO :low_stock 2 FROM ( SELECT DISTINCT S_I_ID 3 FROM ORDER_LINE , STOCK , DISTRICT 4 WHERE D_W_ID = :w_id 5 AND D_ID = :d_id 6 AND OL_O_ID < d_next_o_id 7 AND OL_O_ID >= ( d_next_o_id - 20 ) 8 AND OL_W_ID = D_W_ID 9 AND OL_D_ID = D_ID 10 AND S_I_ID = OL_I_ID 11 AND S_W_ID = OL_W_ID 12 AND S_QUANTITY < :threshold 13 ) AS OLS 14 WITH CS |
Listing 21. Stock level query access plan
Rows
RETURN
( 1)
Cost
I/O
|
1
GRPBY
( 2)
13.204
1.02222
|
3.75467e-005
TBSCAN
( 3)
13.2039
1.02222
|
3.75467e-005
SORT
( 4)
13.2033
1.02222
|
3.75467e-005
NLJOIN
( 5)
13.2023
1.02222
/--------------------+--------------------\
0.00782222 0.0048
NLJOIN FETCH
( 6) ( 11)
13.0011 0.201169
1.00782 0.0144
/-----------+-----------\ /----+---\
1 0.00782222 0.0144 9
FETCH FETCH IXSCAN TABLE: SRIELAU
( 7) ( 9) ( 12) STOCK
12.872 0.129119 0.0157274
1 0.00782222 0
/----+---\ /----+---\ |
1 26 0.00782222 44 9
IXSCAN TABLE: SRIELAU IXSCAN TABLE: SRIELAU INDEX: SYSIBM
( 8) DISTRICT ( 10) ORDER_LINE SQL0410231029421
0.0175755 0.0282312
0 0
| |
26 44
INDEX: SYSIBM INDEX: SYSIBM
SQL0410231029415 SQL0410231030088
|
In this article, Rielau has given a brief introduction into the schema of the TPC-C benchmark and its transactions. Much more is needed to execute this benchmark at the extreme levels on which DB2 has performed, but the concise usage of SQL stands at the beginning of it all. Efficient SQL yields efficient query plans, which in turn will mean that only codepath that is necessary will be executed. Only rows that are needed will be read. Rielau considers the SQL used in the TPC-C benchmark by DB2 very near optimal. There is very little fat to cut, if any.
In summary, belw are the interesting SQL features exposed by this implementation of TPC-C:
- The usage of SQL Table Functions allows you to push procedural logic into the FROM clause of a query. Through correlation, SQL table functions allow a more efficient way of implementing iterations, otherwise done with cursors.
- MODIFIES SQL DATA in SQL table functions allows you to push even INSERT, UPDATE, DELETE and MERGE (data-change-operations) into the inner leg of correlated joins.
- Usage of data-change-operations in the FROM clause allows retrieval of generated columns such as IDs, as well as the retrieval of data to be deleted or updated.
- Using queries as the target of a data change operation allows deletion or updates of rows determined by complex SQL including ORDER BY. A common exploitation of this feature is the implementation of POP semantics for queues.
- ORDER BY in combination with FETCH FIRST can be used efficiently to select the maximum or minumum row, including retrieval of columns not part of the aggregate function itself.
- OLAP functions are an option to be considered when selecting rows by conditions other than maximum or minimum.
- Common table expressions (the WITH clause) allow efficient pipelining of data-change-operations .
While the TPC-C benchmark is very simple, it is nonetheless quite efficient in touching many aspects of OLTP processing.
Note that, while DB2 is not the first product to introduce some means of returning data from INSERT, UPDATE and DELETE, it is the first to integrate the concept into SQL itself, allowing immediate relational processing of the result without going through a temporary table and procedural language constructs.
I may not have needed much help to write this article, but it is based on the accomplishments of a whole organization.
I want to thank first and foremost Bobbie Cochrane, Hamid Pirahesh and Richard Sidle from the IBM Almaden Research Center. Without their work in Starburst, inline SQL PL and data-change-operations in the FROM clause could not exist.
I want to thank my team for delivering nested data-change-operations - one of the most exciting SQL extensions of the last years.
Finally I want to thank everyone involved in making the TPC-C results of the past 18 months a reality.
There is only one place to go when IBM wants to deliver the fastest result on their fastest machine: Across the hall!
-
TPC.org homepage. Everything you want to know about the Transaction Processing Council (TPC): Benchmark rules, latest results, members, and so on.
-
TPC-C Specification (PDF).
The TPC-C specification is the official definition on how a TPC-C benchmark is to be perfromed.
My explanation here is truly only a summary.
-
Lots of Starburst resources.
Starburst was the follow-on research project to System R. Its purpose was to develop a highly extensible RDBMS.
The Starburst SQL compiler has been used in DB2 UDB for Linux, UNIX and Windows since DB2 V2 for Common Servers.
-
"Returning Modified Rows" (VLDB 2004 Toronto).
DB2 development's introduction of "data-change-operations in the FROM clause" to the research community.
-
Patent on inline SQL PL design.
This patent describes the details on how inline SQL PL has been implemented into Starburst.
-
Patent on SQL Function design.
Here the concept of how SQL Functions can get efficiently expanded into a query graph are explained.
-
"Profiling SQL Procedures".
This article introduces a GUI tool to tune SQL Procedures. It is complementary to this talk and for those reader in interested in SQL PL beyond SQL functions.
-
"Tracing SQL Procedures".
Again an article which deals with SQL Procedures and how to debug them.
-
DB2 UDB for LUW Information Center.
The Information Center is the online documentation for DB2.

Serge Rielau is a Senior Software Developer, working on DB2 UDB SQL compiler development at the IBM Toronto lab. He has worked in DB2’s SQL compiler development team for 7 years and is an expert in the SQL Language. Among other projects he has developed inline SQL PL and SQL Functions. He also designed what is commonly known as "SELECT FROM INSERT, UPDATE and DELETE". Nowadays Serge spends a good deal of his time helping customers migrate from other DBMS to DB2 and he is an active participant in the comp.databases.ibm-db2 newsgroup.





