Read this article in our interactive digital edition format!
In my two previous columns, I wrote about unnecessary SQL and gave examples of SQL that is executed far too many times and SQL that should be replaced with newer, better-performing SQL. In this column, I will tell you about more situations where SQL can be deferred, combined, or avoided completely, all with the goal of reducing conversations with DB2.
Three steps to remove unnecessary re-reads
Screen-display applications are notorious for unnecessary re-reads. We read to get an image to display; the user changes values on the screen and hits a function key to update; the program again takes control and re-reads the row (usually this time with a CURSOR using FOR UPDATE OF). After the FETCH, the program does "before and after" checking to see if anything on the row was changed by a different user while our user was staring at the screen display.
If the row is the same as the originally displayed row, our user's UPDATE is done. If any column on the row has changed, the appropriate information/error is provided to the user on the returned display.
The solution for this unnecessary SQL problem is to use an "optimistic locking" technique. We are confident that most, if not all, of the time, our row will not change during the screen display. Our UPDATE glass is half full, not half empty. Therefore, we will do our UPDATE without re-reading the row. But we are neither foolish nor stupid; we will use a tried-and-true technique to ensure that our UPDATE is done to an unchanged row: we include in our WHERE clause one or more predicates that test for change. We have many options to use to test for change, some allowing more throughput than others. For example, we can:
- Design our table to include a DB2-maintained
ROW UPDATE TIMESTAMP(new in DB2 9), and include a predicate in ourWHEREclause to check that column to see if its value is the same as the originally read column value. - Compare a user-maintained last-update timestamp to its original value (and reset that timestamp in our
SETclause) to make sure that nothing on the row has changed since we received our image of the row. - Include
WHEREclause predicates for every column on the row to see if each column value is the same as its original value. - Include
WHEREclause predicates for only the columns that are important and relevant to us. We definitely want to make sure that the row still qualifies, and we may want to verify that columns we selected have not changed.
Major changes to improve performance
Say you have an index on TRANDATE, PROCESS-DATE, and CUSTNO. You want to FETCH rows and write to a report all rows that will be deleted if they are: older than three months, inactive, and have a CUSTNO in a list found in your program's working storage.
You can:
Declare Cursor csr-delete-old-rows for
Select ponbr, custno, trandate, process-date
From bigtable
Where trandate < current date - 3 months
And status = :hv-inactive
Open csr-delete-old-rows
Fetch csr-delete-old-rows into
:hvponbr, :hvcustno, :hvtrandate, :hvprocess-date
Write to a report
Delete from bigtable
Where current of csr-delete-old-rows
|
If there are 3,000 index rows pointing to table rows that have a TRANDATE older than three months, but only 500 of those 3,000 table rows have a STATUS that is "inactive," then the worst-case scenario is that you will:
CONNECTto DB2 once for theOPENto doCURSORsetupCONNECTto DB2 for eachFETCH. Then, for each of the 3,000 connects:
a. Do multipleGET PAGErequests for the pages in the multi-level index tree
b. Read an average of six index entries
c. For each of the six indexRIDs, do aGET PAGErequest to the table, followed by a potential synchronous wait on I/O
d. After theGET PAGErequests (and read I/Os) to the table, apply theSTATUSpredicate to the six table rows, reject five and accept one- Return a row to the program
- Of the 500 qualified rows, only 25 of the
CUSTNOs are in the working storage list (I call conditions that reject rows in your program after the return from DB2 "Stage 3 predicates"—and Stage 3 predicates are far worse than Stage 2 predicates) CONNECTto DB2 again for each fully qualified row, this time to do theDELETE(25 connects)- Repeat 24 times for a grand total of 528
CONNECTs and way too manyGET PAGEs to both the index and the table
Now what can we do to reduce the number of conversations in this situation? I suggest four improvements (you will remember the advice in step 2 from the last column):
- If feasible, add
STATUSto the index to make the qualification index-only - Change the
CURSORto useROWSET POSITIONINGto read more than one row at a time - Use
SELECT FROM DELETEto see the rows - Get rid of the
STAGE 3disqualification (the program check ofCUSTNO) byINSERTing the numbers into aCREATED GLOBAL TEMPORARY TABLE(e.g.,CTT_CUST) and by using that temporary table in your SQL
Declare Cursor csr-select-and-delete-old-rows With rowset positioning For Select ponbr, custno, trandate, process-date From final_table (Delete from bigtable Where trandate < current date - 3 months And status = :hv-inactive And custno in (select custno from ctt_cust) Open csr-select-and-delete-old-rows |
(One CONNECT)
Fetch rowset from final_table For 100 rows into :hvponbr-array, :hvcustno-array, :hvtrandate-array, :hvprocess-date-array |
(One CONNECT with a +100 SQLCODE
returned for row 26)
Close csr-select-and-delete-old-rows |
(One CONNECT)
The new best-case scenario is that you will:
CONNECTto DB2 once for theOPENto doCURSORsetupCONNECTto DB2 for the oneFETCHand
a. DoGET PAGErequests for the pages in the index tree
b. Read 25 index entries, fully qualifying the rows before reading the table
c. For each of the 25RIDs, do aGET PAGErequest to the table (if we are lucky and some of our rows share the same pages, we will do fewerGET PAGEs)
d. For eachGET PAGEwe may have to do aREAD I/O
e. As each row is found,DELETEit
f. Insert theSELECTed columns of each row into final_table
g. Return 25 rows to the program- Write to the report:
Close csr-select-and-delete-old-rows |
(One CONNECT)
Leveraging the preceeding four techniques reduced the conversations from 528 to 3 while lowering the number of table GET PAGE requests from 3,000 to at most 25.
I'd like to elaborate on the fourth suggested improvement (Get rid of the STAGE 3 disqualification). One of the most common coding missteps I see as a consultant is the practice of validating and rejecting rows in the program instead of in DB2.
These STAGE 3 predicates, especially those that reject a
majority of the rows, are usually the result of old VSAM logic that did not utilize a
WHERE clause. Back then, we used keyed reads and then
applied program-coded IF/THEN/ELSE logic to the records.
IF/THEN/ELSE logic is fertile ground for figuring out how to use WHERE clause predicates so rows can be rejected sooner in DB2's address space.
This is your conscience speaking
I wrote this three-part series to emphasize the importance of totally eliminating or deferring—until the very last moment—conversations with DB2. In the future, as you code each SQL statement, I hope a little Bonnie will be sitting on your shoulder, shaking my finger and asking, "Is this necessary? Can you do this read/maintenance to more than one row at a time? Can you defer this SQL to a later time, until you are absolutely sure that it needs to be done?"
Bonnie Baker specializes in teaching on-site classes for corporations, agencies, and DB2 user groups. She is an IBM DB2 Gold Consultant, an IBM Information Champion, a five-time winner of the IDUG Best Speaker award, and a member of the IDUG Speakers' Hall of Fame. She is best known for her ability to demystify complex concepts through analogies and war stories.




