Read this article in our interactive digital edition format!
In my previous column, I wrote about unnecessary SQL. The goal was to make programmers stop and think about each statement and make sure that it was both absolutely necessary and absolutely necessary at that specific time. I gave examples of SQL that could be eliminated completely and SQL that could be deferred until a later time, until the last possible moment, and then executed only if still necessary.
In this column, I will continue with examples of SQL that may be executed far too many times and some that should be replaced with newer, better-performing SQL. So, welcome to part two of a three-part series on eliminating or reducing conversations with IBM DB2.
One of the fundamental rules for performance that we should obey when writing programs, especially batch programs, is to use control break logic whenever possible. This is not exclusively a DB2 concept—it's just good programming practice. Checking for a control break, or a change in a value, reduces connections to DB2 because you look up a value in a table only if the value you are worried about is different from the last value that you looked up.
Optimal control break logic requires that the input data be in a specific order; for example, ITEM within STORE within REGION. But even if the input data is not sorted in a specific order, break logic can still be used, ensuring that any SQL that is required for a REGION is performed only once for each REGION; for a STORE, only once for each STORE within that REGION; and for an ITEM, only once for each ITEM within that STORE.
As a side note, DB2-enforced referential integrity does not use control break logic. Foreign-key values are checked on every INSERT, UPDATE, DELETE, and MERGE, even if the value of the column at issue is the same as the value in the row INSERTed (and so forth) before. With program-enforced referential integrity, we have the option of doing the validation only when the value of the column at issue is different from the value in the prior row; that is, on a control break.
Preloading small reference tables into program memory
Another fundamental rule for performance is to preload small reference tables into working storage tables (when it is smart to do so) to avoid connecting to DB2 an inordinate number of times.
A less-than-amiable person once asked me, "What kind of crummy product makes it necessary to load values into working storage to avoid connecting to the product?" I thought for a minute and said, "The kind of product that recognizes that a programmer has choices and a brain and understands that avoiding unnecessary calls, connects, and get pages is a good idea." Just because DB2 is fast and has buffer pools to reduce actual I/O does not mean that connecting to DB2 to read a 10-row table a million times is smarter than connecting to DB2 once, reading 10 rows into a working storage table, and then addressing program memory a million times.
Reduce repetitive executions of subselects
Consider this SQL:
Select col1, col2, col3
From big_table where item = :hv-item-just-read
And big_table.deptno in (select deptno from little_dept-table
Where division = Eastern) |
For each item on an input sequential file, the program logic requires a lookup in a big table to get the associated values for col1, col2, and col3. Hopefully, as recommended earlier, the input data set has been sorted by item number and the program is checking to see if this item number is different from the prior number before this lookup is done.
But look more closely at the SQL. The subselect is creating a list of departments that are in the Eastern division and then making sure that the department at issue is in that list. I'm sure that, with a wee bit of thought, we could create that "departments in the Eastern division" list once and then do our lookup without connecting to DB2. For example, we can use a V8 rowset-positioned cursor to connect to DB2 one time and fetch our list; then we change the subject SQL to a hard-coded in-list built from our array:
Declare CursorDept with rowset positioning for Select deptno from little_dept-table Where division = Eastern) Fetch next rowset from CursorDept For 20 rows Into :hvarray-deptno (code to ensure that +100 was reached) Select col1, col2, col3 From big_table where item = :hv-item-just-read And big_table.deptno in (:hvdept1, :hvdept2, :hvdept3....:hvdept20) |
Remember, if there are fewer than 20 department numbers (say, 15) and you do not want to code dynamic SQL, you can always perpetuate the last value in host variables 16 through 20.
Reading a row before update/delete—fetch/update/fetch/update
More-experienced programmers must overcome some of their entrenched practices. In the "old days," we had to read a record to update or delete it. With DB2, if you have no need to establish a before image of the row, you do not need to read the row to update or delete it. I often see batch programs that use a cursor to fetch, update, fetch, update, fetch, update..., when they could just do a reasonably sized relational SET UPDATE using the same Where clause that is in the DECLARE CURSOR. Instead of:
Declare CursorUpd for
Select col1, col2, ...
From tableA
Where jobcode = :hvj for update of salary
Fetch CursorUpd into :hvcol1, :hvcol2, ...
Update salary set salary = salary + 1000.00 where current of CursorUpd
|
and repeating the fetch, update, fetch, update again and again, why not just:
Update tableA Set salary = salary + 1000.00 Where jobcode = :hvj |
SET processing is one of the huge strengths of relational design. You must, of course, ensure that the SET is a size that does not create an unacceptable unit of recovery or a problem with lock escalation, or cause locking issues (such as timeouts) for other concurrently running programs.
I sometimes see the preceding scenario (fetch, update) compounded by a subsequent singleton SELECT of the updated row—read it before, update it, look at it after. With newer programming techniques, as well as the realization that we do not need to see the before image, we can DECLARE a row-positioned cursor to SELECT from our SET UPDATE and then FETCH a rowset of reasonable size (say 100 rows at a time) to see the result of the maintenance.
Single-row fetches are becoming rare. Why connect 100 times when you can connect once and see the same 100 rows?
In the next issue, I will continue with the final installment of this topic, and we will look at even more (and newer) techniques for avoiding or reducing connects to DB2.
Bonnie Baker specializes in teaching on-site classes for corporations, agencies, and DB2 user groups. She is an IBM DB2 Gold Consultant, an IBM Information Champion, a five-time winner of the IDUG Best Speaker award, and a member of the IDUG Speakers' Hall of Fame. She is best known for her ability to demystify complex concepts through analogies and war stories.




