- Read this article in our interactive digital edition format!
- Subscribe to
IBM Data Management magazine
If you're a programmer and you want to manipulate data, you write a program. But in some cases, that may not be the best option. DB2 has several utilities that can save you time—and using them instead of program code can help you improve system availability and performance. Running utilities is usually the responsibility of a DBA, but let's look at the many tasks that utilities can accomplish for you and the many application problems that they can solve.
Why utilities instead of maintenance programs?
Maintenance programs must be coded, tested, and then run. They acquire locks and
hopefully do COMMITs to release those locks. However, if
set up correctly, utilities do not need to be coded, and testing is minimal; they
basically just need to be run. They acquire very few or no locks and are finely
tuned to use as little CPU overhead as possible. Plus, utilities often can be run
simultaneously with programs—meaning no outages for your users.
Have you ever written a program to purge (DELETE) data
from a table? Such a program will often run for hours, causing locking issues
(time-outs) for other users. And at the end of the job, even though the table has
fewer rows, it still takes up the same disk allocation and has just as many
pages—only now the pages are littered with random chunks of free space. The program
has eliminated rows but a REORG must be done to unload
and reload the rows and consolidate them onto fewer pages, cleaning up the mess the
program left behind.
Here's the good news: the REORG utility actually has a
DISCARD feature that can be used to purge rows. This
feature allows rows to be kept or discarded, and the discarded rows can be written
to a sequential data set. That data set can then be input into a program, into the
LOAD utility, or used as a backup. The utility takes
virtually no locks and can be run simultaneously with programs that are running SQL.
Using REORG to purge the rows saves coding time, reduces CPU usage, and eliminates
the locking overhead of running a program. Instead of writing, testing, and running
a program, we are using REORG to do both the purge and
the clean up!
What exactly is this magic REORG?
REORG is a utility that unloads the rows in your table and reloads all of those rows
or some of those rows, usually with a sort in between. The optional sort can be used
to put the rows into the optimal order for SQL performance. This order is called the
cluster order. And after the unload (or unload discard) phase, while the data set is
empty, the REORG utility can either reset the high-used
relative byte address (HURBA) or delete the underlying data set and redefine or
reallocate its space before the reload.
The REORG utility can be used in many ways beyond purging
data. It can also help implement many of the performance features that are part of
DB2. For example, DB2 allows us to compress the rows on our tables. Once we alter
the table space to enable compression, we can run REORG
to allow DB2 to reformat the rows, compressing them to potentially half their
original length. When the rows are reloaded, they are shorter, so more of them can
fit on a page. This is terrific performance-wise for multi-row, sequential reads
because the all-important page is our unit of I/O.
Just think: the more rows per page, the fewer pages in our table. This not only saves
disk space, but also reduces the number of GETPAGEs and
read I/Os that our SQL must do.
When many rows must be added to a table, the obvious solution is INSERT. But massive numbers of INSERTs means
writing and running a program with the potential for high CPU overhead as well as
locking issues. And while INSERT attempts to keep the
table in order, it often just cannot make that happen. You'll likely need to do a
REORG afterward to correctly incorporate the new rows
into the table in the correct (cluster) order.
Instead, we can use the LOAD utility, which offers a far
more efficient means of adding rows to DB2 tables. Using a sequential file for
input, LOAD can add rows to a brand-new empty table or to
an already-populated table. It can also replace all rows in the table with totally
new data. The utility can be run in batch or, for adding rows to an existing table,
can be run simultaneously with other programs that are using SQL. Once you add this
utility to your arsenal of tools, you will start seeing many opportunities to use
it.
Although LOAD and UPDATE are
not identical, they do have a symbiotic relationship. For example, when a huge
percentage of the rows in a large table must be updated, a maintenance program is
often the solution. But we know that rows expand, contract, and move to different
pages any time there are extensive column data changes. Further, if the clustering
columns have been updated, the rows may no longer be in cluster order. Nine times
out of ten, a REORG is scheduled after a huge maintenance
program is run. Remember, REORG is basically an unload
and a reload of data.
We can use the LOAD utility to accomplish our update goals
more efficiently. We read (SELECT) all rows in cluster
order into a program using lock-avoidance techniques like ISOLATION(CS) and CURRENTDATA(NO). The
program then changes the column data (usually by reading a sorted transaction file
in the same order as the table data) and writes both the changed and unchanged rows
to a sequential file. The program is essentially unloading the rows and changing the
ones that need to be changed in the process. Since the program is only reading the
DB2 table, locking issues will be minimal or non-existent. After the program
finishes, the sequential file can be sorted if necessary. (This sort might be needed
if the column data on which the table is clustered is changed.) The output
sequential file can then be used as input to the LOAD
utility.
Now think about this: we unloaded the rows, changed them, sorted them (possibly),
and then reloaded them. In effect, we did our own REORG.
Although this is not always a feasible solution, it has many advantages when the
conditions are right.
One of the other wonderful benefits of the LOAD utility is
that it can act like INSERT but be smarter and more
efficient. Both INSERT and LOAD (run with SHRLEVEL CHANGE to allow other users simultaneous access
to the table) honor the desire to keep the table rows in a specified cluster order.
However, the LOAD utility uses less CPU overhead than
INSERT and does intelligent COMMITs with the least-possible contention with other table users.
Utilities provide creative solutions to many difficult problems. For example, tables
can now be set up with an option called APPEND that
causes INSERT to place rows at the end of the table,
rather than attempt to keep them in cluster order. This is perfect for tables that
have a large number of rows added followed by a "clean up the mess" REORG.
But we can also use this strategy to improve slow-running, contentious INSERT programs. You can enhance performance by altering
the table to use this APPEND option. Putting the rows at
the end of the table is far easier and less costly than attempting to honor your
cluster order. CPU usage, GETPAGEs, and read I/Os will be
reduced. At a later time, REORG can be used to put the
rows back in cluster order.
REORG also helps us implement another performance feature
of DB2. Variable character data is historically padded to its maximum length in an
index. Today, indexes containing variable character column(s) can be altered so that
those columns are no longer padded. Rather, the length of the columns in the index
will match the column length in the table. To implement and reap the benefits of the
option, the index must be rebuilt. Table-space REORGs
rebuild the indexes associated with the table, thereby implementing the unpadding.
When examining existing maintenance programs or considering writing new maintenance programs, do stop and consider whether a utility might just take care of the problem. Treat utilities as an option whenever massive maintenance is needed. Think above and beyond the "write yet another maintenance program" solution.
- Read more articles in
IBM Data Management magazine.
- Learn more about DB2 for z/OS at the
DB2 for z/OS page on developerWorks.
- Learn more about the DB2 for z/OS utilities in the
DB2 for z/OS Information Center.
Bonnie Baker specializes in teaching on-site classes for corporations, agencies, and DB2 user groups. She is an IBM DB2 Gold Consultant, an IBM Information Champion, a five-time winner of the IDUG Best Speaker award, and a member of the IDUG Speakers' Hall of Fame. She is best known for her ability to demystify complex concepts through analogies and war stories.




