Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

Programmers only: Utilities for developers

Harness the magic of REORG and LOAD

Bonnie Baker specializes in teaching on-site classes for corporations, agencies, and DB2 user groups. She is an IBM DB2 Gold Consultant, an IBM Information Champion, a five-time winner of the IDUG Best Speaker award, and a member of the IDUG Speakers' Hall of Fame. She is best known for her ability to demystify complex concepts through analogies and war stories.

Summary:  Bonnie Baker talks about some of the DB2® utilities such as REORG and LOAD that can save you time. Using them instead of program code can help you to improve system availability and performance. This content is part of the IBM Data Management magazine.

View more content in this series

Date:  21 Oct 2011
Level:  Intermediate

Activity:  6097 views
Comments:  

- Read this article in our interactive digital edition format!
- Subscribe to IBM Data Management magazine

If you're a programmer and you want to manipulate data, you write a program. But in some cases, that may not be the best option. DB2 has several utilities that can save you time—and using them instead of program code can help you improve system availability and performance. Running utilities is usually the responsibility of a DBA, but let's look at the many tasks that utilities can accomplish for you and the many application problems that they can solve.

Why utilities instead of maintenance programs?

Maintenance programs must be coded, tested, and then run. They acquire locks and hopefully do COMMITs to release those locks. However, if set up correctly, utilities do not need to be coded, and testing is minimal; they basically just need to be run. They acquire very few or no locks and are finely tuned to use as little CPU overhead as possible. Plus, utilities often can be run simultaneously with programs—meaning no outages for your users.


Purging data using REORG

Have you ever written a program to purge (DELETE) data from a table? Such a program will often run for hours, causing locking issues (time-outs) for other users. And at the end of the job, even though the table has fewer rows, it still takes up the same disk allocation and has just as many pages—only now the pages are littered with random chunks of free space. The program has eliminated rows but a REORG must be done to unload and reload the rows and consolidate them onto fewer pages, cleaning up the mess the program left behind.

Here's the good news: the REORG utility actually has a DISCARD feature that can be used to purge rows. This feature allows rows to be kept or discarded, and the discarded rows can be written to a sequential data set. That data set can then be input into a program, into the LOAD utility, or used as a backup. The utility takes virtually no locks and can be run simultaneously with programs that are running SQL. Using REORG to purge the rows saves coding time, reduces CPU usage, and eliminates the locking overhead of running a program. Instead of writing, testing, and running a program, we are using REORG to do both the purge and the clean up!


What exactly is this magic REORG?

REORG is a utility that unloads the rows in your table and reloads all of those rows or some of those rows, usually with a sort in between. The optional sort can be used to put the rows into the optimal order for SQL performance. This order is called the cluster order. And after the unload (or unload discard) phase, while the data set is empty, the REORG utility can either reset the high-used relative byte address (HURBA) or delete the underlying data set and redefine or reallocate its space before the reload.

The REORG utility can be used in many ways beyond purging data. It can also help implement many of the performance features that are part of DB2. For example, DB2 allows us to compress the rows on our tables. Once we alter the table space to enable compression, we can run REORG to allow DB2 to reformat the rows, compressing them to potentially half their original length. When the rows are reloaded, they are shorter, so more of them can fit on a page. This is terrific performance-wise for multi-row, sequential reads because the all-important page is our unit of I/O.

Just think: the more rows per page, the fewer pages in our table. This not only saves disk space, but also reduces the number of GETPAGEs and read I/Os that our SQL must do.


Add data with LOAD

When many rows must be added to a table, the obvious solution is INSERT. But massive numbers of INSERTs means writing and running a program with the potential for high CPU overhead as well as locking issues. And while INSERT attempts to keep the table in order, it often just cannot make that happen. You'll likely need to do a REORG afterward to correctly incorporate the new rows into the table in the correct (cluster) order.

Instead, we can use the LOAD utility, which offers a far more efficient means of adding rows to DB2 tables. Using a sequential file for input, LOAD can add rows to a brand-new empty table or to an already-populated table. It can also replace all rows in the table with totally new data. The utility can be run in batch or, for adding rows to an existing table, can be run simultaneously with other programs that are using SQL. Once you add this utility to your arsenal of tools, you will start seeing many opportunities to use it.


LOAD instead of UPDATE?

Although LOAD and UPDATE are not identical, they do have a symbiotic relationship. For example, when a huge percentage of the rows in a large table must be updated, a maintenance program is often the solution. But we know that rows expand, contract, and move to different pages any time there are extensive column data changes. Further, if the clustering columns have been updated, the rows may no longer be in cluster order. Nine times out of ten, a REORG is scheduled after a huge maintenance program is run. Remember, REORG is basically an unload and a reload of data.

We can use the LOAD utility to accomplish our update goals more efficiently. We read (SELECT) all rows in cluster order into a program using lock-avoidance techniques like ISOLATION(CS) and CURRENTDATA(NO). The program then changes the column data (usually by reading a sorted transaction file in the same order as the table data) and writes both the changed and unchanged rows to a sequential file. The program is essentially unloading the rows and changing the ones that need to be changed in the process. Since the program is only reading the DB2 table, locking issues will be minimal or non-existent. After the program finishes, the sequential file can be sorted if necessary. (This sort might be needed if the column data on which the table is clustered is changed.) The output sequential file can then be used as input to the LOAD utility.

Now think about this: we unloaded the rows, changed them, sorted them (possibly), and then reloaded them. In effect, we did our own REORG. Although this is not always a feasible solution, it has many advantages when the conditions are right.

One of the other wonderful benefits of the LOAD utility is that it can act like INSERT but be smarter and more efficient. Both INSERT and LOAD (run with SHRLEVEL CHANGE to allow other users simultaneous access to the table) honor the desire to keep the table rows in a specified cluster order. However, the LOAD utility uses less CPU overhead than INSERT and does intelligent COMMITs with the least-possible contention with other table users.


Other benefits of utilities

Utilities provide creative solutions to many difficult problems. For example, tables can now be set up with an option called APPEND that causes INSERT to place rows at the end of the table, rather than attempt to keep them in cluster order. This is perfect for tables that have a large number of rows added followed by a "clean up the mess" REORG.

But we can also use this strategy to improve slow-running, contentious INSERT programs. You can enhance performance by altering the table to use this APPEND option. Putting the rows at the end of the table is far easier and less costly than attempting to honor your cluster order. CPU usage, GETPAGEs, and read I/Os will be reduced. At a later time, REORG can be used to put the rows back in cluster order.

REORG also helps us implement another performance feature of DB2. Variable character data is historically padded to its maximum length in an index. Today, indexes containing variable character column(s) can be altered so that those columns are no longer padded. Rather, the length of the columns in the index will match the column length in the table. To implement and reap the benefits of the option, the index must be rebuilt. Table-space REORGs rebuild the indexes associated with the table, thereby implementing the unpadding.


Utilities provide options

When examining existing maintenance programs or considering writing new maintenance programs, do stop and consider whether a utility might just take care of the problem. Treat utilities as an option whenever massive maintenance is needed. Think above and beyond the "write yet another maintenance program" solution.

Sponsored Article
The Internet is Your Oyster Safeguarding the Smart Grid with a Tactical Appliance Does Your Storage Have the Power to Support Mixed Workloads?
IBM, Intel Post Top Results for SAP Transaction BankingIBM DB2 Advanced Enterprise Server EditionRiding the Open Social-Content Wave
Critical Modeling Strategies for Insurance CompaniesVirtualized Business Intelligence Levels the Playing Field for Small and Midsize CompaniesGrace Under Pressure: ENOVIA V6 PLM Redefines Peak Workload Performance on DB2
DB2 is Pure Power for Growing BusinessesIBM Champions Connection
Partner Resources
Advent Global Solutions, Inc.Applied Analytix, Inc.ASG Software Solutions
BMCCogitoDassault Systèmes
Daeja Image SystemsDBIFuzzy Logix
Melissa DataNECNetezza
QueBIT Quest SoftwareRelational Architects International
Safari Books Online

Resources

About the author

Bonnie Baker specializes in teaching on-site classes for corporations, agencies, and DB2 user groups. She is an IBM DB2 Gold Consultant, an IBM Information Champion, a five-time winner of the IDUG Best Speaker award, and a member of the IDUG Speakers' Hall of Fame. She is best known for her ability to demystify complex concepts through analogies and war stories.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=766780
ArticleTitle=Programmers only: Utilities for developers
publish-date=10212011

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Try IBM PureSystems. No charge.

Special offers