Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

Does REORG Matter?

How much disorganization can your DB2 for z/OS databases stand?

Cameron Crotty (editor@tdagroup.com), Editor, TDA Group
Cameron Crotty (editor@tdagroup.com) is editor of IBM Data Management magazine.
Jeffrey Berger, DB2 Mainframe Performance Analyst, IBM
Jeffrey Berger is a DB2 mainframe performance analyst at IBM, specializing in I/O and storage.

Summary:  With new technology like solid-state drives, can you eliminate the need for REORG with your DB2® for z/OS® data? In this article, take a look at the pros and cons, and examine the reasons why you will probably need to still have a plan for regular reorganizations. This content is part of the IBM Data Management magazine.

Date:  21 Oct 2011
Level:  Introductory

Activity:  9597 views
Comments:  

- Read this article in our interactive digital edition format!
- Subscribe to IBM Data Management magazine

Disorganization permeates databases, and the consequences can be unpleasant and expensive. Disorganized data takes longer to retrieve than organized data, and disorganized indexes slow searches to a crawl. To any enterprise running DB2 for z/OS, disorganization has always been a dangerous, resource-sapping enemy to be fiercely battled with good schema design and regular use of the REORG utility.

However, the arrival of solid-state drives (SSDs) kicked off a debate about the true cost of disorganization. Some argue that the performance and time costs of retrieving disorganized data are artifacts of the way that hard disk drive (HDD) technology works (see “Solid-State Drives: Changing the Data World,” IBM Data Management, Issue 3, 2011. With SSDs as a primary storage medium, using REORG to organize or cluster data might seem a waste of scarce system resources. Furthermore, REORG makes it harder to exploit storage tiering solutions such as IBM System Storage Easy Tier, which are designed to make optimal use of a minimal amount of SSD storage.

One thing that everyone agrees on: REORG is expensive, and nobody wants to use it more than absolutely necessary. In fact, IBM has made a number of changes to both DB2 10 and IBM System Storage DS8000 software to help you avoid using REORG. But can we just cast it away? Probably not.

Can fast storage save the day?

Clustering physically groups related rows together on the storage medium so that as much of it as possible can be read in a single I/O pass. Inserting new rows erodes the clustering organization (and degrades performance) until it can be reestablished with REORG.

Clustering can improve performance on an HDD, where data bits are stored on spinning platters and retrieved with a moving disk head. But SSDs have no moving parts, and their data retrieval latencies can be an order of magnitude smaller than those of HDDs. Although a DBMS may need more synchronous I/O passes to retrieve unclustered data scattered across an SSD, the perception is that the actual I/Os are so fast that it doesn't matter.

In practice, there are a couple of holes in this SSD theory. First, even with SSDs minimizing I/O response time, additional synchronous I/Os add CPU costs. Second, the threshold at which I/Os are so fast that the elapsed time “doesn't matter” can vary dramatically, even between database operations.


Disorganized indexes

Beyond these issues, REORG helps boost performance by keeping indexes organized—disorganized indexes take longer to scan than organized ones, chewing up time and CPU resources. Recently, IBM made some huge strides on both software and hardware technology fronts that further improve performance on disorganized indexes.

Other problems of disorganized data

Indirect references and pseudo-deleted RIDs are other types of troublesome data disorganization. Reading a row—even in a table scan—with an indirect reference requires an extra GETPAGE and can add an extra synchronous I/O. List prefetch cannot resolve the issue of indirect references because it uses the list of RIDs from the index, which contains only the original RID locations. It does not know that the row has been relocated until the data row is accessed. SSD mitigates the effect of indirect references, but REORG eliminates them.

Pseudo-deleted RIDs appear when DB2 deletes a row. The index entry is marked as pseudo-deleted, but not physically deleted. Updates also cause pseudo-deleted RIDs because an index update is actually a delete followed by insert.

For the most part, pseudo-deleted RIDs stay in the index until a REORG removes them. In the meantime, they take up storage space and increase the CPU cost of index scans.

DB2 10 helps to address pseudo-deleted RIDs but not indirect references. You will need a different strategy to avoid using REORG to solve these problems.

DB2 9 uses synchronous I/Os to scan a disorganized index, with I/O throughput on FICON (the IBM Fibre Channel protocol used by System z) and HDDs typically ranging between 1 and 6 MB/sec. By contrast, DB2 10 asynchronously uses list prefetch and makes the prefetch service request blocks (SRBs) eligible for the IBM System z Integrated Information Processor (zIIP), which saves CPU cost. Because they are asynchronous, the I/Os can overlap the CPU time, making the scan faster. IBM testing has shown that DB2 10 can scan a disorganized index as much as 5.6 times faster than DB2 9.

On the hardware side, the R6.2 release of DS8000 storage also helps DB2 handle disorganized indexes. To date, list prefetch over FICON has been notoriously slow. But with R6.2, list prefetch I/Os can use High Performance FICON for System z (zHPF), which cuts channel time by more than 50 percent. List prefetch also gets a boost from a new R6.2 feature called Turbo List Prefetch (TLP), which builds on the DS8000 adaptive caching algorithms to reduce cache hits. Together, zHPF and TLP can more than triple the throughput of list prefetch I/O.

With DB2 10 using TLP, the disorganized index scans can be done at throughputs higher than 40 MB/sec. This throughput is often sufficient to avoid any I/O suspension, which means that the application isn't waiting for the I/O to complete—the storage unit is no longer the performance bottleneck. The I/O throughput for an organized index scan with the same hardware will be nearly 300 MB/sec with HDD and 400 MB/sec with SSD, but if there is no I/O suspension to begin with, then the faster I/O won't affect the elapsed time.

Should these fast throughputs change how we think about REORG? If you're using it on a table space, not necessarily, because that process automatically reorganizes the indexes. But if you've been using REORG INDEX to deal with performance problems caused by disorganized indexes, then DB2 10, TLP, and SSD may make your life simpler. That combination can reduce the I/O time to read leaf pages below the CPU time to process those pages, which means that REORG INDEX won't accelerate the index scan. However, you may still want to use REORG INDEX to clean up pseudo-deleted index entries (see sidebar, “Other problems of disorganized data”) and to reclaim free space.


Skip-sequential access

Now, let's look at how REORG affects I/O to the data itself when an index is used to determine which data rows are read. The cluster ratio of the index indicates the physical order of the rows with respect to the collating sequence of the index keys. If the cluster ratio is high, DB2 will use dynamic prefetch; if the ratio is low, DB2 will use list prefetch. In some cases, REORG will increase the cluster ratio of an index, improving performance by enabling the optimizer to use list prefetch. But unless the index in question is the cluster index (of which there is only one per table), there is no guarantee that REORG will increase the cluster ratio.

Even if the cluster ratio of an index is high, the data it refers to may still be spread across the table. This will make the GETPAGEs skip sequential and diminish or eliminate the effectiveness of dynamic prefetch. If the qualified pages are sparse, then the query will do synchronous I/O, which will be slower than if the query had used list prefetch. If the GETPAGEs are mostly within 128 KB of each other, then the query will do sequential I/O, but it will collect a lot of (useless) intermediate pages. (Note: Reading extra pages can accelerate I/O in some cases, but when using TLP it's always better to read only the pages needed for the GETPAGE.)

If you're using SSD, the distance between the GETPAGEs may not matter. When considering the value of REORG, however, that's something of a moot point, because REORG doesn't always affect the distance of the pages anyway. On the other hand, if REORG can reduce the number of GETPAGEs done by the query, then REORG has value no matter the device type.

In the past, using list prefetch was sometimes a cause for concern, as it forced DB2 to build sorted record ID (RID) lists and store them in the RID pool. Once the RID pool was exhausted, DB2 would fall back to an incredibly time-consuming table scan (many a query suffered "death by table scan").

DB2 10 helps solve this problem by overflowing the RID pool into a work file. Work files are stored in a DB2 buffer pool, which may overflow to a direct access storage device (DASD). A RID pool may also overflow to DASD—a process known as paging—but buffer pool I/O is much faster than paging, and it is generally much faster than the concurrent list prefetch I/O to the data because each 32 K RID block contains more than 6000 RIDs.

The optimizer is unlikely to choose list prefetch if the cluster ratio is very high. If you want list prefetch to be used, wait until the data is disorganized before you run RUNSTATS.


The bottom line

In the end, there are ways to reduce the number of times that you must use REORG, but you probably won't be able to avoid it completely. When you do need to use it, don't forget to use DSNACCOX to determine eligibility for REORG scheduling. The default parameters are a guide that you can alter based on your implementation of SSD and/or TLP.

We can't completely eliminate disorganization in our databases, but at least there are tools that make disorganization less disruptive and easier to manage.

Sponsored Article
The Internet is Your Oyster Safeguarding the Smart Grid with a Tactical Appliance Does Your Storage Have the Power to Support Mixed Workloads?
IBM, Intel Post Top Results for SAP Transaction BankingIBM DB2 Advanced Enterprise Server EditionRiding the Open Social-Content Wave
Critical Modeling Strategies for Insurance CompaniesVirtualized Business Intelligence Levels the Playing Field for Small and Midsize CompaniesGrace Under Pressure: ENOVIA V6 PLM Redefines Peak Workload Performance on DB2
DB2 is Pure Power for Growing BusinessesIBM Champions Connection
Partner Resources
Advent Global Solutions, Inc.Applied Analytix, Inc.ASG Software Solutions
BMCCogitoDassault Systèmes
Daeja Image SystemsDBIFuzzy Logix
Melissa DataNECNetezza
QueBIT Quest SoftwareRelational Architects International
Safari Books Online

Resources

About the authors

Cameron Crotty (editor@tdagroup.com) is editor of IBM Data Management magazine.

Jeffrey Berger is a DB2 mainframe performance analyst at IBM, specializing in I/O and storage.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=766687
ArticleTitle=Does REORG Matter?
publish-date=10212011

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Try IBM PureSystems. No charge.

Special offers