Are deleted rows wasting resources on your IBM i system?
A task for database engineers on IBM DB2 for i
Do you have an IBM DB2 table with a lot of deleted rows in it? It seems like a basic question, yet many IBM i shops don't know the answer. Typically, most IBM i shops don't have a database administrator to watch these types of details. The worst case I have seen during a customer engagement is a table with 2.6 billion (yes, billion) deleted rows. And that table was being scanned frequently. Not only are many customers unsure how many deleted rows there are in their database, but they also don't understand the impact those deleted rows are having.
Let's see if we can help you answer two questions:
- What tables have a large number of deleted rows?
- Are those deleted rows wasting processor and memory resources?
Of course, the deleted rows are taking up disk space, but how much? We also want to understand whether those deleted rows are causing extra disk operations, occupying memory, and wasting processor cycles on scan operations. It is also good to know whether the storage consumed by the deleted rows is reused when new rows are added to a table in the future.
First, let's find the DB2 tables and physical files that have the largest number of deleted rows in them. Before you start this analysis, you need to collect file information for one or more of your main production data libraries. Create a library to hold this collection data with the following command:

CRTLIB LIB(DELETDROWS)
Run the following CL commands for each library (replace LIBNAME with your data library name). These commands collect member-level and file-attribute information about the physical files in the named library:
DSPFD FILE(LIBNAME/*ALL) TYPE(*MBR) OUTPUT(*OUTFILE) FILEATR(*PF)
      OUTFILE(DELETDROWS/DSPFD_MBR) OUTMBR(*FIRST *ADD)

DSPFD FILE(LIBNAME/*ALL) TYPE(*ATR) OUTPUT(*OUTFILE) FILEATR(*PF)
      OUTFILE(DELETDROWS/DSPFD_ATR) OUTMBR(*FIRST *ADD)
Create a couple of indexes over these two newly created outfiles to improve the performance of your analysis:
CREATE INDEX DELETDROWS.DSPFD_ATR_IX
    ON DELETDROWS.DSPFD_ATR (PHFILE);

CREATE INDEX DELETDROWS.DSPFD_MBR_IX
    ON DELETDROWS.DSPFD_MBR (MBFTYP, MBFILE, MBNDTR);
Run the following procedure call from any SQL interface. This stored procedure dumps SQL plan cache entries to a table named PCSS in the DELETDROWS library.
CALL QSYS2.DUMP_PLAN_CACHE('DELETDROWS', 'PCSS')
Step 1 – Identify tables that have a large number of deleted rows
The following SQL query returns the top 25 tables with the largest number of deleted rows. If you see tables in the query output that you know are used heavily, consider reorganizing them to eliminate the deleted rows.
SELECT DISTINCT
       F.MBFILE AS FILENAME,
       F.MBNAME AS MEMBER,
       MBLIB AS LIBRARY,
       MBNRCD AS "Number Non-Deleted Rows",
       MBNDTR AS "Number Deleted Rows",
       PHRUSE AS "Reusing Deleted Rows",
       INTEGER(MBNDTR /
               (CASE WHEN MBNRCD + MBNDTR = 0 THEN 1
                     ELSE MBNRCD + MBNDTR END) * 100) AS "Percent Deleted",
       CASE WHEN MBDSSZ > 0 THEN MBDSSZ ELSE MBDSZ2 END AS SIZE,
       INTEGER(CASE WHEN MBDSSZ > 0 THEN MBDSSZ ELSE MBDSZ2 END *
               (MBNDTR /
                (CASE WHEN MBNRCD + MBNDTR = 0 THEN 1
                      ELSE MBNRCD + MBNDTR END))) AS "Deleted Space"
  FROM DELETDROWS.DSPFD_MBR AS F
  JOIN DELETDROWS.DSPFD_ATR AS A ON F.MBFILE = A.PHFILE
 WHERE MBFTYP = 'P'
   AND MBNDTR > 10000
 ORDER BY MBNDTR DESC
 FETCH FIRST 25 ROWS ONLY
You can modify this query to look for tables that exceed some threshold of deleted rows. In this example, the threshold is set to 10,000 (mbndtr > 10000). You can also sort the results by the Percent Deleted value to list the tables with the highest percentage of deleted rows at the top.
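For example, here is a minimal variant of the Step 1 query (assuming the same DSPFD outfiles in the DELETDROWS library) that raises the threshold to 100,000 deleted rows and sorts by the deleted percentage instead:

```sql
-- Sketch only: the 100,000 threshold is an example value; tune it
-- for your environment
SELECT DISTINCT F.MBFILE AS FILENAME, MBLIB AS LIBRARY,
       MBNDTR AS "Number Deleted Rows",
       INTEGER(MBNDTR /
               (CASE WHEN MBNRCD + MBNDTR = 0 THEN 1
                     ELSE MBNRCD + MBNDTR END) * 100) AS "Percent Deleted"
  FROM DELETDROWS.DSPFD_MBR AS F
  JOIN DELETDROWS.DSPFD_ATR AS A ON F.MBFILE = A.PHFILE
 WHERE MBFTYP = 'P'
   AND MBNDTR > 100000
 ORDER BY "Percent Deleted" DESC
 FETCH FIRST 25 ROWS ONLY
```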
Figure 1 – Example output from step 1 query
Step 2 – Look for SQL statements that are wasting resources
Some SQL statements consume more system resources processing deleted rows, depending on the access methods used in the runtime implementation. This section focuses on a couple of methods that are made more expensive by the deleted rows in a table.
SQL statements performing table scans
Table scan operations are clearly more expensive because of the deleted rows. Each deleted row is accessed as DB2 scans the table to find the non-deleted rows that meet the search criteria. Even if more advanced methods are used to skip the rows that do not meet the search criteria, it is likely that there are more disk I/O operations required. More operations are needed because in general, the pages in the table are more sparsely populated with non-deleted rows.
The following SQL query identifies the tables that were the target of a table scan and that account for the largest total number of deleted rows processed.
WITH BIGDEL AS (
  SELECT DISTINCT
         MBFILE, MBNAME AS MEMBER, MBLIB,
         MBNRCD AS NUMBERRECORDS,
         MBNDTR AS NUMBERDELETEDRECORDS,
         PHRUSE AS REUSEDELETED
    FROM DELETDROWS.DSPFD_MBR F
    JOIN DELETDROWS.DSPFD_ATR A ON F.MBFILE = A.PHFILE
   WHERE MBFTYP = 'P'
     AND MBNDTR > 10000
)
-- Join table scan info from the plan cache
SELECT QQTLN AS LIBRARY,
       QQTFN AS "Table Name",
       D.REUSEDELETED AS "Reusing Deleted Rows",
       MAX(QQTOTR) AS "Number Non-Deleted Rows",
       MAX(D.NUMBERDELETEDRECORDS) AS "Number Deleted Rows",
       SUM(D.NUMBERDELETEDRECORDS) AS "Total Deleted Rows Scanned",
       COUNT(*) AS TOTALSCANS
  FROM DELETDROWS.PCSS M
  JOIN BIGDEL D
    ON D.MBLIB = M.QQTLN AND D.MBFILE = M.QQTFN AND D.MEMBER = M.QQTMN
 WHERE QQRID = 3000
   AND QQC11 <> 'Y'
 GROUP BY QQTLN, QQTFN, D.REUSEDELETED
 ORDER BY SUM(D.NUMBERDELETEDRECORDS) DESC
 FETCH FIRST 25 ROWS ONLY
 OPTIMIZE FOR ALL ROWS
It is outside the scope of this article, but you should look for the SQL statements that are causing these table scan operations and see if creating additional indexes can eliminate the table scan method from being used. Still, that does not remove the need to reduce the size of these very frequently used DB2 tables in your database. As you can see in Figure 1, each table scan of DELTEST processes 210 MB of storage occupied by deleted rows.
Figure 2 – Example output from Step 2 query
SQL statements performing temporary index builds
Index builds must perform extra I/O operations to bring in storage pages that contain deleted rows, only to skip over those rows during creation of the index. Some of these index builds might occur over small tables, but creating a temporary index many times over can amplify the expense of those deleted rows.
The following SQL query identifies the SQL statements that involve temporary index builds over tables with a large number of deleted rows (greater than 10,000).
WITH BIGDEL AS (
  SELECT MBFILE, MBLIB, F.MBNAME AS MEMBER, MBFTYP,
         MBNRCD AS NUMBERRECORDS,
         MBNDTR AS NUMBERDELETEDRECORDS,
         PHRUSE AS REUSEDELETED,
         MBDSSZ AS MBRSIZE
    FROM DELETDROWS.DSPFD_MBR F
    JOIN DELETDROWS.DSPFD_ATR A ON F.MBFILE = A.PHFILE
   WHERE MBFTYP = 'P'
     AND MBNDTR > 10000
)
-- Join index builds to the tables with a large number of deleted rows
SELECT QQTLN AS LIBRARY,
       QQTFN AS "Table Name",
       D.REUSEDELETED AS "Reusing Deleted Rows",
       MAX(QQTOTR) AS "Non-Deleted Rows",
       MAX(D.NUMBERDELETEDRECORDS) AS "Deleted Rows",
       SUM(D.NUMBERDELETEDRECORDS) AS "Total Deleted Rows Scanned",
       SUM(M.QQRIDX) AS "Total index entries created",
       QQIDXD AS INDEX_ADVISED_COLUMNS,
       SUM(CASE WHEN QQC16 = 'N' THEN QQI6 ELSE 0 END) AS TOTAL_KEYS_BUILT,
       SUM(CASE WHEN QQC16 = 'N' THEN 1 ELSE 0 END) AS INDEXCREATED,
       SUM(CASE WHEN QQC16 = 'Y' THEN 1 ELSE 0 END) AS INDEXREUSED,
       COUNT(*) AS TOTALIXSCREATED
  FROM DELETDROWS.PCSS M
  JOIN BIGDEL D
    ON D.MBLIB = M.QQTLN AND D.MBFILE = M.QQTFN AND D.MEMBER = M.QQTMN
 WHERE QQRID = 3002
 GROUP BY QQTLN, QQTFN, QQTMN, D.REUSEDELETED, M.QQRCOD, QQIDXD
 ORDER BY SUM(D.NUMBERDELETEDRECORDS) DESC
 FETCH FIRST 25 ROWS ONLY
 OPTIMIZE FOR ALL ROWS
The output from this query helps to identify those tables that are good candidates to be reorganized. An additional benefit of this report is that this can also be a good way to see some indexes that might need to be created to eliminate the temporary index builds.
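If the Index_Advised_Columns value shows advised key columns for one of these tables, creating a permanent index can remove the repeated temporary builds. A sketch, where MYLIB, MYTABLE, CUSTID, and ORDDATE are hypothetical placeholders for the library, table, and advised columns from the query output:

```sql
-- Hypothetical names: substitute your own library, table, and the
-- columns reported in Index_Advised_Columns
CREATE INDEX MYLIB.MYTABLE_IX1
    ON MYLIB.MYTABLE (CUSTID, ORDDATE);
```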
Step 3 – Look at highly accessed tables
If you are running IBM i 7.1 or later on your system, you can use the access counters to identify tables that are frequently accessed. DB2 automatically increments these counters for both SQL and non-SQL interfaces. To identify the most highly accessed DB2 tables, run the following query and look for tables that also have a high count of deleted rows.
SELECT TABLE_SCHEMA, TABLE_NAME, DATA_SIZE, NUMBER_DELETED_ROWS,
       LOGICAL_READS, PHYSICAL_READS, SEQUENTIAL_READS, RANDOM_READS
  FROM QSYS2.SYSTABLESTAT
 ORDER BY LOGICAL_READS DESC
 FETCH FIRST 25 ROWS ONLY
In this example, the query output in Figure 3 shows only one table, DELTEST, that has a significant number of deleted rows.
Figure 3 – Example result from a highly accessed table query
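You can also combine the two signals directly. This sketch (assuming the same QSYS2.SYSTABLESTAT catalog view used above) returns only the heavily read tables that also carry a large number of deleted rows, reusing the 10,000-row threshold from the earlier steps:

```sql
-- Sketch: heavily read tables that also have many deleted rows
SELECT TABLE_SCHEMA, TABLE_NAME, NUMBER_DELETED_ROWS,
       LOGICAL_READS, PHYSICAL_READS
  FROM QSYS2.SYSTABLESTAT
 WHERE NUMBER_DELETED_ROWS > 10000
 ORDER BY LOGICAL_READS DESC
 FETCH FIRST 25 ROWS ONLY
```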
Step 4 – Fix the problematic tables
To eliminate the deleted rows from the tables identified by the previous analysis methods, we need to use the Reorganize Physical File Member (RGZPFM) system command. However, there are some important points to consider before performing a reorganize operation. Some applications might fail to work properly if one or more of the tables they use are reorganized. There are two reasons that this might occur:
- The application depends on the physical location of specific rows. For example, it obtains the Relative Record Number (RRN) of a row and stores that value elsewhere so that it can use it to access the row later, through a native record-level access request or the SQL RRN function. This case is unlikely, but it is a possibility; it is not a recommended programming practice.
- The application might have a dependency on the rows being stored in the arrival sequence. That is, a row inserted after another row will have a higher RRN value.
The other key consideration is the file attribute that controls whether deleted record space is reused for new rows inserted into the table. If a file reuses deleted row space, applications cannot rely on arrival sequence unless rows are never deleted from the file.
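The PHRUSE column in the Step 1 output shows the current setting for each file. To turn reuse on, you can use the Change Physical File (CHGPF) command; a sketch, with LIBNAME/FILENAME as placeholders:

```
/* LIBNAME/FILENAME are placeholders for your library and file */
CHGPF FILE(LIBNAME/FILENAME) REUSEDLT(*YES)
```

Note that tables created with SQL CREATE TABLE reuse deleted row space by default, while DDS-created physical files default to not reusing it.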
There are two basic options for reorganizing. The first option is to slide all the rows up, filling the holes left by deleted rows, and then truncate the deleted space at the end of the table. Performing the reorganize in this manner maintains the arrival sequence, so applications with an arrival sequence dependency continue to function.
The second option is to move the non-deleted rows from the end of the table into the deleted row locations at the beginning of the table. This method breaks the arrival sequence of the rows, but it usually performs much better than the first method. For more details on the RGZPFM command, refer to the IBM Redpaper™ listed in the Resources section.
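The two options map roughly to the following RGZPFM invocations; a sketch, with LIBNAME/FILENAME as placeholders (check the Redpaper for the exact parameter behavior on your release):

```
/* Option 1: compress rows in place, preserving arrival sequence */
RGZPFM FILE(LIBNAME/FILENAME)

/* Option 2: allow cancel; with no key file, rows from the end of
   the table can be moved into deleted-row positions, which is
   usually faster but breaks the arrival sequence */
RGZPFM FILE(LIBNAME/FILENAME) KEYFILE(*NONE) ALWCANCEL(*YES)
```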
Monitoring and managing deleted row space is one of the key activities that a DB2 for i database engineer should perform on a regular basis. If you do not have a DB2 for i database engineer, read the DB2 for i blog (see Resources) to better understand the role and responsibilities of this position.
Hopefully, this article has given you some motivation to start paying attention to deleted row space and to take focused action to minimize the system resources being wasted on deleted rows.
You can also use the Phase 2 System Limits support to track changes in your biggest files, including delete operations. Refer to the Gain Big Insights into DB2 for i with System Limits, Phase 2 article for more information about System Limits support.
- IBM i Reorganize Physical File Member Redpaper
- DB2 for i Center of Excellence Services
- DB2 for i blog
- DB2 for i SQL Performance Workshop