IBM Support

Damage Detection Tool


Abstract

This document details the Damage Detection Tool.

Content

Errors on a disk drive can make an object unusable, in which case the IBM i operating system marks the object as 'damaged'. There are several ways an object can get into this state. Each of them is rare, and IBM i is built to detect the damage so that neither your application nor the system uses the affected object; using a damaged object "as is" would produce unreliable results. This is one of the ways IBM i protects the integrity of your data and of the system. Often, however, you do not notice that an object is damaged until you try to use it. For that reason, IBM has built a tool that can help you find some damaged objects ahead of time, so that you can deal with them proactively.

The new Damage Detection tool is based on the Application Runtime Expert (ARE) framework. ARE is designed to help you better understand what has changed on your systems and in the environments that your applications run in. ARE has two components. The first is the core runtime engine, which is part of the base operating system. The core provides the framework and runtime engine needed to run a verification on any system and return the results in an easy-to-review report. The second is the ARE product itself, 5733-ARE. The product is the GUI that lets you build custom collections of the attributes or system content that you want to track. It also contains a runtime interface from which you can run a comparison in real time or schedule it to run on a regular basis.

The time it takes to run the verification depends on the size of the disk pool being scanned and on the resources allocated to the tool. In testing, a complete scan of a large system (7 TB of internal storage, approximately 5% used) took 3 hours and 18 minutes. On a smaller system (352 GB of internal storage, approximately 15% used), a complete scan took 4 minutes. These times can vary greatly, and you should not expect the same results on your system.

This tool only identifies objects that have data checks embedded within them, so it may not find all damage. Conversely, an object reported by the tool is not necessarily damaged just because it contains an embedded data check. If objects are found to be damaged, use the proper damage recovery procedure for the specific object. This generally requires deleting the object and restoring it from a good save. For physical files, the following document describes how to use CPYF to create a new file: http://www-01.ibm.com/support/docview.wss?uid=nas8N1010721

When should you run the tool:
Use this tool if you are concerned that objects may have been damaged by disk drive failures, disk controller issues, write failures, logical failures, partially overwritten data, or system crashes.

What does the tool do:
The damage detection tool is built to find objects within the system's disks that contain data checks. The tool does not search through system objects; instead, it reads every sector on the specified disks, looking for a data section that contains a data check. A data check occurs when the actual data write fails, which can happen for several reasons. Once a disk sector has a data check in it, the object that the sector belongs to is reported as damaged the next time that area is accessed.

The damage detection tool can check every disk on the entire system, a specific iASP, a specific disk on the system, or even a region within a disk. The tool scans every sector on the disk, looking for malformed data. When a data check is encountered, the damage detection tool reports that disk sector. The tool then attempts to determine which ILE object the sector is associated with. This is often difficult and, depending on where the damage lies within the object, is not always successful.

When the damage detection tool finds a data check, it reports both the disk sector and the ILE object that the sector belongs to. The check runs in a multithreaded manner, and controls let you adjust how fast or slow the scan runs. When a data check is encountered, it is reported to the ARE infrastructure, and the associated ILE object is resolved and recorded in the report.

What is needed to run the tool:

§ IBM i 7.2
– 5770SS1 option 3 – Extended Base Directory Support
– 5770SS1 option 30 – QShell
– 5770SS1 option 33 – PASE
– 5761JV1 option 8 or 11 – J2SE (5 or 6) 32 bit
– SI54169
– SI54517


§ IBM i 7.1
– 5770SS1 option 3 – Extended Base Directory Support
– 5770SS1 option 30 – QShell
– 5770SS1 option 33 – PASE
– 5761JV1 option 8 or 11 – J2SE (5 or 6) 32 bit
– SI50374 (5770SS1)
– Make sure the following dist-req PTFs are applied:
• MF56898
• MF56876
• SI45469

§ IBM i 6.1
– 5761SS1 option 3 – Extended Base Directory Support
– 5761SS1 option 30 – QShell
– 5761SS1 option 33 – PASE
– 5761JV1 option 8 or 11 – J2SE (5 or 6) 32 bit
– SI45499 (5761SS1)
– Make sure the following dist-req PTFs are applied:
• SI51025 (5761SS1)
• SI30796 (5761SS1)
• R610: MF57435 (5761999), MF57436 (5761999)
• R611: MF57425 (5761999), MF57426 (5761999)

How to run the tool:
To run this tool, you must be signed on as QSECOFR or with a user profile that has *ALLOBJ special authority. The tool is a script that runs in the Qshell environment, and it runs in two separate phases. The first phase is a directory dump, which builds the list of disk segments to be scanned in the second phase. Both phases can be controlled with parameters. For the first phase, you can choose to skip the directory dump, or you can specify the number of jobs (processes) used to perform it. The second phase runs at the lowest levels of the system, and you can specify the number of threads used to scan the disks so that the impact on the system is kept to a minimum.

To run the tool, first start Qshell: from a CL command line, type qsh and press Enter.

Some examples of how to call the tool:
§ Check all storage units (sys base) (Qshell command):
/QIBM/ProdData/OS/OSGi/templates/bin/areVerify.sh -storage diskUnits=*ALL
§ Check certain storage units (Qshell command):
/QIBM/ProdData/OS/OSGi/templates/bin/areVerify.sh -storage diskUnits=1,2,4
§ Check a specified segment (Qshell command):
/QIBM/ProdData/OS/OSGi/templates/bin/areVerify.sh -storage checkSegment=0x1122334455660000

Any of these Qshell commands can also be run in the background by submitting a job from the CL command line:
§ Check all disk units (sys base), in background (CL command):
SBMJOB CMD(STRQSH CMD('/QIBM/ProdData/OS/OSGi/templates/bin/areVerify.sh -storage confirm=true diskUnits=*ALL'))
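
The batch form above can be generated from the same parameter string that is used interactively. The following is a minimal sketch in POSIX shell, not part of the tool: build_sbmjob is a hypothetical helper that only prints the CL command and does not submit anything. It assumes the parameter string contains no single quotes, which would need to be doubled inside the CL string.

```shell
# Path of the verification script, as given in this document.
ARE_VERIFY=/QIBM/ProdData/OS/OSGi/templates/bin/areVerify.sh

# build_sbmjob (hypothetical helper): print the SBMJOB CL command that would
# run the storage scan in the background with the given parameters.
# confirm=true is included, as in the example above, so that no user
# confirmation prompt is shown.
build_sbmjob() {
    printf "SBMJOB CMD(STRQSH CMD('%s -storage confirm=true %s'))\n" "$ARE_VERIFY" "$*"
}

# Print the batch command for checking disk units 1, 2 and 4.
build_sbmjob 'diskUnits=1,2,4'
```

The printed line can then be pasted on a CL command line.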

Additional parameters control further functions and features. The general form of the command is:

– /QIBM/ProdData/OS/OSGi/templates/bin/areVerify.sh -storage [,...]

Mandatory parameters – One of the following parameters must be specified:

§ diskUnits=<comma separated numbers>
– Description: Specifies which disk units to check.
– Default value: N/A.
– Example: diskUnits=1,2,4 diskUnits=*ALL

§ checkSegment=<hexValue>
– Description: Check a specific segment by its address
– Default value: N/A.
– Example: checkSegment=0x1122334455660000

Optional parameters for controlling the Directory Dump phase – only specify if you want to override the default behavior.

§ skipDirDump=<bool>
– Description: If true is specified, the directory dump phase is skipped. If you have run the tool before and there have been no storage changes that you care about, use this parameter to skip the directory dump phase, which takes a long time.
– Default value: false
– Example: skipDirDump=true

§ dbName=<temp DB name for dirdump data>
– Description: Specifies the temporary library name to store the directory dump data. This parameter is only for directory dump phase.
– Default value: QTMPAREDDD
– Example: dbName=MYLIB

§ dirType=[P|I|T]
– Description: Identifies the directories for which data should be collected. This parameter is only for directory dump phase. The possible values for this parameter are:
• T: Temporary
• P: Sysbase permanent & user ASPs
• I: Independent ASP
– Default value: P
– Example: dirType=I

§ iASP=<number>
– Description: Identifies the IASP number when ‘I’ is specified for the dirType parameter. If dirType has any other value, this parameter is ignored. This parameter is only for directory dump phase.
– Default value: 0
– Example: iASP=3

§ jobQueue=<name>
– Description: identifies the name of the job queue which will be used for the background jobs which collect the requested data. This parameter is only for directory dump phase.
– Default value: QCTL

§ jobQueueLib=<name>
– Description: identifies the library of the job queue which will be used for the background jobs which collect the requested data. This parameter is only for directory dump phase.
– Default value: QSYS

§ jobCount=<number>
– Description: identifies the number of background jobs which will be used to collect the requested data. This value must be a number between 5 and 100. This parameter is only for directory dump phase.
– Default value: 30
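
As an illustration of the Directory Dump parameters above, the following minimal sketch assembles the options for an independent ASP and checks jobCount against the documented range before use. dirdump_params is a hypothetical helper, not part of the tool, and the values 3 and 10 are examples only.

```shell
# dirdump_params (hypothetical helper): print Directory Dump options for an
# independent ASP after checking jobCount against the documented 5-100 range.
# Usage: dirdump_params <iASP number> <jobCount>
dirdump_params() {
    dd_iasp=$1
    dd_jobs=$2
    # Reject empty or non-numeric job counts.
    case $dd_jobs in
        ''|*[!0-9]*) echo "jobCount must be a number" >&2; return 1 ;;
    esac
    if [ "$dd_jobs" -lt 5 ] || [ "$dd_jobs" -gt 100 ]; then
        echo "jobCount must be between 5 and 100" >&2
        return 1
    fi
    echo "dirType=I iASP=$dd_iasp jobCount=$dd_jobs"
}

# Options for iASP 3, using 10 background jobs instead of the default 30.
dirdump_params 3 10
```

The printed options would be appended to an areVerify.sh -storage command that already carries one of the mandatory parameters. Whether a particular combination is accepted should be confirmed against the tool itself.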

Optional parameters for the disk scan phase – only needed if you want to override the defaults.

§ skipPageVerification=<bool>
– Description: specifies whether the Page Verification phase will be performed.
– Default value: true

§ threadCount=<number>
– Description: Specifies the thread count for Page Verification. This value must be a number between 1 and 100.

Optional general usage parameters – only needed if you want to override the defaults
§ op=[check | clear]
– Description: Operation mode. Used only when the diskUnits parameter is specified. The possible values are:
• check: check for segment errors on the specified disk units.
• clear: clear error flags in the free space of the specified disk units. When this option is specified, only the first disk unit is processed, and the other disk units are ignored.
– Default value: check

§ statusUpdateInterval=<number>
– Description: Specifies the status update interval, in minutes. The status message is written to the console, to the log file, or to both. This value must be a number between 1 and 1440.
– Default value: 10

§ outputFile=<IFS file name>
– Description: Specifies the log file
– Default value: /tmp/areDodReport.txt

§ confirm=<boolean>
– Description: Controls whether user confirmation is required before starting the task. If true, no user confirmation prompt is shown.
– Default value: false

§ version
– Description: Show version of the tool
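
The parameters above can be combined into one command line. The following is a minimal sketch, assuming POSIX shell: in_range and scan_command are hypothetical helpers that validate threadCount and statusUpdateInterval against the documented ranges and then print the command rather than run it. The output file name is an example only.

```shell
# Path of the verification script, as given in this document.
ARE_VERIFY=/QIBM/ProdData/OS/OSGi/templates/bin/areVerify.sh

# in_range (hypothetical helper): succeed if $1 is an integer in [$2, $3].
in_range() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# scan_command (hypothetical helper): print a scan command with explicit
# thread count, status interval and log file.
# Usage: scan_command <diskUnits> <threadCount> <statusUpdateInterval> <outputFile>
scan_command() {
    in_range "$2" 1 100  || { echo "threadCount must be 1-100" >&2; return 1; }
    in_range "$3" 1 1440 || { echo "statusUpdateInterval must be 1-1440" >&2; return 1; }
    echo "$ARE_VERIFY -storage diskUnits=$1 threadCount=$2 statusUpdateInterval=$3 outputFile=$4"
}

# Scan units 1, 2 and 4 with 4 threads, reporting status every 30 minutes.
scan_command '1,2,4' 4 30 /tmp/scan_units_1_2_4.txt
```

Printing the command first makes it easy to review the values before running it in Qshell or submitting it as a batch job.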

Known issues:
§ If a JDBC driver is not found or cannot be created, update JDBC with these PTFs:
– IBM i 7.1: JDBC Driver: SI50669 (5770SS1)
– IBM i 6.1: JDBC Driver: SI49252 (5761SS1)

§ CCSID setting
Job CCSID must NOT be 65535.
The CCSID for the user profile used on the job that contains the JVM must NOT be 65535. Use the CHGUSRPRF command to update the user profile or, if the profile is configured for *SYSVAL, consider updating the QCCSID system value. It is also recommended that the user profile used on the JDBC connection be configured with the same CCSID. Refer to the IBM® iSeries™ Information Center topic Language identifiers and associated default CCSIDs for further information on selecting the appropriate CCSID.

Potential impacts to the system:
§ The tool can cause heavy I/O while it is running

§ The tool (Directory Dump phase) creates a temporary library
– Size is less than 5/1000 of total storage used
– The temp library is not deleted automatically.
• By default the library is QTMPAREDDD

§ System impacts can be managed with the following parameters:
– jobCount: [5, 100], for Directory Dump phase
– threadCount: [1, 100], for Page Verification phase
– skipDirDump: set to true to skip the directory dump phase, if:
• A directory dump has been performed previously, and
• No relevant storage updates have occurred since the last dump, and
• The previously generated temp library (by default QTMPAREDDD) has not been deleted or changed.
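
The size bound and the manual cleanup of the temporary library can be sketched as follows. Both helpers are hypothetical and only print values. The sketch assumes the standard CL command DLTLIB is an appropriate way to remove the library once it is no longer needed for skipDirDump=true runs.

```shell
# tmp_lib_limit_mb (hypothetical helper): upper bound, in MB, for the size of
# the temporary library, using the "less than 5/1000 of total storage used"
# figure from this document. Argument: storage used, in MB.
tmp_lib_limit_mb() {
    echo $(( $1 * 5 / 1000 ))
}

# cleanup_command (hypothetical helper): print the CL command that deletes the
# temporary library. Defaults to QTMPAREDDD, the tool's default library name.
cleanup_command() {
    echo "DLTLIB LIB(${1:-QTMPAREDDD})"
}

# Example: 350 GB of storage used, expressed in MB.
tmp_lib_limit_mb 358400
cleanup_command
```

If dbName was used to choose a different library, pass that name to cleanup_command instead of relying on the default.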

Document Information

Modified date:
04 November 2021

UID

nas8N1019814