Troubleshooting Drives in Degraded Status



This document describes how to troubleshoot internal drives that are in DEGRADED status.


Poor disk response.


Write cache not being used.



Resolving The Problem

Problem: Users report that for the past couple of days the batch and interactive workloads have been running very sluggishly. Overall CPU usage on the system is relatively low, and memory page fault rates are not high. However, the WRKDSKSTS command shows that the drives are extremely busy (over 40%).

Solution: The problem description indicates that the busy drives are causing the system slowdown. To troubleshoot further, do the following:
1. Run the WRKDSKSTS command, then press F11 to display the Status of the drives. If the drives are in DEGRADED status, the IOA cache battery has probably failed. The cache card works in sync with your disk drives.

The cache battery keeps the card working in the event of a power failure. The cache acts as a buffer that holds data before it is written to the disk drives, which improves system performance. If the cache battery fails, the write cache is not used and data is written directly to disk for every operation performed on the system, which slows system performance.

Note: For device parity (DPY) drives, the DEGRADED status is shown by WRKDSKSTS after pressing F11. However, mirrored (MRR) drives always show an ACTIVE status. Therefore, for mirrored drives you must examine the PAL through SST for any SRCs indicating a problem.
2. Examine the Product Activity Log (PAL) through System Service Tools (SST) for System Reference Codes (SRCs) that indicate a problem with the cache battery. Normally, SRCs that indicate cache battery problems have the form XXXX8008 or XXXX8009, where 'XXXX' is the IOA card type.

To examine the PAL, do the following:
a. Type the STRSST command and sign on to System Service Tools (SST).
b. From the SST menu, select Option 1 - Start a service tool.
c. Select Option 1 - Product activity log.
d. Select Option 1 - Analyze log.
e. Select Option 1 - All logs.
f. Specify a From Date that is prior to when the system slowdown started.
g. Look for any system reference codes that end with 8008 or 8009 (for example, 27638008).
3. After the system reference code is located, contact your Hardware Service Provider to have a Customer Engineer (CE) dispatched to repair or replace the failed hardware.
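
The SRC check in step 2 can be sketched in a few lines of Python. This is only an illustration, not an IBM-supplied tool: it assumes the SRCs have already been copied by hand from the PAL display into a list, that each SRC is eight characters long (as in the example 27638008), and that the first four characters are the IOA card type. The function names and the sample list are hypothetical.

```python
import re

# Assumption: an SRC is 8 characters, the IOA card type in the first four
# and the reference code in the last four (as in the example 27638008).
CACHE_BATTERY_SRC = re.compile(r"[0-9A-Z]{4}(?:8008|8009)")


def is_cache_battery_src(src: str) -> bool:
    """Return True if the SRC has the form XXXX8008 or XXXX8009."""
    return CACHE_BATTERY_SRC.fullmatch(src.strip().upper()) is not None


def find_cache_battery_srcs(srcs):
    """Keep only the SRCs that point to a cache battery problem."""
    return [src for src in srcs if is_cache_battery_src(src)]


if __name__ == "__main__":
    # Hypothetical SRCs transcribed from the PAL display.
    sample = ["27638008", "27638009", "27633400", "A6005090"]
    print(find_cache_battery_srcs(sample))
```

Any SRC that the filter returns is a candidate to report to the Hardware Service Provider in step 3. The PAL display itself remains the authoritative source.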

A video is available for reference.

Modified date:
03 November 2021