Mirrored protection recovery actions

On a system with mirrored protection, errors and failures have different effects. When a failure occurs on a system with mirrored protection, the recovery procedure is affected by the level of protection that is configured.

In considering aspects of recovery, you need to distinguish between errors and failures in the disk subsystem.

A disk error is an unexpected event during an input/output (I/O) operation that can cause the loss or corruption of the data being transferred. Most disk errors are caused by a failure in some part of the component chain from the I/O processor to the disk surface. Environmental effects such as power abnormalities or severe electrostatic discharges can also cause disk errors. The definition of disk errors also includes a failure of the Licensed Internal Code that controls the disk subsystem.

When the system detects an error, it generally logs the occurrence and attempts the operation again. Temporary errors are those from which the system can recover and complete the I/O operation successfully. When the error is so severe that the I/O operation cannot succeed, it is a permanent error.
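
The distinction between temporary and permanent errors can be illustrated with a minimal sketch. This is not the Licensed Internal Code logic. The function name classify_io, the ErrorClass type, and the max_attempts limit are hypothetical, and the source does not state how many times the system retries an operation.

```python
from enum import Enum
from typing import Callable, List, Tuple


class ErrorClass(Enum):
    NONE = "none"            # the operation succeeded on the first attempt
    TEMPORARY = "temporary"  # an error occurred, but a retry succeeded
    PERMANENT = "permanent"  # every attempt failed


def classify_io(operation: Callable[[], bool],
                max_attempts: int = 3) -> Tuple[ErrorClass, List[str]]:
    """Run an I/O operation, logging and retrying on each error.

    `operation` returns True on success and False on error.
    `max_attempts` is an illustrative assumption, not a documented value.
    """
    log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        if operation():
            if attempt == 1:
                return ErrorClass.NONE, log
            return ErrorClass.TEMPORARY, log
        # The occurrence is logged before the operation is attempted again.
        log.append(f"attempt {attempt}: I/O error")
    return ErrorClass.PERMANENT, log
```

In this sketch, an operation that fails twice and then succeeds is classified as a temporary error with two log entries, while an operation that never succeeds is classified as a permanent error.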

When the system detects a permanent error, it classifies it as a failure in that hardware subsystem. In an ASP that does not have mirrored protection, a failure causes the system to become unusable. The system displays an error message that contains a System Reference Code (SRC) of A6xx 0244, A6xx 0255, or A6xx 0266, where xx is incremented every minute. During this time, the system retries the operation in which the failure occurred. If the condition that caused the failure can be corrected (for example, by powering on a disk unit or replacing an electronic component), then normal system operations are resumed.
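
The behavior of an unprotected ASP during a failure can be sketched as a simple minute-by-minute simulation. The function name wait_for_correction and its parameters are hypothetical. The sketch also assumes that xx is a two-digit hexadecimal counter that starts at 00 and wraps around. The source says only that xx is incremented every minute, so the actual encoding and starting value may differ.

```python
from typing import Callable, List


def wait_for_correction(suffix: str,
                        retry_succeeds: Callable[[int], bool],
                        max_minutes: int = 10) -> List[str]:
    """Return the SRCs shown while the failed operation is being retried.

    `suffix` is the second SRC word, for example "0244".
    `retry_succeeds(minute)` reports whether the retry made in that
    minute worked, that is, whether the failure condition was corrected.
    """
    shown: List[str] = []
    for minute in range(max_minutes):
        if retry_succeeds(minute):
            # Condition corrected: normal system operations resume.
            break
        # Assumption: xx is a two-digit hex counter starting at 00.
        xx = minute % 256
        shown.append(f"A6{xx:02X} {suffix}")
    return shown
```

For example, if a powered-off disk unit is powered on again during the fourth minute, the simulation shows three SRCs (A600 0244, A601 0244, A602 0244) before operations resume.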