Soft storage errors

Soft storage errors are system recovery (SR) errors in which the storage error corrected flag is set in the machine-check interruption code (MCIC), indicating that the storage controller was able to repair the error.

When a storage error corrected (SC) condition occurs together with storage degradation (DS), the system attempts to stop using the affected frame. This action avoids the performance degradation that would result from hardware correction of later occurrences of the same error. It also minimizes the chance that the same problem will later recur as a storage error uncorrected.

If the frame contains pageable data, the system moves that data to another frame and marks the original frame offline. If the data in the frame cannot be moved, the system marks the frame pending offline and takes it offline later, when the frame is released or its contents are made pageable. Note that, before the system takes a frame offline, it tests the frame; if the test finds no errors, the frame is returned to available status.
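The frame handling described above can be sketched as a small state model. This is an illustrative sketch only, not the system's implementation: the names FrameState, Frame, take_offline, handle_corrected_error, and on_released_or_made_pageable are hypothetical, the result of the frame test is supplied as a plain boolean, and the sketch assumes the test is applied on both the immediate and the deferred path.

```python
from dataclasses import dataclass
from enum import Enum, auto


class FrameState(Enum):
    ONLINE = auto()
    PENDING_OFFLINE = auto()
    OFFLINE = auto()
    AVAILABLE = auto()


@dataclass
class Frame:
    pageable: bool      # True if the frame's data can be moved
    passes_test: bool   # hypothetical stand-in for the result of the frame test
    state: FrameState = FrameState.ONLINE


def take_offline(frame: Frame) -> FrameState:
    """Test the frame first; a frame with no errors is returned to available."""
    frame.state = FrameState.AVAILABLE if frame.passes_test else FrameState.OFFLINE
    return frame.state


def handle_corrected_error(frame: Frame) -> FrameState:
    """React to a storage error corrected condition on this frame."""
    if frame.pageable:
        # The data would be moved to another frame here (not modelled).
        return take_offline(frame)
    # Data cannot be moved yet: defer until release or until it becomes pageable.
    frame.state = FrameState.PENDING_OFFLINE
    return frame.state


def on_released_or_made_pageable(frame: Frame) -> FrameState:
    """Complete a deferred offline request, if one is pending."""
    if frame.state is FrameState.PENDING_OFFLINE:
        return take_offline(frame)
    return frame.state
```

In this model, a frame whose data cannot be moved stays in PENDING_OFFLINE until on_released_or_made_pageable is called, at which point it goes either OFFLINE or back to AVAILABLE depending on the test result.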

The threshold for SR machine checks affects the ability of the system to deal with storage error corrected conditions. The default threshold is 50 SR machine checks. The operator can change the SR threshold with the MODE operator command. When the threshold is reached, the system disables SR machine checks. This action prevents subsequent storage error corrected conditions from being presented, so the system takes no further action to remove an affected frame.
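The effect of the threshold can be illustrated with a simple counter. This is a minimal sketch under stated assumptions: the class SRThreshold and its record method are hypothetical names, the model counts events only, and the default of 50 is taken from the text above.

```python
class SRThreshold:
    """Toy model of the SR machine-check threshold (hypothetical names)."""

    def __init__(self, threshold: int = 50):
        self.threshold = threshold  # default of 50 comes from the text
        self.count = 0
        self.enabled = True         # SR machine checks are presented while True

    def record(self) -> bool:
        """Record one SR machine check.

        Returns True if the machine check was presented (so the system could
        act on the affected frame), False if SR machine checks are disabled.
        """
        if not self.enabled:
            return False
        self.count += 1
        if self.count >= self.threshold:
            # Threshold reached: disable further SR machine checks.
            self.enabled = False
        return True
```

With a threshold of 3, the first three calls to record return True and every later call returns False, which corresponds to later storage error corrected conditions no longer being presented.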