Recovery from hardware problems

Recovery is the attempt by the hardware, the operating system, the operator, automation, or any combination of these to correct system malfunctions and return the system to a state in which it can do productive work. Recovery from some hardware errors is automatic; that is, the hardware recovers without any action by the operating system and without intervention by the operator or automation. Recovery from other hardware errors requires explicit action by the operating system, the operator, automation, or some combination of these. For example, to keep the system in operation, the operator or the system can configure a failing unit offline, such as a storage element, a processor, or a channel path. The system then continues processing, possibly with some degradation.

The process of recovery includes the following: