Features of the z/OS operating system

This topic describes how z/OS supports the availability of systems and applications.

z/OS has a reliability philosophy that recognizes the inevitability of errors. This philosophy dictates a comprehensive approach to error isolation, identification, and recovery rather than a simplistic automatic restart approach. In support of this comprehensive approach, z/OS provides a vast array of software reliability and availability features, far beyond those features currently provided by any other operating system. A large portion of the z/OS kernel exists solely to provide advanced reliability, availability, and serviceability (RAS) capabilities. For example, code must obey RAS guidelines such as the following:
  • All code must be covered by a recovery routine, including the code of recovery routines themselves. Therefore, multiple layers of recovery are supported.
  • All control areas and queues must be verified before processing continues.
  • Recovery and retry must be attempted if there is hope of success.
  • All failures that cannot be transparently recovered must be isolated to the smallest possible unit, for example, the current request, a single task, or a single address space.
  • Diagnostic data must be provided. Its objective is to allow the problem to be identified and fixed after a single occurrence. The diagnostic data is provided even when retry is attempted and succeeds.
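
The guidelines above can be illustrated with a small sketch. It is written in Python purely as an analogy and does not use or represent any z/OS interface; every name in it (Diagnostics, verify_queue, run_with_recovery, process) is hypothetical. It assumes a queue of (request id, work function) pairs and shows the control structure being verified before processing, a recovery routine that is itself covered by an outer layer, retry while there is hope of success, failure isolation to a single request, and diagnostic capture even when the retry succeeds.

```python
import traceback
from dataclasses import dataclass, field


@dataclass
class Diagnostics:
    """Hypothetical collector for first-failure diagnostic records."""
    records: list = field(default_factory=list)

    def capture(self, request_id, exc):
        # Record enough detail to identify the problem from one occurrence.
        self.records.append({
            "request": request_id,
            "error": repr(exc),
            "trace": "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            ),
        })


def verify_queue(queue):
    """Verify the control structure before any element is processed."""
    if not isinstance(queue, list):
        raise TypeError("queue is not a list")
    for item in queue:
        if not (isinstance(item, tuple) and len(item) == 2):
            raise ValueError(f"malformed queue element: {item!r}")


def run_with_recovery(request_id, work, diag, max_retries=1):
    """Run one request under a recovery routine, retrying when possible."""
    attempts = 0
    while True:
        try:
            return ("ok", work(attempts))
        except Exception as exc:
            # Inner recovery layer: capture diagnostics on every failure,
            # including failures that a later retry recovers from.
            try:
                diag.capture(request_id, exc)
            except Exception:
                # Outer layer: the recovery code is itself covered, so a
                # failure while capturing cannot escape this request.
                pass
            if attempts >= max_retries:
                # No hope of success: isolate the failure to this request.
                return ("failed", None)
            attempts += 1


def process(queue, diag):
    """Verify the queue, then run each request in isolation."""
    verify_queue(queue)
    return {rid: run_with_recovery(rid, work, diag) for rid, work in queue}


def flaky(attempt):
    # Fails on the first attempt only, so one retry recovers it.
    if attempt == 0:
        raise RuntimeError("transient error")
    return "done"


def broken(attempt):
    # Fails on every attempt, so it cannot be recovered.
    raise ValueError("permanent error")


diag = Diagnostics()
results = process([("r1", flaky), ("r2", broken), ("r3", lambda a: 42)], diag)
```

In this sketch, request r1 succeeds on retry but still leaves a diagnostic record, request r2 fails without affecting r3, and a malformed queue is rejected before any request runs.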