What is a check?

A check is a program or routine that identifies potential problems before they affect your availability or, in the worst cases, cause outages. A check is owned, delivered, and supported by the component, element, or product that writes it. Checks are separate from the IBM Health Checker for z/OS framework. A check might look for the following in a configuration:

Changes in settings or configuration values that occur dynamically over the life of an IPL. Checks that look for changes in these values should run periodically to keep the installation aware of changes.
Threshold levels approaching the upper limits, especially those that might occur gradually or insidiously.
Single points of failure in a configuration.
Unhealthy combinations of configurations or values that an installation might not think to check.
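
To make the threshold case above concrete, here is a minimal sketch in Python of the kind of logic such a check performs. It is not Health Checker code: the function name, the result type, and the warn_fraction tuning value are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThresholdResult:
    exception: bool  # True when the check found a potential problem
    message: str


def check_threshold(current: int, limit: int, warn_fraction: float = 0.8) -> ThresholdResult:
    """Flag usage that is approaching a configured upper limit.

    warn_fraction is a hypothetical tuning value, not a Health Checker parameter.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    usage = current / limit
    if usage >= warn_fraction:
        # Usage has crossed the warning line but not yet hit the hard limit.
        return ThresholdResult(True, f"usage at {usage:.0%} of limit {limit}")
    return ThresholdResult(False, f"usage at {usage:.0%} of limit {limit}, within bounds")


print(check_threshold(850, 1000).message)
print(check_threshold(100, 1000).message)
```

Running logic like this on an interval, rather than once, is what lets an installation notice a value that creeps upward gradually.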

This document discusses the following IBM® Health Checker for z/OS® concepts:

Check owner and check name: Each check has a check owner and check name.

The check owner is the owning element or component. For IBM checks, the owner always starts with IBM. For example, IBMASM and IBMUSS are two IBM check owners.
The check name is the name of the check itself, such as ASM_NUMBER_LOCAL_DATASETS.

Check values: Each check includes a set of pre-defined values, such as:

Interval, or how often the check will run
Severity of the check, which influences how check output is issued
Routing and descriptor codes for the check

You can update or override some check values using SDSF, statements in the HZSPRMxx parmlib member, or the MODIFY command. These changes are called installation updates. You might make them if some check values are not suitable for your environment or configuration.
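
The relationship between pre-defined check values and installation updates can be pictured as defaults plus overrides. The Python sketch below is only a conceptual model under stated assumptions: the CheckValues type, the apply_installation_update function, the set of overridable fields, and the default values shown are hypothetical and do not reflect the real defaults of any check or the syntax of HZSPRMxx or MODIFY.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CheckValues:
    owner: str
    name: str
    interval: str  # how often the check runs; "hh:mm" format is an assumption
    severity: str  # influences how check output is issued


# Fields an installation may override in this sketch (hypothetical subset).
OVERRIDABLE = {"interval", "severity"}


def apply_installation_update(defaults: CheckValues, **overrides: str) -> CheckValues:
    """Return a copy of the pre-defined values with installation overrides applied."""
    unknown = set(overrides) - OVERRIDABLE
    if unknown:
        raise ValueError(f"cannot override: {sorted(unknown)}")
    return replace(defaults, **overrides)


# Illustrative values only, not the check's actual defaults.
defaults = CheckValues("IBMASM", "ASM_NUMBER_LOCAL_DATASETS", "24:00", "MEDIUM")
updated = apply_installation_update(defaults, severity="LOW")
print(updated)
```

The pre-defined values stay untouched and the update produces a separate effective set, which mirrors the idea that an installation update overrides a check's values without changing the check itself.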

Check output: A check issues its output as messages, which you can view using SDSF, the HZSPRINT utility, or a log stream that collects a history of check output. If a check finds a potential problem, it issues a WTO message. We call these messages exceptions. Check exception messages are issued both as WTOs and to the message buffer. The WTO version contains only the message text. The version in the message buffer contains the text plus an explanation of the potential problem, including its severity, and information on how to fix it.

Resolving check exceptions: To get the best results from IBM Health Checker for z/OS, let it run continuously on your system so that you know when your system has changed. When you get an exception, resolve it either by using the information in the check exception message or by overriding check values, so that you do not receive the same exception repeatedly.

Managing checks: You can use SDSF, the HZSPRMxx parmlib member, or the IBM Health Checker for z/OS MODIFY (F hzsproc) command to manage checks. Managing checks includes:

Printing check output from SDSF or with the HZSPRINT utility. See Working with check output.
Displaying check information
Taking one-time actions against checks, such as:
- Activating or deactivating checks
- Adding new checks
- Refreshing checks. Refresh processing first deletes a check from IBM Health Checker for z/OS and then adds it back to the system.
- Running checks
See Cheat sheet: examples of MODIFY hzsproc commands.
Updating check values temporarily using SDSF or the MODIFY hzsproc command. See Making dynamic, temporary changes to checks.
Updating check values permanently using HZSPRMxx. See Making persistent changes to checks.