Planning checks

The IBM® Health Checker for z/OS® is a component of MVS™ that provides the framework for checking z/OS system and sysplex configuration parameters and the system environment to help determine places where an installation is deviating from suggested settings or where there might be configuration problems. IBM provides a set of check routines in IBM Health Checker for z/OS, but vendors, consultants, and system programmers can add other check routines.

The objective of a check is to identify potential problems before they impact your availability or, in worst cases, cause outages. The output of a check is messages and reports that help an installation analyze the health of a system.

You can use checks to look for things like:
  • Changes in configuration values that occur dynamically over the life of an IPL. Checks that look for changes in these values should run periodically to keep the installation aware of changes accruing since the last IPL, to help ensure a cleaner IPL the next time.
  • Threshold levels approaching the upper limits, especially those that might occur gradually or insidiously.
  • Single points of failure in a configuration.
  • Unhealthy combinations of configurations or values that an installation might not think to check.
  • Data that monitoring checks collect and present in reports.

A check routine does the following:
  • Defines the severity of exceptions it finds and suggests a fix for the exception.
  • Defines a timer interval for the check.
  • Allows the installation to override its default values.
  • Communicates check results by issuing messages to a buffer associated with the check.
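
The responsibilities listed above can be pictured with a small model. The following Python sketch is illustrative only and is not the IBM Health Checker for z/OS programming interface: the class ThresholdCheck, its fields, the OVERRIDABLE tuple, and the message wording are all hypothetical names chosen for this example. It assumes a check that watches one threshold, carries a severity and an interval, accepts installation overrides of its defaults, and reports results by appending messages to a buffer.

```python
from dataclasses import dataclass, field

# Hypothetical: the only settings an installation may override in this model.
OVERRIDABLE = ("severity", "interval_minutes", "warn_percent")


@dataclass
class ThresholdCheck:
    """Hypothetical model of a single-purpose check (not the Health Checker API)."""
    name: str
    severity: str = "MEDIUM"      # severity of any exception the check finds
    interval_minutes: int = 60    # how often the check is meant to run
    warn_percent: float = 80.0    # default threshold, overridable
    messages: list = field(default_factory=list)  # stands in for the message buffer

    def override(self, **updates):
        """Apply installation updates to the overridable defaults."""
        for key, value in updates.items():
            if key not in OVERRIDABLE:
                raise KeyError(f"{key} cannot be overridden")
            setattr(self, key, value)

    def run(self, used: int, limit: int) -> bool:
        """Check one thing: is usage approaching the limit? Return True if healthy."""
        percent = 100.0 * used / limit
        if percent >= self.warn_percent:
            self.messages.append(
                f"{self.name}: EXCEPTION ({self.severity}) usage is {percent:.0f}% "
                "of the limit; suggested fix: raise the limit or reduce usage."
            )
            return False
        self.messages.append(f"{self.name}: usage is {percent:.0f}% of the limit.")
        return True
```

In this model, a caller would create the check, optionally call override(warn_percent=...) to replace a default, and then call run() on each interval and read the accumulated entries in messages.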

The following are examples of situations customers uncovered running IBM Health Checker for z/OS at different times:

Hints for planning your checks:
  • Keep in mind that each check should check for only one thing. This makes it much easier for the installation to resolve the exceptions that the check finds and to override defaults.
  • If you are writing a check that will flag a default or common valid configuration setting as an exception, you should:
    • Make sure that the HZSADDCHECK exit routine for your check specifies the INACTIVE parameter on the HZSADDCK macro. INACTIVE specifies that the check should not run until the installation changes the state to active. See Writing an HZSADDCHECK exit routine and HZSADDCK macro — HZS add a check.
    • Include information in your check output messages about why the check user is getting an exception message for a default or common valid setting.
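
The two hints for checks that flag a default or common valid setting can be sketched as follows. This Python model is an illustration under stated assumptions, not the HZSADDCK macro or any Health Checker interface: the class DefaultFlaggingCheck, its fields, and the message text are hypothetical. It assumes that the check starts inactive, does nothing until the installation activates it, and explains in its exception message why a default value is being flagged.

```python
from dataclasses import dataclass, field


@dataclass
class DefaultFlaggingCheck:
    """Hypothetical check that treats a default setting as an exception."""
    name: str
    default_value: int
    recommended_value: int
    active: bool = False          # starts inactive, mirroring the INACTIVE hint
    messages: list = field(default_factory=list)

    def activate(self):
        """The installation opts in by changing the state to active."""
        self.active = True

    def run(self, current_value: int):
        """Return None if inactive, True if healthy, False on an exception."""
        if not self.active:
            return None           # an inactive check does not run
        if current_value == self.recommended_value:
            self.messages.append(f"{self.name}: setting matches the recommendation.")
            return True
        text = (f"{self.name}: EXCEPTION value {current_value} differs from the "
                f"recommended value {self.recommended_value}.")
        if current_value == self.default_value:
            # Explain why a valid default is still being reported.
            text += (" This is the default and is valid, but this check flags it "
                     "because the recommended value suits this environment better.")
        self.messages.append(text)
        return False
```

Because the model returns None while inactive, an installation that has not opted in never sees an exception for a setting it did not choose to have checked.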

Redpaper™ on writing checks: The Redpaper Exploiting the Health Checker for z/OS infrastructure (REDP-4590-00) contains a lot of experience-based information on writing checks.

Sample checks: You can find sample checks in the SYS1.SAMPLIB data set and in the z/OS UNIX file system directory /usr/lpp/bcp/samples.