Operating by Exceptions Done Efficiently
This article introduces the Status Display Facility (SDF), a control stand solution that comes as part of System Automation for z/OS. It explains how SDF handles many types of status information coming from potentially many mainframe servers in the enterprise and presents that information to the operator efficiently and effectively.
Whether your mainframe data center is small or large, whether you provide services to external clients or to your own organization, the objective of operating such an environment efficiently is the same. An operator at the control stand, overseeing multiple z System servers with tens, if not hundreds, of systems running on these servers, must be able to understand the status of the physical and logical components in the entire environment at any time. Unusual status changes, for example a non-recoverable termination of a critical workload, must be reported to the operator immediately. At the same time, the operator is interested not only in the fact that something unusual happened, but also in where the incident happened, how critical it is, and what components are affected by it. Because of these requirements, and because no two installations are alike, the control stand and the way status is reported must be flexible enough to meet every organization's needs.
What is SDF?
SDF, the Status Display Facility provided as part of System Automation for z/OS, is a solution that addresses these needs. SDF aggregates, sorts, and presents status quickly, reliably, and robustly. Because it is closely interwoven with the automation, monitoring the status of the servers and of the workloads running on them works out of the box. And with only a few simple steps, SDF can be turned into a control stand that meets the specific needs of your mainframe data center.
How does it work?
In essence, SDF manages the status of all operational entities that matter to you in a hierarchical order and allows you to visualize it. This is based on status descriptors, status components in a status tree, and panels, which are explained next:
Every operational entity has a status. Examples of such entities are physical and virtual servers, started tasks, application groups, messages, network connections to other systems, and many more. The status of these entities is captured in the form of status descriptors, either by System Automation or by custom routines. A status descriptor holds information about who reported the status, the time of the status change, the priority of the status, and how it should be presented to the operator.
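To make the idea of a status descriptor more concrete, the following minimal Python sketch models one as a plain record. It is not SDF's internal layout or API: the field names, the `text` field, the sample values, and the convention that a lower priority number means a more critical status are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StatusDescriptor:
    """Illustrative model of one status report (not SDF's real layout)."""
    reporter: str         # who reported the status
    changed_at: datetime  # time of the status change
    priority: int         # assumption: lower number = more critical
    color: str            # how the status should be presented
    text: str             # hypothetical free-text detail for the operator


def most_critical(descriptors):
    """Return the descriptor an operator should see first, or None."""
    return min(descriptors, key=lambda d: d.priority, default=None)


# Hypothetical sample data: one report from the automation, one from a
# custom routine.
descriptors = [
    StatusDescriptor("AUTOMATION", datetime(2024, 1, 1, 8, 0), 550,
                     "green", "TASK A available"),
    StatusDescriptor("CUSTOM_ROUTINE", datetime(2024, 1, 1, 8, 5), 120,
                     "red", "TASK B broken"),
]

top = most_critical(descriptors)
print(top.reporter, top.text)
```

Keeping the priority as a simple number is what makes "show the most critical status first" a one-line comparison in this sketch.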
A status descriptor is attached to a status component, and status components are structured hierarchically in a status tree. A system, for example, is represented as a status component that includes status components for started tasks, groups, and messages. If the system is part of a sysplex, the system's status component is in turn modeled as a subordinate status component of the sysplex. When a status descriptor is attached to a lower-level status component, its status is also propagated to the higher-level status components, up to the root of the status tree. As a result, a critical message, for example, is immediately visible on the system level and on the sysplex level, if the status tree is defined accordingly.
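The propagation described above can be sketched in a few lines of Python. This is only a conceptual model under stated assumptions, not how SDF is implemented: the class, the component names (`PLEX1`, `SYS1`, `SYS2`, `MESSAGES`), and the rule that the lowest priority number wins are hypothetical. The sketch computes the effective status of a component from its subtree when asked, which gives the same visible result as pushing each update upward.

```python
class StatusComponent:
    """One node in an illustrative status tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self.own = []  # (priority, text) pairs attached directly here
        if parent is not None:
            parent.children.append(self)

    def attach(self, priority, text):
        """Attach a status to this component."""
        self.own.append((priority, text))

    def effective(self):
        """Most critical (priority, text) in this subtree, or None.

        Assumption: a lower priority number is more critical.
        """
        candidates = list(self.own)
        for child in self.children:
            status = child.effective()
            if status is not None:
                candidates.append(status)
        return min(candidates, default=None)


# Hypothetical tree: a sysplex with two systems, one with a message node.
plex = StatusComponent("PLEX1")
sys1 = StatusComponent("SYS1", parent=plex)
sys2 = StatusComponent("SYS2", parent=plex)
messages = StatusComponent("MESSAGES", parent=sys1)

messages.attach(100, "critical message on SYS1")

print(sys1.effective())  # visible on the system level
print(plex.effective())  # and on the sysplex level
print(sys2.effective())  # the unaffected system reports nothing
```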
Status components and any status descriptors attached to them can be displayed using panels. Default panels are provided to visualize the default status components that are shipped with System Automation. The real power of SDF, however, lies in customization: you can modify the status tree to meet your needs or even add your own status trees. With only a few definitions, you can create panels that report the status of your complete mainframe environment. Some customers have created panels that show the status of their sites side by side. Other customers have created panels that aggregate critical message status across the systems in a sysplex. There are many ways to customize SDF so that it best suits your specific operational requirements.
Panels can be chained together to allow operators to navigate from a sysplex overview directly to the system of interest. On every status component, the operator can directly see the details of the most critical status descriptor associated with it and can flip through all the other descriptors associated with it. Furthermore, SDF allows you to define custom actions in the form of programmable function keys to ease the operator's job in specific situations.
Any status update is reflected on the display at once. So, when operators work with SDF, they can immediately distinguish important status updates from less important ones and see what system or sysplex is affected by the status update. From there, operators can zoom into the details and react by using either commands or pre-defined functions invoked from the panel with a single key.
To unleash the strengths of SDF, I encourage you to become familiar with the concepts described above and to start modifying the status tree and panels. The following small example shows that this is not difficult:
Being interested in the health of the JES2 environment, I decided to monitor the overall JES2 SPOOL utilization and that of the fifteen jobs with the highest SPOOL utilization in a system. In addition, the number of free job numbers, job output elements, and job queue elements, as well as free BERTs (block extent reuse tables), should be monitored. These operational entities become status components.
To visualize these status components, the status tree needs to be modified such that JES and its details are part of the system's status tree, which looks like this:
A small REXX script is used to periodically ask JES2 about the SPOOL utilization, the largest users, and the elements of interest. It creates the status descriptors accordingly and updates the status components defined for them.
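The logic of such a polling routine can be outlined as follows. This is a Python sketch of the idea only, not the REXX script from the example: the thresholds, the status names, the component names (`SPOOL`, `SPOOLJOBnn`), and the input data are hypothetical, and the actual JES2 query and the SDF update call are deliberately left out.

```python
def classify(utilization_pct, warn, critical):
    """Map a utilization percentage to a hypothetical status name."""
    if utilization_pct >= critical:
        return "CRITICAL"
    if utilization_pct >= warn:
        return "WARNING"
    return "NORMAL"


def top_spool_users(jobs, limit=15):
    """Return the `limit` jobs with the highest SPOOL utilization.

    `jobs` maps a job name to its share of the SPOOL in percent.
    """
    ranked = sorted(jobs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


def build_updates(overall_pct, jobs):
    """Build (component, status, text) tuples for one polling cycle."""
    updates = [("SPOOL",
                classify(overall_pct, warn=70.0, critical=85.0),
                f"{overall_pct:.1f}% used")]
    for rank, (job, pct) in enumerate(top_spool_users(jobs), start=1):
        updates.append((f"SPOOLJOB{rank:02d}",
                        classify(pct, warn=5.0, critical=10.0),
                        f"{job} {pct:.1f}%"))
    return updates


# Hypothetical figures standing in for the answers from JES2.
updates = build_updates(72.5, {"JOBA": 1.0, "JOBB": 12.0, "JOBC": 6.0})
for component, status, text in updates:
    print(component, status, text)
```

A real routine would run this on a timer and pass each tuple to whatever interface it uses to update the corresponding status component.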
To display the detailed JES2 status, a panel needs to be added to SDF that specifies where the status components are displayed and what further interaction should be possible. The example demonstrates how to purge a job's output directly from the panel.
The status tree and panel definitions and also the REXX-script used by this example are available for download from the System Automation web site. When implemented, you see a panel like this:
This short article can only scratch the surface of what is possible with SDF. Feedback from customers is very positive, and past attempts to motivate them to move to other operational interfaces were not very successful. This demonstrates that, despite being a 3270 user interface, SDF is valued for its reliability, its immediate responsiveness, and, last but not least, its flexibility to meet almost all requirements that an installation might have.
For more information, please visit the System Automation wiki at https://ibm.biz/BdXupt.